Apart from system-wide improvements to Essbase integration across OBIEE 11.1.1.7, the other Essbase-related improvement that came with this latest release was the (re-)introduction of SmartView as the replacement for Oracle BI Add-in for Microsoft Office (“BI Office”), OBIEE’s previous MS Office solution. As a reminder, BI Office appeared with OBIEE 10.1.3.3 back in 2007/8 and supported integration with Microsoft Excel and Powerpoint, allowing you to download analysis views from the BI Catalog and then view them within Excel and Powerpoint.
What you couldn’t do with BI Office though was use it to create new analyses, or upload what you’d created back to the BI Catalog. There was also no integration with Microsoft Word or Outlook, which meant it was a feature meant more for viewing and copying analyses into Excel and Powerpoint rather than as a “first class” report authoring environment.
Then when OBIEE 10.1.3.4 was released, a number of EPM Suite products were integrated with OBIEE, including Workspace (now resurrected with OBIEE 11.1.1.7), SmartSpace (where did that go?) and SmartView, the long-term replacement for Essbase’s somewhat minimalist Excel Add-in. This was all good stuff except that, in terms of OBIEE support, this version of SmartView was essentially unusable, rendering OBIEE data in an Essbase-like way that made little sense for an OBIEE user.
“The UI takes a bit of getting used to” was my comment at the time, which in retrospect was a bit of an understatement and this version of SmartView had little to no take-up within the OBIEE world, with BI Office carrying on until now as the only viable MS Office integration approach. Now though, the new improved version of SmartView is with us, so how well does it work with OBIEE data?
SmartView can be downloaded from the BI Presentation Services homepage, but note that this is the 32-bit version and you’ll need to go to My Oracle Support for the 64-bit version, available using patch ID 16238382 (at the time of writing, for SmartView version 11.1.2.2.310). Once it’s installed, select SmartView > Options > Advanced and enter your general EPM Suite SmartView Provider Services URL into the Shared Connections URL setting (in the format http://[machine_name:port]/workspace/SmartViewProviders), like this:
This setting only covers SmartView connecting to Essbase and Financial Reporting, so to connect to OBIEE's Presentation Services Catalog you'll need to create what's called a Private Connection (or define a shared connection for OBIEE within an XML file, as detailed in the SmartView 11.1.2.2.310 docs), by pressing the Panel button in the menu ribbon, selecting Private Connections from the Smart View menu, then clicking on the Create new connection button.
Then, when prompted for the SmartView connection type, select Oracle BI EE, then type in the OBIEE SmartView URL in the format http://[machine_name:port]/analytics/jbips, and press Finish to complete this part of the process.
Then, when prompted, enter the username and password for your OBIEE system, and save the connection as a private connection to your workstation.
Now you should be able to browse the BI Catalog and select a SmartView report, for example, to view within Excel.
Or you can select any view from a regular analysis, and add that to Excel just as you did with BI Office.
More importantly though, the View Designer feature allows you to create a new report from scratch, selecting from any subject area in the BI Catalog and creating a report from right within Excel.
This report can then be manipulated either as an Excel pivot table (pictured below) or an OBIEE pivot table, giving you an OBIEE-within-Excel experience far more intuitive and usable than the earlier incarnation of SmartView.
Additional calculated fields can be added, in what is arguably a more obvious way than you’d do so in the Analysis Editor…
… and charts can be developed as well, using a similar set of chart types to the ones provided by the Analysis Editor.
Then, once you’re done, you can either save the Excel (or Word, or Powerpoint, or whatever) document to your workstation’s filesystem, or you can upload to the BI Presentation Catalog using the Publish View button…
… and then – get this – open the report in the Analysis Editor, just like any other analysis in the catalog. Impressive stuff (although the calculation defined in Excel didn’t make it through to OBIEE, and the upload feature only seems to bring a single view at a time, but this is version 1.0)
There’s tons more to SmartView and, in reality, presumably some of the new OBIEE stuff won’t work properly in this first release, but it’s a huge improvement over the old OBIEE MS Office plug-in, and it’s also useful being able to use the same MS Office plug-in for all of Oracle’s BI & EPM tools, with full 32 and 64-bit support for all the modern MS Office versions.
One of the major new feature areas in OBIEE 11.1.1.7, but which has so far got very little attention, is the significant improvement in integration between Essbase, the Hyperion EPM Suite, and OBIEE 11g. The integration between EPM Workspace and OBIEE’s Presentation Services which disappeared when 11g came along is now back, along with installation and security integration, a new version of SmartView that (properly) supports OBIEE as a data source, and the ability to spin-off aggregates from the RPD into Essbase ASO cubes.
Now some of these features of course made an appearance in the earlier 11.1.1.6.2 BP1 release, and integration between OBIEE 11g and EPM Suite has been happening on-and-off right back from the OBIEE 10g days, but where we’re at now with OBIEE 11.1.1.7 is the delivery of a number of things that customers have long been asking for, including:
- The ability to run OBIEE from within EPM Workspace, with single sign-on between the two
- Shared security provisioning and organisation between Essbase and OBIEE, through application roles and policies
- The ability to install Essbase and the other EPM tools into the same WebLogic domain as OBIEE, using a single installer
- A proper Excel (and Word, Powerpoint, Outlook) add-in for OBIEE, with the ability to author reports as well as run existing Answers-authored ones
This is actually one of a number of new feature areas that came with 11.1.1.7 that have had little publicity; as well as better Essbase integration, there’s now support for multi-tenancy in the RPD and catalog, Hadoop integration (which we covered in a blog post last week), the View Suggestion Engine, the inevitable changes to MUD, and quite a few others, some of which I’ll try and cover over the next few days and weeks. For now though, let’s look at these new Essbase/EPM integration improvements, starting with the installation of Essbase and its related tools into the OBIEE WebLogic domain.
As I mentioned back in my OBIEE 11.1.1.7 New Features posting a few weeks ago, the OBIEE product installer now offers Essbase as an installation option alongside OBIEE, Real-Time Decisions (RTD) and BI Publisher. As with RTD, Essbase isn’t included in the base OBIEE+ license, but it is included in Oracle BI Foundation Suite, the product package that Oracle encourages new customers to take out and which includes OBIEE, Scorecard & Strategy Management, Essbase and BI Mobile. Selecting Essbase during the install process installs it, and the other EPM Suite tools, in the same WebLogic domain as OBIEE, and you can see Essbase within Fusion Middleware Control as a product – separate from OBIEE – that you can manage and monitor.
Essbase Server and Essbase Studio (the client/server tool used to design and build Essbase cubes) are also now controlled and monitored through OPMN, something that’s been a feature of EPM Suite for several releases now but which is, of course, new for OBIEE.
[oracle@obiee11117 ~]$ cd /home/oracle/obiee/instances/instance1/bin
[oracle@obiee11117 bin]$ ./opmnctl status
Processes in Instance: instance1
ias-component | process-type | pid | status
essbasestudio1 | EssbaseStudio | 12682 | Alive
essbaseserver1 | Essbase | 12685 | Alive
coreapplication_obiccs1 | OracleBIClusterCo~ | 12686 | Alive
coreapplication_obisch1 | OracleBIScheduler~ | 12687 | Alive
coreapplication_obijh1 | OracleBIJavaHostC~ | 12683 | Alive
coreapplication_obips1 | OracleBIPresentat~ | 12684 | Alive
coreapplication_obis1 | OracleBIServerCom~ | 12689 | Alive
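Because Essbase is now just another OPMN-managed system component, it can also be stopped and started individually in the same way as the BI Server or Presentation Services components; for example, using the component name shown in the status listing above:

[oracle@obiee11117 bin]$ ./opmnctl stopproc ias-component=essbaseserver1
[oracle@obiee11117 bin]$ ./opmnctl startproc ias-component=essbaseserver1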
So something that’s been an issue for EPM customers upgrading from OBIEE 10g to 11g was the removal, at the time, of the ability to integrate OBIEE’s Presentation Services within EPM Workspace, and the SSO link between the two products. Back with OBIEE 10.1.3.4 there was an admittedly complicated but supported and working process to integrate the two products together, allowing EPM Workspace customers to “skin” OBIEE to look like Workspace and run the two products together, albeit with separate report catalogs, security models and so forth.
This, coupled with the removal of OBIEE’s Hyperion custom authenticator for the RPD, left many EPM Suite customers upgrading to OBIEE 11g in the lurch, leading to workarounds such as this one that we put together recently for one of our customers. Well, this integration is (mostly) back with OBIEE 11.1.1.7, so let’s see what it does, and what functionality is still missing compared to OBIEE 10g.
First off, Essbase and EPM Suite installed as part of an OBIEE installation aren’t quite the same as EPM Suite installed standalone; most importantly, Essbase in this OBIEE incarnation has a different security model to “standalone” EPM Suite, in that it uses the same system of application roles and policies that the Fusion Middleware 11g-centric OBIEE 11g does, rather than the Shared Services groups that standalone EPM Suite uses. Also, the OBIEE 11.1.1.7 installer installs just the following EPM Suite products:
- Essbase Server, including Essbase Agent, Essbase Studio, Essbase Administration Services, Provider Services
- Financial Reporting
- Calculation Manager
Therefore you don’t get Planning, Web Analysis and so forth, and you can’t subsequently install them into the domain and Fusion Middleware Control afterwards – so think of Essbase and the EPM Suite tools in this context as an add-on and complement to OBIEE, not a full installation of EPM Suite in their own right. Moreover, the majority of Essbase administration tasks that, for standalone EPM Suite installs, would be performed through MaxL, Shared Services and EAS are instead performed through Fusion Middleware Control, and Essbase high-availability and clustering works differently within this context, for example. The standard product architecture diagram for OBIEE and Essbase combined within the 11.1.1.7 release therefore gets updated, with a number of products added to the Java components and system components parts of the diagram, like this:
Now, when installed as part of OBIEE 11.1.1.7’s WebLogic domain, EPM Workspace is available at http://[machine_name:port]/workspace, and when you launch it you’re presented with a view into the BI Catalog, and menu options to administer the various EPM and BI tools from one place.
Within this catalog are both OBIEE objects such as analyses, dashboards and agents, and EPM objects such as Financial Reporting and SmartView reports.
There are limits to this EPM/BI Catalog integration though – FR reports, for example, can only be opened using the File > Open dialog in EPM Workspace, with an error message showing if you just click on the report itself in the BI Catalog view within EPM Workspace. But SSO between Workspace and OBIEE seems to work (as in, you don’t need to re-enter your BI password when clicking on an analysis in the Workspace Catalog view) because both OBIEE and EPM are working off the same Fusion Middleware security model; the lack of this shared security model is what explains why the feature disappeared for so long after OBIEE 11g was introduced.
Now that OBIEE and Essbase share the same security, the need for the old HSS Custom Authenticator has now gone away, though of course this will only be of use if a customer has moved their Essbase installation into the OBIEE domain, with standalone EPM Suite installations still needing the security workaround mentioned earlier in this article. There’s no upgrade path from standalone EPM Suite installations to this integrated arrangement, so most probably any users of Essbase within this new 11.1.1.7 context will be installing it “net-new”, with the main objective being to enhance their existing BI setup rather than merging their separate BI and EPM platforms into one.
As you’ve probably picked-up by now, much of this new integration ability is down to security harmonised across both Essbase and OBIEE, or more accurately Essbase now having an option to use Fusion Middleware 11g security rather than Hyperion Shared Services. So what does Essbase and FMW11g security look like in practice? Let’s head over to Fusion Middleware Control, in particular the Application Policies administration screen, to take a look.
The big difference when Essbase runs as part of an Oracle BI domain is that authentication and authorization for Essbase use Fusion Middleware security rather than Shared Services or native Essbase security. Although Essbase Administration Services ships with OBIEE 11.1.1.7, you should use Fusion Middleware Control to enable access to particular Essbase databases, and to give permission to access tools such as Financial Reporting or Administration Services; the only security role left for EAS and MaxL in this setup is to create the Essbase data and metadata filters. These filters are then assigned to users through FMW security resource permissions and application policies, which are in turn granted to application roles and thereby to users.
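As an illustration, the filter itself would still be created through EAS or a MaxL statement along the lines of the one below (the application, database, filter and member names here are made up for the example), with the grant of that filter to users then happening through Fusion Middleware Control application policies rather than Shared Services provisioning:

create filter Sample.Basic.filt_east_read read on '@IDESCENDANTS("East")';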
Whilst this probably seems like an over-complicated nightmare to traditional Essbase users, it does have the major advantage that one set of application roles granted to users within a Fusion Middleware system can cover both OBIEE and Essbase permissions, and there’s no need to link to Shared Services or support Native Essbase security. We’ll cover the implications of this more in some future blog posts, but this is the enabling technology that makes the rest of this integration make sense.
With Essbase integrated into the OBIEE BI Domain, you can also now use Essbase as an aggregate persistence target, though this feature comes with the same (slightly strange) approach and limitations that we first encountered when it was introduced with OBIEE 11.1.1.6.2 BP1; although there’s no longer the requirement for the Essbase server to be used only for aggregate persistence, you still have to name the Essbase database in a particular way, it’s ASO-only, and the Aggregate Persistence Wizard still creates a separate ASO database for each aggregation (similar to Oracle Database materialised views) rather than one single cube covering all aggregations. In practical terms, I’m not sure how much you’d use this rather than creating your own Essbase cube in Studio against the whole RPD business area, but it might be useful for OBIEE developers who otherwise don’t know Essbase.
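For reference, the script that the Aggregate Persistence Wizard generates (and that you then run through nqcmd) looks much the same as it does for a relational target, just pointed at an Essbase connection pool; a minimal sketch, with made-up business model, level and connection pool names, might look something like this:

create aggregates
"ag_revenue_qtr"
for "A - Sample Sales"."F0 Revenue Base Measures" at levels ("A - Sample Sales"."H0 Time"."Quarter")
using connection pool "Essbase Aggregates"."Essbase CP"
in "Essbase Aggregates";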
So finally, the other major Essbase-related new feature in OBIEE 11.1.1.7 is SmartView, the successor to Oracle BI Office. But that’s a topic in itself, so I’ll cover it in the next posting.
OBIEE 11.1.1.7 came out a few weeks ago, and recently we looked at Hadoop integration, one of the more interesting new features in this release. Over the next week I’ll be looking in more detail at the changes around Essbase and EPM Suite integration, but today I’ll be looking at a preview release of the upcoming OBIEE 11.1.1.7 SampleApp, provided courtesy of Philippe Lions and the BI Tech Demos team within Oracle. Philippe will be going through the new SampleApp release at the upcoming Rittman Mead BI Forum 2013 events in Atlanta and Brighton, but in the meantime let’s take a look at what’s likely to appear in this new OBIEE 11.1.1.7 demo showcase.
As with previous OBIEE 11g SampleApps, the dashboard front page lists out all of the content, and highlights in bright blue those areas that are new to this release. The 11.1.1.7 SampleApp is largely based on the earlier releases that supported previous OBIEE 11g versions, with any new content either showing off 11.1.1.7 new features, or adding new functional areas to the SampleApp demo.
The best place to start looking is the New Features Demo dashboard, which highlights new 11.1.1.7 features such as performance tiles, 100% stacked bar charts and waterfall charts, on this first dashboard page:
Totals within tables, pivot tables and other visualisations can now have action links associated with them, to display a financial summary report for example:
Another page on this dashboard shows off the new layout capabilities within dashboard pages; object containers can now have fixed (absolute) width and height sizes, whilst dashboard columns, and rows/columns within table and pivot table views, can also be frozen whilst other areas scroll by.
The new 11.1.1.7 SampleApp is likely to ship with Endeca Information Discovery pre-installed, and configured to provide the catalog search for OBIEE’s BI Presentation Services (a new feature in OBIEE 11.1.1.7). The SampleApp screenshot below shows a typical “faceted search” against the web catalog, displaying key attributes and an attribute search box via an Endeca Information Discovery-style guided navigation pane. The benefit of Endeca providing catalog search vs. Presentation Services’ regular search feature is that it looks much deeper into the catalog metadata, allows searching across many more attributes, and because it uses an in-memory index, it’s really fast.
There’s also some nice new Oracle R Enterprise content and demos, including an example where R scripts can be selected via the dashboard, parameter values supplied and the scripts then run; ORE using OBIEE’s BI Repository as a data source; and some more examples of ORE analysing the Flight Delays dataset to predict delays on future flights.
If you’re an Endeca Information Discovery developer who’s also interested in the state-of-play around OBIEE integration, there’s a whole dashboard setting out current examples around OBIEE / Endeca integration, including examples of parameter passing between OBIEE and Endeca Studio, Endeca using OBIEE’s BI Repository as its data source, and BI Publisher reporting against Endeca Server data via web service calls.
Finally, the dashboard pages that support DBA and developer tasks have been extended, with a new dashboard page, for example, displaying a list of all the physical SQL queries sent to the database.
Thanks to Philippe and the BI Tech Demos team for the early preview. Check back tomorrow when we’ll continue the look at what’s new with OBIEE 11.1.1.7, by taking a closer look at what’s changed, and dramatically improved, in the area of integration with Essbase and the Oracle EPM (Hyperion) Suite.
If you’ve not done so already, I’d advise anyone interested in OBIEE 11g to subscribe to the Oracle BI Tech Demos channel on YouTube, where Philippe Lions and his team showcase upcoming SampleApp releases, and highlight new features like SmartView integration (to be covered later this week on this blog), integration with Oracle R Enterprise, what’s coming in the new 11.1.1.7 SampleApp, and OBIEE 11.1.1.7 leveraging the Advanced Analytics Option in Oracle Database 11gR2. One demo that I’ve been aware of for some time via the Exalytics program, and that’s also featured in the Tech Demos channel, is a load test demo that uses scripts and internal OBIEE features to run the test, and is used by Oracle to show how many concurrent users an Exalytics server can handle.
What’s particularly interesting about this load test though is that it doesn’t require any external tools such as LoadRunner or JMeter, and the scripts it uses are actually shipped with the full v207 SampleApp VirtualBox VM that is downloadable from OTN. On a recent customer engagement a need came up for a “quick and dirty” load test of their system, so I thought I’d go through how this load test example works, and how it can be adapted for use with any other generic OBIEE 11g (11.1.1.6+) environment.
In the example used in the YouTube video, a report (which looks like a dashboard page, but is actually a single analysis compound layout containing two graph views and a pivot table view) is set up with a special set of filter values; when requested, this analysis will use “randomised” filter values so that response times aren’t skewed by the same values being used each time, and a controlling process outside of the dashboard ramps up 10, 100, 200 and so on separate sessions up to a maximum of 2,000, to simulate the sort of user numbers that an Exalytics server might be required to support.
Then, when the load test is running, the metric display within Fusion Middleware Control is used to show how the server copes with the load (in terms of # of sessions, average response time per query etc), as well as a dashboard page based off of the usage tracking data that shows a similar set of information.
Now of course the reason this sort of test data is important (apart from selling Exalytics servers) is that a report that takes 10 seconds to run, on a system otherwise unused and with only you running queries, might take considerably longer to run when all your users are on the system, due to factors such as disk contention, queuing on database server and mid-tier server CPUs, parallel query getting scaled-back when more than a few users try to run reports at the same time, and so on – so you need to do this sort of load test before unleashing your new dashboards onto your user community. But performing a load test is hard – just ask our Robin Moffatt – so having a ready-made system shipped with SampleApp, that doesn’t require additional software, certainly sounds interesting. So how does it work?
The scripts that control the load test process are contained within the /home/oracle/scripts/loadtest folder on SampleApp, and look like this within the Linux file manager:
The folder actually contains three files, plus a Java JAR archive file:
- runtest actually runs the load test
- users_list.txt is a set of usernames, that are central to the load test process (more on this in a moment)
- Loadtest_README.txt is the instruction file, and
- LoadTest.jar is a Java program that is called by runtest to log into OBIEE and request the report
Looking through the readme file, the way the process works is that you need to create a set of users with a common password within the OBIEE LDAP directory, and put their usernames in the users_list.txt file. Then, the LoadTest.jar file is called by the runtest script, passing the hostname and port number of the WebLogic server hosting Presentation Services, the path to the analysis that you wish to test against, and the common password, and the script will then initiate a session for each user and then run the report.
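In other words, the call that the runtest script makes takes this general form, with the arguments in the order described in the readme:

java -jar LoadTest.jar <weblogic_hostname> <weblogic_port> <catalog_path_to_analysis> <common_password>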
Looking at the list of names in the users_list.txt file is interesting, because they all appear to be airport three-letter codes; for example:
The reason for this becomes clear when you look at the filters behind the analysis that the runtest script calls; to provide the filter predicate randomisation, each run of the report uses the username to filter the origin airport selection, and the other filter values are generated through MOD and RAND functions that, in essence, generate random values for each call of the report. So, given that we’re not all going to want to test reports based on airport codes, and given how the overall testing process works, this presents two challenges for us:
- How we generate a very large number of user accounts with a common password, given that the test process runs the report just once for each user, and how we get rid of these accounts once we’ve finished the testing.
- How we configure the report we want to test to generate “random” filter values – the approach Oracle took with this example is quite “clever”, but we’ll need to come up with something equally clever if we want to do this for our report.
Question 1 seems inextricably linked to question 2, so let’s create an example report that we can easily randomise the values for, create a number of views that we can include in a compound layout as Oracle did in the load test demo, and give it a go.
Taking the SampleApp dataset and the A – Sample Sales subject area, let’s create an analysis that has the following columns in the analysis criteria:
- Products.P2 Product Type
- Time.T03 Per Name Qtr
- Time.T02 Per Name Month
- Customer.C3 Customer Type
- Ship To Regions.R50 Region
- Base Facts.1 – Revenue
- Base Facts.3 – Discount Amount
For good measure, create another derived measure, called Base Facts.1 – Gross Revenue, which uses the formula:
- “Base Facts”.”1- Revenue”+”Base Facts”.”3- Discount Amount”
and then create some views off of this criteria so that your analysis looks something along these lines:
Now comes the tricky part of randomising it. We could take the approach that Oracle took with the Airlines load test example and create, for example, a user for each country in the dataset, but instead let’s use Logical SQL’s RAND function to pick a region and calendar quarter at random, and then three of the five customer types, to use as the analysis filters. To do this, we create a filter against each of these columns in the analysis criteria and then convert the filter to SQL, using something like the following SQL clause to filter the quarter randomly:
"Time"."T03 Per Name Qtr" in (
SELECT a.s_1 from
"A - Sample Sales"."Time"."T03 Per Name Qtr" s_1,
FROM "A - Sample Sales"
(BOTTOMN(RAND()*100,1) <= 1)
ORDER BY 1, 2 ASC NULLS LAST, 3 ASC NULLS LAST
FETCH FIRST 5000001 ROWS ONLY) a
The same goes for the region filter, which we define as:
"Ship To Regions"."R50 Region" in (
SELECT a.s_1 from
"A - Sample Sales"."Ship To Regions"."R50 Region" s_1,
FROM "A - Sample Sales"
(BOTTOMN(RAND()*100,1) <= 1)
ORDER BY 1, 2 ASC NULLS LAST, 3 ASC NULLS LAST
FETCH FIRST 5000001 ROWS ONLY) a )
whereas for the customer type filter, we return the top 3 ordered (random) values, not just the first one:
"A - Sample Sales"."Customers"."C3 Customer Type" in
( SELECT a.s_1
"A - Sample Sales"."Customers"."C3 Customer Type" s_1,
DESCRIPTOR_IDOF("A - Sample Sales"."Customers"."C3 Customer Type") s_2,
FROM "A - Sample Sales"
(BOTTOMN(RAND()*100,3) <= 3)
ORDER BY 1, 2 ASC NULLS LAST, 4 ASC NULLS LAST
FETCH FIRST 5000001 ROWS ONLY) a )
Now when you run the report you should see different filter selections being used each time you run it, similar to what's shown in the preview screenshot below.
One thing I noticed at this stage is that, whilst the customer type filtering returned three values, only one would ever be used in the graph prompt, because that's how prompts in a view work vs. the multi-select prompts you get as dashboard prompts. So I then needed to move the customer type column from the prompts area to the Pies and Slices > Pies part of the graph layout (so I then got one pie chart per customer type, not just the one type I was seeing via the graph prompt before), so that my final report looked like this:
and my analysis criteria, including these special filters, looked like this:
Next we need to create an initial set of users so that we can perform the concurrency test. I do this by using the WebLogic Scripting Tool (WLST) script shown below, which creates 30 users, assigns them to an LDAP group and then adds that group to the BIConsumers LDAP group, so that they can run the analysis in question (if you're new to WLST or are interested in reading a bit more about it, take a look at this Oracle Magazine article of mine that explains the feature).
password = 'welcome1'
group = 'Loadtest-Users'
users = ['user1','user2','user3','user4','user5','user6','user7','user8','user9','user10','user11','user12',
for user in users:
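The listing above only shows the start of the script; the rest of it connects to the admin server and creates each user in WebLogic's embedded LDAP. A minimal sketch of how that part might look, assuming the default (DefaultAuthenticator) authentication provider, would be:

# connect to the admin server and get a handle on the embedded LDAP authenticator
connect('weblogic', 'welcome1', 't3://localhost:7001')
serverConfig()
authenticator = cmo.getSecurityConfiguration().getDefaultRealm().lookupAuthenticationProvider('DefaultAuthenticator')

# create the group, create each user with the common password, and add it to the group
authenticator.createGroup(group, 'Load test users')
for user in users:
    authenticator.createUser(user, password, 'Load test user')
    authenticator.addMemberToGroup(group, user)

# add the new group to BIConsumers so its members can run the analysis
authenticator.addMemberToGroup('BIConsumers', group)
disconnect()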
After saving the WLST script to the /home/oracle/scripts/loadtest folder as create_users.py, I then go back to my Mac workstation and SSH into the SampleApp VirtualBox VM to run the script:
Last login: Sat Apr 20 12:58:38 on ttys000
markmacbookpro:~ markrittman$ ssh email@example.com
Last login: Sun Apr 21 17:13:37 2013 from 192.168.2.31
[oracle@obieesampleapp ~]$ cd obiee/Oracle_BI1/common/bin
[oracle@obieesampleapp bin]$ ./wlst.sh
Connecting to t3://localhost:7001 with userid weblogic ...
Successfully connected to Admin Server 'AdminServer' that belongs to domain 'bifoundation_domain'.
Warning: An insecure protocol was used to connect to the
server. To ensure on-the-wire security, the SSL port or
Admin port should be used instead.
Already in Config Runtime
Exiting WebLogic Scripting Tool.
Then using the same SSH session I create a new users_list.txt file containing the usernames of these 30 users (use CTRL-D in the session to send the EOF signal to cat, and stop copying text into the users_list.txt.new file):
[oracle@obieesampleapp bin]$ cd /home/oracle/scripts/loadtest/
[oracle@obieesampleapp loadtest]$ cat > users_list.txt.new
[oracle@obieesampleapp loadtest]$ mv ./users_list.txt users_list.txt.original
[oracle@obieesampleapp loadtest]$ mv ./users_list.txt.new users_list.txt
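Incidentally, since these test users follow a simple naming pattern, you could equally generate the file with a one-liner rather than typing the names in; for example:

[oracle@obieesampleapp loadtest]$ for i in $(seq 1 30); do echo "user$i"; done > users_list.txt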
Finally, I then edit the runtest script to change the path to point to the analysis I created earlier, and to update the password setting for the users:
[oracle@obieesampleapp loadtest]$ vi ./runtest
so that the final runtest file looks like this:
[oracle@obieesampleapp loadtest]$ cat ./runtest
echo "Start time: `date`"
echo "Load Test Starting..."
java -jar LoadTest.jar "localhost" "7001" "/shared/loadtest/SampleAnalysis" "welcome1"
echo "Load Test Completed..."
echo "End time: `date`"
Now, we've got everything we need for the initial test; an analysis to run, a set of users to run it with, and the JAR file to perform the test. So let's give it a go...
[oracle@obieesampleapp loadtest]$ chmod a+x runtest
[oracle@obieesampleapp loadtest]$ ./runtest
Start time: Sun Apr 21 18:21:39 PDT 2013
Load Test Starting...
Creating User Sessions for Concurrency Test..
Total active sessions: 30
Total queries initiated: 30
Cleaning up User Sessions created for Concurrency Test..
- Remaining Active Sessions: 30
Completed User Sessions Cleanup
Load Test Completed...
End time: Sun Apr 21 18:21:54 PDT 2013
Where it gets interesting though is when you go over to Fusion Middleware Control, and view the DMS metrics graphs at Capacity Management > Metrics > View the full set of system metrics, where you can then see various metrics such as # of active sessions, request processing time (i.e. how long the analysis took to run), and # requests per minute.
But of course, our current test only runs thirty queries through our thirty users, so it's not much of a concurrency test; also, I've got caching enabled, so I'd expect the figures to look fairly good (though this may be what we use in real life, so the key thing is to make the test as close a reflection of your actual system as possible). To ramp the numbers up, I simply duplicate the entries in the users_list.txt file:
[oracle@obieesampleapp loadtest]$ wc -l users_list.txt
[oracle@obieesampleapp loadtest]$ cat users_list.txt users_list.txt users_list.txt users_list.txt users_list.txt users_list.txt users_list.txt users_list.txt users_list.txt users_list.txt >> big_users_list.txt
[oracle@obieesampleapp loadtest]$ wc -l big_users_list.txt
[oracle@obieesampleapp loadtest]$ cat big_users_list.txt big_users_list.txt big_users_list.txt big_users_list.txt > users_list.txt
[oracle@obieesampleapp loadtest]$ wc -l users_list.txt
Finally, I run the test again, to simulate 1600 users running queries at once:
[oracle@obieesampleapp loadtest]$ ./runtest
Start time: Sun Apr 21 18:42:42 PDT 2013
Load Test Starting...
Creating User Sessions for Concurrency Test..
- Active Sessions: 100
- Active Sessions: 200
- Active Sessions: 300
- Active Sessions: 400
- Active Sessions: 500
- Active Sessions: 600
- Active Sessions: 700
- Active Sessions: 800
- Active Sessions: 900
- Active Sessions: 1000
- Active Sessions: 1100
- Active Sessions: 1200
- Active Sessions: 1300
- Active Sessions: 1400
- Active Sessions: 1500
- Active Sessions: 1600
Total active sessions: 1680
- Queries initiated: 100
- Queries initiated: 200
- Queries initiated: 300
- Queries initiated: 400
- Queries initiated: 500
- Queries initiated: 600
- Queries initiated: 700
- Queries initiated: 800
- Queries initiated: 900
- Queries initiated: 1000
- Queries initiated: 1100
- Queries initiated: 1200
- Queries initiated: 1300
- Queries initiated: 1400
- Queries initiated: 1500
- Queries initiated: 1600
Total queries initiated: 1680
Cleaning up User Sessions created for Concurrency Test..
- Remaining Active Sessions: 1680
- Remaining Active Sessions: 1600
- Remaining Active Sessions: 1500
- Remaining Active Sessions: 1400
- Remaining Active Sessions: 1300
- Remaining Active Sessions: 1200
- Remaining Active Sessions: 1100
- Remaining Active Sessions: 1000
- Remaining Active Sessions: 900
- Remaining Active Sessions: 800
- Remaining Active Sessions: 700
- Remaining Active Sessions: 600
- Remaining Active Sessions: 500
- Remaining Active Sessions: 400
- Remaining Active Sessions: 300
- Remaining Active Sessions: 200
- Remaining Active Sessions: 100
Completed User Sessions Cleanup
Load Test Completed...
End time: Sun Apr 21 18:45:34 PDT 2013
Going back over to EM, I can see the load building up on the server and the response time increasing.
Notice though how the response time actually starts to fall as more queries run? That's most probably caching kicking in, so next time I'll disable caching completely and run the test again. But for now though, this is the Oracle load test script running, and the steps I've outlined here should allow you to run a similar test yourself. Thanks to Philippe and the Oracle BI Tech Demos team for this, and on a similar topic I'll be previewing the new v303 11.1.1.7 SampleApp in a posting tomorrow.
A few days back, I introduced our special guests for the Rittman Mead BI Forum in Atlanta, focusing first on Cary Millsap. Today I’d like to talk about our other special guest: Oracle ACE Director Alex Gorbachev. Alex was an inspiration for me back in the Oracle Database 10g and early 11g days when I was administering Oracle RAC for several data warehouse customers, and wondering whether RAC was the right platform for BI. Of course it was… and every time I read one of Alex’s blogs (he was quite a prolific blogger back then… we all were once upon a time) or saw him speak, I felt empowered to go take on Cache Fusion.
Alex joined Pythian in Canada as a DBA team lead in 2006. Just two years later, he moved to Australia to successfully start up Pythian Australia. In 2009, he returned to Canada and took up the mantle of Chief Technology Officer, a title he still holds today. He is a member of the distinguished OakTable Network (as is Cary Millsap… something I forgot to mention yesterday), and is a member of the Board of Directors of the Independent Oracle Users Group (IOUG). Alex founded the Battle Against Any Guess Party, a movement promoting scientific troubleshooting techniques. During his time in Australia he also founded the Sydney Oracle Meetup, a vibrant local community of passionate Oracle professionals.
It’s fortuitous that Mark blogged yesterday on Hadoop… as this is exactly what Alex is speaking on at the BI Forum. His presentation is titled “Hadoop versus the Relational Data Warehouse.” He’ll discuss some of the technical design principles of Hadoop and the reasons for its rise in popularity. We’ll get to see the position that Hadoop currently occupies in the enterprise data center, its possible future trajectory, and how that trajectory compares with the more traditional relational data warehouse. For the BI developers in the crowd who have perhaps never seen Alex speak… you’re definitely in for a treat. He’s set to speak first thing Friday morning to kick off the last day of the Forum. If you know Alex, you’re obviously aware that he’s an excellent technologist, but you also likely know how much fun he is to be around, so it will be good to have him at the social meet-ups in and around the conference.
I’d really like to thank our friend and business partner Pythian for always supporting Rittman Mead and ensuring that Alex would speak at the Forum. And of course… I’d be remiss if I didn’t say: Love Your Data!
In yesterday’s post I looked at the key enabling technologies behind OBIEE and ODI’s connectivity to Hadoop, and today I’ll look at how OBIEE 11.1.1.7 can now access Hadoop data sources through two related technologies: Hive, and MapReduce.
In my introduction to the topic I said that whilst writing MapReduce routines in Java, and then orchestrating them through other tools in the Apache Hadoop family could be quite complex technically, another tool called “Hive” provided an SQL-like query layer over Hadoop and MapReduce so that tools like OBIEE could access them. Rather than you having to write your own MapReduce routines in Java, for instance, Hive writes them for you, returning data to OBIEE and ODI via ODBC and JDBC drivers. The diagram below, also from yesterday’s post, shows the data layers used in such an arrangement.
Under the covers, Hive has its own metadata layer, server engine and data store, with developers “loading” data into Hive “tables” which are then generally stored on the HDFS file system, just like any other data processed through MapReduce. Then, when a query is issued through Hive, the Hive Server dynamically generates MapReduce routines to query the underlying data, returning data to users in a similar way to an interactive database SQL session, like this:
markmacbookpro:~ markrittman$ ssh oracle@bigdatalite
Last login: Wed Apr 17 04:02:59 2013 from 192.168.2.200
Welcome to BigDataLite
run startx at the command line for X-Windows console
Host: bigdatalite.us.oracle.com [192.168.2.35]
[oracle@bigdatalite ~]$ hive
Hive history file=/tmp/oracle/hive_job_log_oracle_201304170403_1991392312.txt
hive> show tables;
Time taken: 2.925 seconds
hive> select count(*) from src_customer;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
In order to limit the maximum number of reducers:
In order to set a constant number of reducers:
Starting Job = job_201303171815_0003, Tracking URL = http://localhost.localdomain:50030/jobdetails.jsp?jobid=job_201303171815_0003
Kill Command = /usr/lib/hadoop-0.20/bin/hadoop job -Dmapred.job.tracker=localhost.localdomain:8021 -kill job_201303171815_0003
2013-04-17 04:06:59,867 Stage-1 map = 0%, reduce = 0%
2013-04-17 04:07:03,926 Stage-1 map = 100%, reduce = 0%
2013-04-17 04:07:14,040 Stage-1 map = 100%, reduce = 33%
2013-04-17 04:07:15,049 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201303171815_0003
Time taken: 22.21 seconds
In the example above, I connected to the Hive environment, listed out the “tables” available to me, and then ran a count of “rows” in the src_customer table, which caused a MapReduce routine to be written and executed in the background by the Hive server. Hive has been described as the “Hadoop Data Warehouse”, but it’s not really a data warehouse as you and I would know it – you wouldn’t typically use Hadoop and Hive to store customer transaction data, for example, but you might use it as a store of Facebook interactions, or of the most popular pages or interaction paths through your website, and someone working in Web Analytics might want to interactively query that set of data in a more user-friendly manner than writing their own Java routines. So how does OBIEE gain access to this data, and what extra software or configuration pieces do you need to put in place to make it happen?
If you want to have OBIEE 11g access Hadoop data, you’re best going with the 11.1.1.7+ release as this is where it’s most tested and stable. You’ll need to configure drivers at two points; firstly at the server level (Hadoop access is only supported with Linux server installations of OBIEE 11.1.1.7) and then at the Windows-based Administration tool level. Let’s start with the BI Administration tool first, based on the instructions in the 11.1.1.7 Metadata Repository Builder’s Guide.
To have the BI Administration tool connect to a Hadoop/Hive data source, you’ll need to download some ODBC drivers for Hadoop via a My Oracle Support download, DocID 1520733.1. This gives you a set of HiveODBC drivers along with a PDF explaining the installation process, and once you’ve installed the drivers, you’ll need to open up the ODBC Data Source Administrator applet and create a new HiveODBC data source. In this instance, I call the data source “bigdatalite” after the server name, and go with the default values for the other settings. Note that “default” is the name of the “database” within Hive, and the port number is the port that the Hive server is running on.
Now I can create a new repository offline, and connect to the Hive server via the HiveODBC connection to start importing table metadata into the RPD. Note that with the current implementation of this connectivity, whilst you can import tables from multiple Hive databases into the RPD, queries you issue can’t span more than a single Hive database (i.e. you can’t specify a schema name prefix for the table name, therefore can’t join across two schemas).
Then, once you’ve imported the Hive table metadata into the RPD, change the physical database type to “Apache Hadoop”, from the default ODBC 3.5 setting that would have been added automatically by the metadata import process. Leave the connection pool call interface at ODBC2.0, put in any old username and password into the shared login details (or a valid username/password if Hive security is enabled), and then save the repository.
You should then be able to use the View Data feature in the BI Administration tool to view data in a particular Hive table, like this:
Now you need to move over to the server part of OBIEE, and configure the ODBC connection to Hive there too. OBIEE 11.1.1.7 comes with DataDirect drivers already installed that will connect to Hive, so it’s just a case then of configuring a connection of the same name to the Hive datasource using OBIEE’s odbc.ini file, like this:
[ODBC Data Sources]
AnalyticsWeb=Oracle BI Server
Cluster=Oracle BI Server
SSL_Sample=Oracle BI Server
bigdatalite=Oracle 7.1 Apache Hive Wire Protocol

[bigdatalite]
Description=Oracle 7.1 Apache Hive Wire Protocol
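The rest of that DSN section then points the DataDirect driver at the Hive server; the exact key names and driver library path depend on the driver version installed, but the entry typically looks something along these lines (the values below are illustrative rather than taken from a real install):

Driver=<path to the DataDirect Apache Hive Wire Protocol driver .so file>
HostName=bigdatalite
PortNumber=10000
Database=default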
Note that you also need to configure OBIEE’s OPMN feature to use the DataDirect 7.1 drivers rather than the default, older ones – see the docs for full details on this step. Then, as far as the RPD is concerned, you just need to make a business model out of the Hive table sources, upload it using EM so that it’s running online on your OBIEE server installation, and your RPD in the end should look similar to this:
Then finally, you can create an OBIEE analysis using this data, and analyse it just like any other data source – except, of course, that there’s quite a lot of lag and latency at the start of the query, as Hive spins up its Java environment, writes the MapReduce query, and then sends the data back to OBIEE’s BI Server.
So how do we get data into Hive in the first place, to create these tables that, in the background, are accessed through Hadoop and MapReduce? Check back tomorrow, when I’ll look at how Oracle Data Integrator can be used to load data into Hive, as well as perform other data integration tasks using Hadoop and other big data technologies.
Recent releases of OBIEE and ODI have included support for Apache Hadoop as a data source, probably the most well-recognised technology within the “big data” movement. Most OBIEE and ODI developers have probably heard of Hadoop and MapReduce, a data-processing programming model that goes hand-in-hand with Hadoop, but haven’t tried it themselves or really found a pressing reason to use them. So over this next series of three articles, we’ll take a look at what these two technologies actually are, and then see how OBIEE 11g, and also ODI 11g connect to them and make use of their features.
Hadoop is actually a family of open-source tools sponsored by the Apache foundation that provides a distributed, reliable shared storage and analysis system. Designed around clusters of commodity servers (which may actually be virtual and cloud-based) and with data stored on the server, not on separate storage units, Hadoop came from the world of Silicon Valley social and search companies and has spawned a raft of Apache foundation sub-projects such as Hive (for SQL-like querying of Hadoop clusters), HBase (a distributed, column-store database based on Google’s “BigTable” technology), Pig (a procedural language for writing Hadoop analysis jobs that’s PL/SQL to Hive’s SQL) and HDFS (a distributed, fault-tolerant filesystem). Hadoop, being open-source, can be downloaded for free and run easily on most Unix-based PCs and servers, and also on Windows with a bit of mucking-around to create a Unix-like environment; the code from Hadoop has been extended and to an extent commercialised by companies such as Cloudera (who provide the Hadoop infrastructure for Oracle’s Big Data Appliance) and Hortonworks, who can be thought of as the “Red Hat” and “SuSE” of the Hadoop world.
MapReduce, on the other hand, is a programming model or algorithm for processing data, typically in parallel. MapReduce jobs can be written, theoretically, in any language as long as they expose two particular methods, steps or functions to the calling program (typically, the Hadoop JobTracker); a minimal sketch of the two follows the list below:
- A “Map” function, that takes input data in the form of key/value pairs and extracts the data that you’re interested in, outputting it again in the form of key/value pairs
- A “Reduce” function, which typically sorts and groups the “mapped” key/value pairs, and then typically passes the results down the line to another MapReduce job for further processing
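As a deliberately simplified sketch of those two steps, here is the classic word-count example written as a pair of Hadoop Streaming scripts in Python; the mapper emits a key/value pair for every word it sees, and the reducer, which receives its input sorted by key, sums the counts for each word (the file names are just for illustration):

#!/usr/bin/env python
# mapper.py - emit "word <TAB> 1" for every word read from stdin
import sys
for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t%s" % (word, 1))

#!/usr/bin/env python
# reducer.py - input arrives sorted by key, so counts for each word are contiguous
import sys
current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, current_count))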
Joel Spolsky (of Joel on Software fame, one of mine and Jon’s inspirations in setting up Rittman Mead) explains MapReduce well in this article from back in 2006, where he’s trying to explain the fundamental differences between object-orientated languages like Java, and functional languages like Lisp and Haskell. Ironically, most MapReduce functions you see these days are actually written in Java, but it’s MapReduce’s intrinsic simplicity, the way that Hadoop abstracts away the process of running individual map and reduce functions on lots of different servers, and the way the Hadoop job co-ordination tools take care of making sense of all the chaos and returning a result in the end, that have made it take off so well and allowed data analysis tasks to scale beyond the limits of just a single server.
I don’t intend to try and explain the full details of Hadoop in this blog post though, and in reality most OBIEE and ODI developers won’t need to know how Hadoop works under the covers; what they will often want to be able to do though is connect to a Hadoop cluster and make use of the data it contains, and its data processing capabilities, either to report against directly or more likely, use as an input into a more traditional data warehouse. An organisation might store terabytes or petabytes of web log data, details of user interactions with a web-based service, or other e-commerce-type information in an HDFS clustered, distributed fault-tolerant file system, and while they might then be more than happy to process and analyse the data entirely using Hadoop-style data analysis tools, they might also want to load some of the nuggets of information derived from that data in a more traditional, Oracle-style data warehouse, or indeed make it available to less technical end-users more used to writing queries in SQL or using tools such as OBIEE.
Of course, the obvious disconnect here is that distributed computing, fault-tolerant clusters and MapReduce routines written in Java can get really “technical”, more technical than someone like myself generally gets involved in and certainly more technical than you average web analytics person will want to get. Because of this need to provide big-data style analytics to non-Java programmers, some developers at Facebook a few years ago came up with the idea of “Hive”, a set of technologies that provided a SQL-type interface over Hadoop and MapReduce, along with supporting technologies such as a metadata layer that’s not unlike the RPD that OBIEE uses, so that non-programmers could indirectly create MapReduce routines that queried data via Hadoop but with Hive actually creating the MapReduce routines for you. And for bonus points, because the HiveQL language that Hive provided was so like SQL, and because Hive also provided ODBC and JDBC drivers conforming to common standards, tools such as OBIEE and ODI can now access Hadoop/MapReduce data sources and analyse their data just like any other data source (more or less…)
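To give a flavour of what HiveQL looks like, a simple analytic query against a hypothetical page_views table (the table and column names here are made up purely for illustration) might be:

SELECT page_url, COUNT(*) AS hits
FROM page_views
WHERE view_date >= '2013-04-01'
GROUP BY page_url
ORDER BY hits DESC
LIMIT 10;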
So where this leaves us is that the 11.1.1.7 release of OBIEE can access Hadoop/MapReduce sources via a HiveODBC driver, whilst ODI 11.1.1.6+ can access the same sources via a HiveJDBC driver. There is of course the additional question as to why you might want to do this, but we’ll cover how OBIEE and then ODI can access Hadoop/MapReduce data sources in the next two articles in this series, as well as try and answer the question as to why you’d want to do this, and what benefits OBIEE and ODI might provide over more “native” or low-level big data query and analysis tools such as Cloudera’s Impala or Google’s Dremel (for data analysis) or Hadoop technologies such as Pig or Sqoop (for data loading and processing). Check back tomorrow for the next instalment in the series.
In the previous two posts in this series, I looked at the product architecture for Oracle Enterprise Manager 12cR2 (EM12cR2) Cloud Control and the BI Management Pack, and how you registered OBIEE, TimesTen, Essbase and the DAC as targets for monitoring and managing. But what can you do with EM12cR2 and the BI Management Pack once you’ve set it all up, how well does it handle other related products such as Informatica and Siebel CRM, how customisable is it and what other tasks can it perform?
To start off, one of the questions we’ve been asked is whether, in a similar way to OBIEE and Oracle Portal, you can customise the EM web console display to include just those views that you’re interested in; to create, for example, a dashboard page for monitoring OBIEE that might include views on BI Server query throughput, GoldenGate activity, DAC ETL alerts and so on. The answer is – not quite – but there are some customisations and bookmarks that you can create which at least make it easier to navigate your way around.
When you first log into OEM12cR2, you’re presented with the standard Enterprise Summary view, which summarises a number of metrics across all targets in the EM repository.
You can, however, change this for a more focused view of a particular type of target, by selecting SYSMAN > Select My Home… (or whatever your logged-in user name is), and then selecting from the list of pre-defined target views presented on the Select Enterprise Manager Home page that’s then displayed.
If, for example, your primary responsibility was looking after OBIEE systems, you might choose to have the Middleware page as your homepage, so that all of the WebLogic farms are listed out on your front page.
You can also set individual pages in EM as “favorites”, so that they appear from the Favorites menu for quick access as shown in the screenshot below.
Something else that’s useful when you’ve got a number of similarly-named systems registered within your EM repository is to put them into groups. To create a group to hold my “demo” OBIEE systems, for example, I would select Targets > Groups from the web console menu, and then press the Create > Group button to bring up the Create Group page. Then, using the Search or Search by Criteria buttons I can refine the search to include, for example, just Fusion Middleware Farms, and then select the ones that I’d like to add to the new group.
You can also create “dynamic” groups as well, including all systems that have a “development” status in a group that updates over time, like this:
Once you’ve registered your systems, you can do all of the same things you did with the 10gR4 version of EM and the BI Management Pack, including view metrics over time rather than for just the time you’ve got the metric window open (to my mind, one of the most valuable features in EM vs. Fusion Middleware Control).
Metric thresholds can also be defined in a similar fashion to how they were in EM10gR4, with events that are then triggered by the threshold being exceeded to notify you, for example, when query response times exceed a certain number of seconds, or when the dashboard login page can’t be reached. Unfortunately the dashboard and scheduler reports that are included as part of the BI Management Pack can’t be turned into graphs, but like Fusion Middleware Control any of the standard metrics can be graphed, overlayed on the same server’s metrics for the previous day, or compared to another server’s metrics or a baseline.
Finally, another question we’re often asked is how many other systems EM12cR2 can monitor, either out-of-the-box, through paid-for official plugins, or through third party extensions? The first thing to be aware of then is what EM functionality is included “for free” as part of your database or middleware license and what functionality costs more, and the definitive place for this information is the Oracle® Enterprise Manager Licensing Information 12c Release 2 (12.1.0.2) doc on OTN; also from the web console you can select Setup > Management Packs > Show Management Pack Information to have EM highlight for you those menu items that require additional licensing beyond those included by default for licensed database or middleware customers. For example, in the example below the items annotated with “OBIM” would require an Oracle BI EE customer to purchase the BI Management Pack, whilst the others would be “free” to use by any BI customer.
As for what these management pack and plug-ins cost, again the definitive source is the Oracle Tech Price list, which changes from time to time but can always be found with a Google search for “oracle tech price list”. The price list as of the time of writing listed the BI Management Pack at $11,500/processor (based on the processors licensed for BI EE).
Note also with management packs that you generally – at least in the case of Oracle Database – need to license the appropriate database option as well, though plug-ins are generally free or at least provided as part of the main product cost, as is the case with TimesTen and Exadata. In terms of what features come out of the box and what ones require separate installation, you can check this by selecting Setup > Extensibility > Self Update and Plugins menu items, which show the downloaded and available agent versions, along with the various plugins that can be used immediately, or downloaded from Oracle’s support site, including ones for Siebel, below, and EMC’s SAN arrays.
There are also plug-ins available for download from third-party sites for targets such as ones for Informatica PowerCenter, VMWare VSphere and mySQL, with most of them gathered together at the Enterprise Manager Extensions Exchange, also on the Oracle website.
So there we are with our three-part look at EM12cR2 and the BI Management Pack. I’m over in Norway now for the Oracle User Group Norway conference, but check back soon for some new content on the 11.1.1.7 release of OBIEE 11g.
I feel like I’m introducing the Beatles… though I think Kellyn Pot’Vin calls them the “DBA Gods”. Today I’ll be talking about Cary Millsap, and tomorrow I’ll introduce our other special guest: Alex Gorbachev.
As many of you know, I grew up as a DBA (albeit, focusing on data warehouse environments) before transitioning to development… initially as an ETL developer and later as an OBIEE architect. I had three or four “heroes” during that time… and Cary Millsap was certainly one of them. His brilliant white paper “Why a 99%+ Database Buffer Cache Hit Ratio is Not Ok” changed my whole direction with performance tuning: it’s probably the first time I thought about tuning processes instead of systems. Many of you also know about my love of Agile Methodologies… a cause that Cary has championed of late, and is also the subject of an excellent white paper “Measure Once, Cut Twice”. This purposeful inversion of the title helps to remind us that many of the analogies we use for software design don’t compute… it’s relatively simple to modify an API after the fact, so go ahead and “cut”.
A brief bit of history on Cary. He’s an Oracle ACE Director and has been contributing to the Oracle Community since 1989. He is an entrepreneur, software technology advisor, software developer, and Oracle software performance specialist. His technical papers are quoted in many Oracle books, in Wikipedia, in blogs all over the world, and in dozens of conference presentations each month. He has presented at hundreds of public and private events around the world, and he is published in Communications of the ACM. He wrote the book “Optimizing Oracle Performance” (O’Reilly 2003), for which he and co-author Jeff Holt were named Oracle Magazine’s 2004 Authors of the Year. Though many people (Kellyn included) think of Cary as a DBA… Cary considers himself to be a software developer first, but explains what he believes to be the reason for this misconception:
“I think it’s fair to say that I’ve dedicated my entire professional career (27 years so far) to the software performance specialization. Most people who know me in an Oracle context probably think I’m a DBA, because I’ve spent so much time working with DBAs (…It’s still bizarre to me that performance is considered primarily a post-implementation operational topic in the Oracle universe). But my background is software development, and that’s where my heart is. I built the business I own so that I can hang out with extraordinarily talented software researchers and designers and developers and write software that helps people solve difficult problems.”
Cary’s presentation is called “Thinking Clearly about Performance” and will be given at the end of the day on Thursday before we head over to 4th and Swift for the Gala dinner. His message for the BI developers in the audience is an encouraging one:
“My message at the Rittman Mead BI Forum is that, though it’s often counterintuitive, software performance actually makes sense. When you can think clearly about it, you generally make progress more quickly and more permanently than if you just stab at different possible solutions without really understanding what’s going on. That’s what this presentation called “Thinking Clearly about Performance” is all about. It’s the result of more than 25 years of helping people understand performance in just about every imaginable context, from the tiny little law office in east Texas to the Large Hadron Collider at CERN near Geneva. It’s the result of seeing the same kinds of wasteful mistakes: buying hardware upgrades in hopes of reducing response times without understanding what a bottleneck is, adding load to overloaded systems in hopes of increasing throughput.”
My experience is that BI and DW systems suffer more from the “fast=true” disease than do OLTP systems, but that could simply be perspective. I’m excited that a group of BI developers, the majority of which are reporting against a database of some kind, will get an opportunity to see Cary’s approach to problem solving and performance tuning. As Cary tells us:
“The fundamental problems in an OBIEE implementation are just that: fundamental. The solution begins with understanding what’s really going on, which means engaging in a discussion of what we should be measuring and how (of course, in the OBIEE world, Robin Moffatt’s blog posts come in handy), and it continues through the standard subjects of profiles, and skew, and efficiency, and load, and queueing, and so on.”
If you are interested in seeing Cary and all the other great speakers at this year’s BI Forum, you can go over to the main page to get more information about the Atlanta event, or go directly to the registration page so we can see you in Atlanta in May.
In the previous post in this series, we looked at what Oracle Enterprise Manager 12cR2 and the BI Management Pack could do for OBIEE 11g admins, and how it manages a number of Oracle products from the database through to Fusion Middleware and the ERP applications. In today’s post I’m going to look at how an OBIEE system (or “target”) is registered so that we can then use BI Management Pack features, and how you make use of new features in the BI Management Pack such as support for Essbase targets.
I’ll work on the assumption that you’ve already got EM12cR2 installed, either on 64-bit Windows or 64-bit Unix (my preference is 64-bit Oracle Linux 5, though all should work); if you’ve not got EM12cR2 installed or your on an earlier version, the software is available on OTN and you’ll also need a suitable, recent and patched-up database to store the EM repository. Once you’ve got everything installed we can now login and take a look around – note that there’s no separate BI Management Pack download; all functionality is included but you need to be aware of what’s extra-cost and what’s not – the licensing guide is your best reference here, but at a high-level there are some parts of EM12cR2 that all licensed BI customers can use, whilst other features require the BI Management pack – we’ll cover this in more details in tomorrow’s post.
Logging into EM12cR2 presents you with a summary of the health and status of your systems, and you can see from the pie chart on the left that some of my systems are up, some are down and so forth. The Targets menu at the top of the screen lets me view similar information for hosts, middleware installations, databases and so on. My EM12cR2 installation has a number of OBIEE and other systems already registered with it, all of which are on VMs of which only a few are currently powered up.
In this example, I’m going to add a new host to this list, which is actually an Exalytics demo VM containing OBIEE 126.96.36.199 and TimesTen. Later on, we’ll look at adding Essbase to the list of monitored targets, both in terms of Essbase integrated into an OBIEE 188.8.131.52 install, and standalone as a separate install; finally, we’ll see how the BI Apps DAC is registered, so we can view the progress of Informatica ETL runs into the BI Apps data warehouse.
As I mentioned in yesterday’s post, EM12cR2 monitors and manages other servers by installing management agents on them; to do this, it needs to connect to the server via SSH, in order to install the agents software on there. To enable this, you need to provide the login credentials for a user on that server with “sudo” (act as an administrator) privileges, and a number of other settings have to be enabled for this process to work; to check that all of these are in place, let’s open up a console session on the Exalytics server and see how it looks:
[oracle@exalytics ~]$ sudo vi /etc/hosts
[sudo] password for oracle:
oracle is not in the sudoers file. This incident will be reported.
What happened here is that I tried to run a command as the superuser, and the system asked for my password, but it turns out that this user isn’t in the list of users authorised to act as the superuser. To fix this, I need to log in as root and then issue the command:
/usr/sbin/visudo
to open up a special version of “vi” used for editing the sudoers file, and then add the line:
oracle ALL=(ALL) ALL
to the end to enable the “oracle” user to use the sudo command. After doing this and trying the previous commands again, I can now use the sudo command. Let’s now move over to the EM12cR2 website and start the process of registering the host, and thereafter registering the various software components on the Exalytics server.
There are various automated and semi-automated ways of discovering candidate servers on your network, but for simplicity I just select Setup > Add Target > Add Targets Manually from the menu in the top right-hand corner of the screen, which allows me to add details of the host directly rather than let EM scan the network to find them.
The Add Targets Manually page is then displayed. I select Add Host Targets from the set of radio button options, and then press the Add Host … button.
I then type in the name of the host (or IP address), and select the platform type. Note that unless your target is Linux x64, you’ll probably need to download the required agent software for that platform before you perform this task, using the Self Update feature.
Then, type in the location on the remote server that you want to install the agent software to, and the login credentials of the user that you’ve just enabled for sudo access on that server.
EM12cR2 will then attempt to install the agent. If you’re lucky, it’ll install first time, but more likely you’ll see a message like the one in the screenshot below saying that there was a problem with the agent installation.
What this is saying is that you need to make some further changes to the “sudoers” file on the remote server before EM can properly use it to install the agent. There are usually actually two issues, and you hit the second one after fixing the first, so let’s tackle them both now. Going back over to the remote server and logging in as the “oracle” user, let’s use sudo again to fix the issue:
[oracle@exalytics ~]$ sudo /usr/sbin/visudo
The first bit to find in the file is the line that says (assuming Oracle Linux 5, as I’m using):
Defaults    requiretty
This setting means that the sudo command can only be run from a real terminal session, not from the sort of non-interactive SSH session that the agent installer uses; it’s the default with Oracle Linux 5, so to relax it for the agent deployment I change the line to:
Defaults    !requiretty
While you’re there, also add the line:
Defaults    visiblepw
to the file as well, as EM will complain about that not being set if you try and deploy again now. Once both of these are set, go back to EM, retry the deployment with the existing settings, and the agent should deploy successfully. Note also that if your OBIEE installation is on a Windows server, you’ll need to install the Cygwin Unix-like environment on the server before you do all this, to enable the SSH service and BASH command shell that EM requires – see the notes in the EM12cR2 docs for more details.
So at this point the management agent software will be deployed to the server, but none of the WebLogic, database or BI software will be registered with EM yet. To do this, on the screen that’s again displayed after you’ve registered the host itself, select Add Non-Host Targets using Guided Process (Also Adds Related Targets) option, then from the Target Type drop-down menu select Oracle Fusion Middleware, and then press the Add Using Guided Discovery… button to start the process by registering the WebLogic installation which in turn hosts OBIEE 11g.
When prompted, select the host that you’ve just installed the agent to as the WebLogic Administration Server Host, put in the WebLogic administration user details to connect to the admin server (not the OS user details you used earlier), leave the rest of the settings at their default values and press Continue.
If all goes to plan EM should then report a number of targets found in the scan – these are the WebLogic Server components, plus the BI components that we’re actually interested in.
On the next page, add any notes that you want to the target registration details, then press the Add Targets button add these to the EM repository.
On the Middleware targets page that is displayed once the targets are registered, you should see your WebLogic installation now listed, and if you drill into the Farm entry you’ll see the domain and the coreapplication entry that represents your Oracle instance. Click on the details for the farm, and you’ll then see something that looks familiar from Fusion Middleware Control – the view of your OBIEE installation, where you can also drill into coreapplication and see details of your instance. We’ll cover more on what this screen can do in tomorrow’s final post on this topic.
At this point our OBIEE system is mostly registered and configured, but we still need to register the repository database, so the dashboard and scheduler reports can work. To do this, select Business Intelligence Instance > Target Setup > Monitoring Credentials from the coreapplication drop-down menu, and then enter the details for that server’s BIPLATFORM schema, like this:
You should then be able to select Business Intelligence Instance > Dashboard Reports from the coreapplication drop-down menu to see details of which dashboards have run, what error messages were logged and so forth.
Note that this is a fairly minimal set of reports against usage tracking data – there’s no ability to graph the results, for example, and no ability to view individual report usage, just dashboards. But at least it’s something.
So that’s taken care of the OBIEE elements of the Exalytics server install. But what about the TimesTen server that provides the in-memory database cache on the Exalytics server? TimesTen support doesn’t come out-of-the-box with EM12cR2, but you can add it through a plug-in that you download via EM12cR2’s “self-update” feature. To do this, select Setup > Extensibility > Self Update from the menu on the top-right hand side, click on Plug-in in the list of folders that are then displayed, and then locate the Oracle TimesTen In-Memory Database entry in the Plug-in Updates listing. Assuming that it’s not been downloaded by someone else beforehand, click on it and press the Download button, to start the download process into EM’s software library.
After a short while the plug-in should be downloaded from Oracle’s support website, and you can then make it available for use with an agent. To do so, locate it in the list of plug-ins again, click on it to select it, and then press the Apply button that’s next to the (greyed-out) Download button. You’ll then be taken to the Plug-ins page where you should use the Deploy On button to deploy it first to the Management Server (i.e. the EM12cR2 server) and then to the Management Agent server (in this case, our Exalytics server) – note that you’ll need to know the SYS password for the database that holds your EM repository to do the OMS registration part.
If all goes to plan EM should then start the process of deploying the TimesTen plugin to the Management Server first, once it’s checked prerequisites and so forth. On my system, it also deployed the plug-in to the Exalytics server too, even though I don’t think I actually requested it.
The final configuration step now is to use the plug-in to register the TimesTen target on the Exalytics server. To do this I return to the main Setup menu in the EM web console, and select Setup > Add Target > Add Targets Manually, and then select the Add Non-Host Targets by Specifying Target Monitoring Properties radio button. Then, when the Target Type drop-down menu is displayed, select TimesTen In Memory Database 11g from the list, and then select the management agent that’s on the Exalytics server. Once done, press the Add Manually… button to go on to the next stage of the target registration process.
Then, when prompted, enter the connection details to your TimesTen instance, as used on the Exalytics server.
And that should be it – the TimesTen server should be registered and then available as a target to view in EM. It’ll take a while for metrics to start getting collected and displayed in the various graphs, but you can take a look at what’s recorded and what actions you can take from the menu that’ll now appear when you view a TimesTen target.
For Essbase, how to register the Essbase server as a target depends on whether Essbase is installed standalone in just a WebLogic managed server (as it is with Oracle’s SampleApp demo VMs), installed alongside OBIEE 184.108.40.206 or 220.127.116.11.2 BP1 as part of a single BI domain, or installed in its own full WebLogic domain with a WebLogic Server administration server. If it’s installed standalone, the initial registration of the WebLogic domain on the server concerned won’t register the Essbase server, and instead you’ll need to register it manually afterwards in a similar manner to the TimesTen server. If you’ve installed Essbase along with OBIEE as part of the combined 18.104.22.168 install, it’ll get registered along with OBIEE, and be displayed underneath coreapplication as shown in the screenshot below. Finally, if Essbase has its own WebLogic domain, then it gets detected as a target type as part of that domain’s registration, the same way that OBIEE does when registering its WebLogic domain as a target.
Finally, the BI Apps Data Warehouse Administration Console (DAC) is registered similarly to Essbase and TimesTen, except that like Essbase the plug-in required for management is already included in most EM12cR2 installations. As the DAC isn’t associated with any particular middleware home (at least, not with BI Apps 22.214.171.124) you’ll need to find it within the general list of targets rather than with the OBIEE installation it’s linked with.
So, with all of this set up, what can you do with it? In the final part of this series tomorrow, we’ll look at some common questions and usage scenarios around EM12cR2 and the BI Management Pack, and try and answer some of these questions.
A few years ago I wrote a series of blog posts, and an OTN article, on managing OBIEE 10g using Oracle Enterprise Manager 10gR4 and the BI Management Pack, an extra-licensable option for OEM that provided additional management capabilities for OBIEE and the BI Apps Data Warehouse Administration Console. The BI Management Pack was reasonably popular at the time but disappeared with the move to Enterprise Manager 12c Cloud Control; with the recent release of EM12cR2, however, it’s back again, now with additional capabilities around WebLogic, GoldenGate, TimesTen and Essbase. I covered the news of this new release a few months ago, and since then our customers have often asked about these new capabilities, but information on Oracle’s website and the wider web is pretty thin, so I thought I’d go through it in a bit more detail: today talking about how the product works, tomorrow going through installation and configuration, and then on the third day covering some of the common requests and questions we’ve had from our own customers.
Unlike Oracle Enterprise Manager Fusion Middleware Control (or indeed Database Control, the equivalent for the Oracle Database), Enterprise Manager 12cR2 Cloud Control is designed to manage multiple target systems, not just the one that its installed on. What this means is that you can manage all of your BI domains from one place, along with all of your databases, your GoldenGate installation, the DAC, Essbase and so forth, with their details held in a management repository stored in an Oracle database. The diagram below shows a typical OEM12cR2 topology, with the OEM installation on a server connected to the repository database, and OBIEE and other BI “targets” installed on other servers in the organisation.
OEM is actually made up of two parts, plus a database repository. The Oracle Management Service runs within a WebLogic domain and comprises a Web Console (what we’d know as Enterprise Manager) and “Platform Background Services”, a set of background services that communicate with the target hosts and store the relevant information. The other part of OEM is the “Oracle Management Agent”, a server process that’s installed on each monitored host, which collects metrics to pass back to OMS and PBS, and executes tasks such as stopping and starting the target on behalf of OMS. OEM12cR2 Cloud Control stores its metadata and monitoring data in a separate repository database, which can either be on the same server as OMS or on a separate machine – note that if you use a database instance that’s previously had Database Control enabled on it (as most of them have), you need to disable and remove it before you can use it for OEM’s own repository.
One of the main benefits of OEM12cR2 compared to standalone management consoles is that it manages the majority of Oracle’s server products – WebLogic Server, Oracle Database, Exadata, Exalogic, E-Business Suite and so on, though you need to read the small print as management covers more features in some products than others – we’ll get back to this point later on. At its best though, OEM12cR2 becomes your central monitoring point for all products (including some third party ones, via plugins), allowing you to monitor, manage, patch and maintain all of your servers from the one place.
As well as managing all hosts in one place, headline benefits of OEM12cR2 over “free” Fusion Middleware Control include:
- Monitor all BI Domains in one place, so you can see their versions, host types, patch levels etc
- Perform WebLogic lifecycle-type tasks such as patching the installation, packing and unpacking managed servers to move them between hosts, deploying test-to-production
- Define quality of service checks, create alerts for slow response times, hosts down etc
- Persist and store metrics, rather than only display them whilst you have the Metric screen open in your browser
Like the Oracle database though, Enterprise Manager comes with a number of extra-cost packs, including:
- Database Lifecycle Management Pack for Oracle Database
- Data Masking Pack
- Test Data Management Pack
- WebLogic Server Management Pack Enterprise Edition
- SOA Management Pack
and, of course, the BI Management Pack. So what do you get in the base version of OEM before you need to start paying for these packs? For all of the database, middleware and other targets, you can deploy agents, set alerts and define metric thresholds, and for Oracle Database specifically you can use the data movement features, view errors, use Advisor Central and so on, whereas the stuff you really want such as performance pages, wait event breakdowns and so on are extra cost. Same goes for WebLogic, with a small base-level set of functionality that’s pretty-much limited to discovering the WebLogic installation, then stopping and starting it, in other words what you get for “free” with Fusion Middleware Control. For BI, again you can display what you would normally see in Fusion Middleware Control (database and middleware licensed customers can use base-level Oracle Enterprise Manager at no extra license cost, so this would follow), but if you’re after anything else such as persisted metrics, service tests and so forth, figure on buying a few of the add-on management packs.
My article on OEM Grid Control 10gR4’s BI Management Pack described the features that are still the core of OEM12cR2’s BI Management Pack, which at the time included the features listed below, as shown in the screenshot.
- The ability to collect and record BI Server, BI Presentation Server, BI Cluster Controller and other BI target metrics, and define thresholds and events against those metrics
- The ability to connect to the BI repository database tables, to read for example the BI scheduler information about failed iBot executions and use it to alert you
- The ability to connect to the DAC repository, and then graph out ETL run information such as execution time, number of errors and so forth
- Record configuration settings, and then report on what’s changed for a target configuration compared to the previous settings
So now that the BI Management Pack is back with OEM12cR2, what do you get with it? Well you get everything that you had before, plus some new features:
- The ability to discover, and then monitor, Essbase installations
- All the new functionality around WebLogic (albeit with the requirement to license the WebLogic Management Pack)
- Compatibility with OBIEE 11g, along with continuing support for 10g
The screenshots below show some of these features in use, with the new EM12cR2 “Fusion” look and feel for the web console.
So how do we get EM12cR2 connecting to OBIEE, and make use of some of the new BI Management Pack features; also, how do we register an OBIEE installation with it, and how does it work with a BI Apps installation, or even about Exalytics? Come back tomorrow when we’ll cover off the installation and configuration parts of the product.
The deadline for your nomination is Tuesday, June 18th, 2013. Good luck and, if you win, let me know so I can see your application at Open World!
How can I store the result of <?xdofx:to_check_number(TOTAL_INV_AMOUNT,'USD','CASE_UPPER','DECIMAL_STYLE_WORDS')?> inside a variable?
Checking this out, BIP chokes on assigning the result to a variable, with a nice error:
Namespace prefix 'xdofx' used but not declared
Turning to our BIP RTF template guru in residence, Hok-Min, he suggested avoiding the xdofx: wrapper altogether in this case and calling the function more directly. The underlying function in Java is:
public static String toCheckNumber(String locStr, String amount, String preOrCurCode, String caseStyle, String decimalStyle)
Applying that to Satyender's needs we end up with:
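Something along the following lines should work; note that the variable name ('amtWords') and the use of $_XDOLOCALE for the locale argument are illustrative assumptions rather than the only way to write it:
<?xdoxslt:set_variable($_XDOCTX, 'amtWords', xdoxslt:toCheckNumber($_XDOLOCALE, TOTAL_INV_AMOUNT, 'USD', 'CASE_UPPER', 'DECIMAL_STYLE_WORDS'))?>
<?xdoxslt:get_variable($_XDOCTX, 'amtWords')?>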
We still need the xdoxslt prefix but we can now assign the value to a variable. There is a caveat from Hok Min.
Note that the amount has to be in string format. If it is not a string, it has to be converted to a string, e.g. string($CALCULATED_SALARY). If you use XML element name directly (like in this case SALARY), then it is already a string, so no need to do conversion.
I know this raises the question of why we need the xdofx: prefix at all. I’m discussing that with Hok-Min as I write and will get back to you.
At the end of last week I talked about what we had planned for the BI Forum in Brighton, and today I want to talk about what we’ve got planned for the Rittman Mead BI Forum 2013 in Atlanta, running the week after at the Georgia Tech Hotel & Conference Center on May 15th – 17th 2013.
As with the Brighton BI Forum event, we’ve got a mix of partner, independent developer, customer and Oracle speakers, all covering topics centered around OBIEE and its supporting technologies. The central idea around the BI Forum is that it’s “the conference we’d want to go to”, with content aimed at developers who already know the basics, want to hear solid technical and implementation talk not marketing fluff, and want to meet their friends and peers in the industry to share stories and trade tips. We keep numbers strictly limited, run just a single track so that everyone gets to take part in the same sessions, and maximize the networking and social elements so that you get to meet everyone who attends, hopefully staying in touch well after the event closes.
This year our speakers and sessions in Atlanta include:
- Jeff McQuigg talking about OBIEE testing (and making the topic interesting), Christian Screen on OBIEE plug-ins, Tim Vlamis covering OBIEE forecasting and time-series analysis, Kevin McGinley looking at ODI and the BI Apps (and hoping it’s GA by then, otherwise he’ll sing us Bohemian Rhapsody in Klingon for an hour), and our own Adam Seed will take us beyond the demos with Endeca
- From Oracle, we’ll have Marty Gubar and Alan Lee talking about the new Admin Tool, and OBIEE’s support for Hadoop as a data source, Florian Schouten will talk about BI Apps futures, and Jack Berkowitz will take us through the latest in OBIEE presentation, mobility and interactivity
Jack will also deliver the opening Oracle keynote on the Wednesday evening, and before that, earlier in the day, will be our optional masterclass, this year being delivered by Stewart Bryson, Michael Rainey and myself and focusing on Oracle’s data integration technologies. And – to top things off, we’re joined by two special guests, Method R’s Cary Millsap and Pythian’s Alex Gorbachev, two of our friends from the Oracle database world, who’ll talk to us about response-time based performance tuning, and what’s new in the worlds of Big Data and unstructured analytics.
Of course – it’s not all about learning and technology, and we make a special effort around the social events usually to the point where we spend all the proceeds on free bars and making sure everyone’s having a good time. There’ll be a debate on the Friday, TED-style 10-minute sessions on the Thursday, a competition to see who can speak longer than Paul Rodwick about Oracle BI without blinking, full delegate packs and t-shirts to take home, and we share all delegate contact details amongst the attendees so you can stay in touch after everything closes. Registration is open and there are still a few places left, so if you’re thinking of attending and don’t want to lose your place, register now before we sell out…!
It’s going to be a busy few weeks leading up to the BI Forum. First, Rittman Mead will be exhibiting at the UKOUG Engineered Systems Summit on Tuesday 16th April, a one-day event in London for Exadata, Exalogic, SuperCluster and not least Exalytics. Mark will be presenting on Oracle Exalytics – Tips and Experiences from Rittman Mead; the full agenda is available here. Mark will then be hopping over to Norway to speak at the Oracle Norway User Group event on High-Speed, In-Memory Big Data Analysis with Oracle Exalytics; maybe he’ll be previewing his work getting OBIEE 126.96.36.199 working with Hadoop.
The following week, on Tuesday 23rd April, I am speaking at an Oracle Business Analytics event at Oracle’s City Office in London, giving a presentation about our story so far with Exalytics. Later that week, on Thursday 25th, as part of Big Data Week, I’m speaking in the evening in Brighton about the evolution from Business Intelligence to Analytics and Big Data; the full agenda is here, and please register here.
With the BI Forum fast approaching, Toby Potter from DataSift agreed to an interview for the blog, to discuss social data, its value and how it can be combined with other business intelligence. Toby will be one of the guest speakers at the BI Forum in Brighton, sharing more details and experiences in this growing area. Over to the interview…
[James Knight] “For the readers of the blog that may not have heard of DataSift, tell us a bit about the company and what they can provide.”
[Toby Potter] ”DataSift provides the leading Social Data Platform for enterprise. DataSift collects, structures and enriches unstructured social, news and blog data allowing our customers to filter and collect the data of interest to their business for further, more detailed analysis.
We provide real-time access to the full Twitter firehose, bit.ly (click) data, Facebook, YouTube, millions of blogs, news and others, complemented by our historical archive of data.”
[James Knight] ”Big data is a hot topic and lots of our customers want to do some ‘big data stuff’, but what does the term mean to you?”
[Toby Potter] ”Whilst we consider ourselves to be a Big Data Platform – our core platform is into the Petabytes and we’re growing at several Terabytes per day – our focus is really on the Right Data. Using the power of our filtering capabilities, we’re able to take those huge volumes of data and provide the Right Data to our customers based upon the rules they define.
Even given that we are processing 1 Trillion (1,000,000,000,000) items of social data a week in our historics platform – and doubling every 2 months.”
[James Knight] ”There’s a ton of social data out there, which may make people wary of the effort, timescales and costs involved. How can people start their journey in the use of social analytics and gain benefit quickly?”
[Toby Potter] ”Social analytics is actually pretty straightforward to get started on – for example, you can simply create a Pay As You Go account on the DataSift platform and be collecting data within a few minutes – but the challenge is often realising the insight that’s available within social data. A typical tweet, for example, might have anywhere between 50 and 100 individual data points.
As with any analytics project, be clear on your objectives and you should see whether this is going to be of benefit very quickly. Think about how you would capture the data, what SLA (if any) do you need if this was to become mainstream, how would you analyse it and do you want to merge it with existing, internal sources.
The bottom line however, is to understand the action you would take from any insight gained. No matter how interesting the results, if there’s no action taken from the insight, you have to question the value of any analytics project.”
[James Knight] ”So far we’ve been talking about social media in relation to big data, but how can it also be used to enhance regular BI reports, such as those produced using Oracle BI EE?”
[Toby Potter] ”Typically, BI reports focus on reporting on data that’s available internally; sales data, customer data and effectiveness of a marketing campaign for example.
Social data in and of itself provides insight into the world outside; bringing this into your organisation’s Business Intelligence environment provides the business with a whole new perspective and delivers a new view of how your business is performing.
Both uses provide tremendous value, but it’s the combination of the two that really drives insight. Being able to show that a social discussion, or a particular news story, drove product interest and traffic to your web-site, which in turn increased online sales and also drove footfall to your physical stores (which also saw an uplift in sales), would provide the justification for running more similar campaigns. Understanding the social profiles of the customer base may also help you tune your offline messaging too.”
[James Knight] ”Out of those that you are allowed to talk about publicly, what’s the cleverest use of social data that you have seen so far?”
[Toby Potter] ”This is what I love most about my job; there are so many uses for the data that almost every discussion is different!
In the media business, I would point towards SecondSync as a great example of using social data to disrupt and add value to the fairly traditional business of measuring TV audiences. They capture social discussions around particular TV programmes and provide both a dashboard view and deeper analytics services to allow broadcasters to better understand their audience and how to engage with them.
News is an obvious area for analytics on social data and many of our customers are able to understand how stories are shared socially, enabling them to better target news stories, drive up audience and therefore drive increased advertising revenues.
Finance is another interesting area; the ability to bring together all of the discussion around particular investments, measure the sentiment and identify breaking news and the market reaction to it quickly, gives our customers an edge.
Customer services is another area, and not just in being able to react to complaints. Understanding what is working and what isn’t for your customers, collecting feedback and relating that to churn figures for example, allows you to understand how better to adapt your business. As an example, when the broadband goes down in a particular area, people tend to tweet about it very quickly; whilst your engineers may well be alerted, customer services are often left to pick up a sudden, increased load, without having any idea as to why or what the problem might be. Integrating social data helps overcome this.
More broadly, by looking into the data itself, I’ve seen our customers collecting all of the geo-located tweets to understand population density to better inform telecoms infrastructure; retailers analyse data to understand future fashions to better inform stock ordering/promotions; competitor benchmarking to understand how your business compares to your peers.
The most innovative I’m seeing at the moment is the ability to build out profiles, much richer than simple demographics, of social customers to better understand their interests and so provide them with more useful promotions.
One of our customers, Local Response, have integrated a real-time social feed into their online advertising platform to drive “Intent Targetting”, the ability to target more appropriate and context sensitive advertising to web-site viewers.”
[James Knight] ”Social data can be quite messy. How can organisations uncover value?”
[Toby Potter] ”As a gross oversimplification, there are two fundamental elements required: the ability to capture, process and cleanse the data in the first place (this is where DataSift comes in), and the ability to analyse it, which is where the Oracle platforms, and Endeca in particular, come into play.
Getting the data into a format where it can be analysed initially – for more traditional types of data this would typically be some kind of ETL tool – is key to being able to then store it ready for further analysis.
Endeca is fantastic at then taking this data and providing a flexible “data playground” allowing users to discover what insight might be hidden inside. For more structured analytics the more traditional BI tools provide a great environment to distribute reports around the business to a broader Business As Usual user base.”
[James Knight] ”Technically, how are companies dealing with social data, and do you see a standard approach or a variety of different approaches?”
[Toby Potter] ”As is to be expected in a relatively young space, there are many approaches. The majority of businesses are dipping their toes in and pulling data directly from the various provider APIs out there, but this provides a very distorted view of the potential; most APIs provide very restricted data sets and have limits on the number of requests you can make, so you’re potentially missing out on a lot.
Increasingly, as organisations recognise the importance and value of the data available, they are turning to specialist providers such as DataSift to provide a reliable, enterprise quality feed of the data they need.
We see the same journey often – initial experiment seems to yield value; build infrastructure to handle and incorporate data; deliver valuable insight … and then an API changes, or a major event changes volume levels, or the business interest grows and so on. The result is almost always frustration with the amount of expense, lack of reliability and general hard work that is required to get this working, but the insight is so valuable the need for a working solution over-rides everything else.
It is all of this work that DataSift aims to replace with a single platform to take away this maintenance and reliability nightmare.
If you’re looking at social data coming in to your business, I’d highly recommend looking at the tools that are already out there to save you a lot of time, money and frustration!”
[James Knight] ”At Rittman Mead, we’ve been working with Oracle Endeca Information Discovery (OEID), allowing us to combine unstructured, semi-structured and structure data. What additional value do you see OEID providing in the analysis of social data?”
[Toby Potter] ”This is exactly where the maximum value from the social data can be gained. Combining the data with existing sources, enhances the potential from the data, but having the right tools to extract this value is fundamental. Endeca provides exactly that kind of creative analytical environment where you can explore the data and determine where the most value lies.
Using Endeca, once the data is better understood, it’s then straightforward to productionise the areas of maximum benefit and focus energy on delivering that insight to where it can have most impact in your business.”
Many thanks to Toby for taking the time out for this interview, which we hope has provided some useful insight. There’s still time to register for the BI Forum and see Toby’s presentation at the Brighton event.
One of the key design features of the Exalytics In-Memory Machine is the use of aggregates (pre-calculated summary data), held in the TimesTen In-Memory database. Out of the box (“OotB”) these aggregates are built through the OBIEE tool, and when the underlying data changes they must be rebuilt from scratch.
For OBIEE (Exalytics or not) to make use of aggregate tables in a manner invisible to the user, they must be mapped into the RPD as additional Logical Table Sources for the respective Logical Table in the Business Model and Mapping (BMM) layer. OBIEE will then choose the Logical Table Source that it thinks will give the fastest response time for a query, based on the dimension level at which the query is written.
OBIEE’s capability to load aggregates is provided by the Aggregate Persistence function, scripts for which are generated by the Exalytics Summary Advisor, or the standard tool’s Aggregate Persistence Wizard. The scripts can also be written by hand.
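For illustration, a hand-written script follows the same shape as the ones the wizards generate; something like the sketch below, where the business model, level and connection pool names are examples only rather than taken from a real RPD:
create aggregates "ag_sales_month"
for "Sales"."Fact Sales" ("Sale Amount")
at levels ("Sales"."Dim Times"."Month")
using connection pool "TimesTen aggregates"."TT_CP"
in "TimesTen aggregates".."EXALYTICS";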
Aggregate Persistence has two great benefits:
- It uses the existing metadata model of the RPD to understand where to get the source data for the aggregate from, and how to aggregate it. Because it uses standard RPD metadata, it also means that any data source that is valid for reporting against in OBIEE can be used as a source for the aggregates, and OBIEE will generate the extract SQL automagically. The aggregate creation process becomes source-agnostic. OBIEE will also handle any federation required in creating the aggregates. For example, if there are two source systems (such as Sales, and Stock) but one target aggregate, OBIEE will manage the federation of the aggregated data, just as it would in any query through the front-end.
- All of the required RPD work for mapping the aggregate as a new Logical Table Source is done automagically. There is no work on the RPD required by the developer.
However, there are two particular limitations to ‘vanilla’ Aggregate Persistence:
- It cannot do incremental refresh of aggregates. Whenever the underlying data changes, the aggregate must be dropped and rebuilt in entirety. This can be extremely inefficient if only a small proportion of the source data has changed, and can ultimately lead to scalability and batch SLA issues.
- Each time that the aggregate is updated, the RPD is modified online. This can mean that batch times take longer than they need to, and is also undesirable in a Production environment.
Alternative approaches to loading the TimesTen aggregates include:
- Loading TimesTen aggregates through bespoke ETL, in tools such as GoldenGate and ODI. TimesTen supports a variety of interfaces – including ODBC and JDBC – and therefore can be loaded by any standard ETL tool. A tool such as GoldenGate can be a good way of implementing a light-touch CDC solution against a source database.
- Loading TimesTen aggregates directly using TimesTen’s Load from Oracle functionality, taking advantage of Aggregate Persistence to do the aggregate mapping work in the RPD
In both of these cases, there are downsides to the method. Using bespoke ETL is ultimately very powerful and flexible, but has the overhead of writing the ETL along with requiring manual mapping of the aggregates into the RPD. This mapping work is done in the TimesTen Load from Oracle method, but can only be used against an Oracle source database and where there is a single physical SQL required to load the aggregate.
Refreshing aggregates using native OBIEE functionality alone
Here I present another alternative method for refreshing Exalytics aggregates, but using OBIEE functionality alone and remaining close to the OotB method. It is based on Aggregate Persistence but varies in two significant ways:
- Incremental refresh of the aggregate is possible
- No changes are made to the RPD when the aggregate is refreshed
The method still uses the fundamentals of Aggregate Persistence since, as I mentioned above, it has some very significant benefits:
- BI Server uses (dare I say, leverages) your existing metadata modelling work, which is necessary – regardless of your aggregates – for users to report from the unaggregated data.
- BI Server generates your aggregate refresh ETL code
- If your source systems change, your aggregate refresh code doesn’t need to – just as reports are decoupled from the source system through the RPD metadata layers, so are your target aggregates
For us to understand the new method, a bit of background and explanation of the technology is required.
Background, part 1: Aggregate Persistence – under the covers
When Aggregate Persistence runs, it does several things:
- Remove aggregates from physical database and RPD mappings
- Create the physical aggregate tables and indexes on the target database, for the fact aggregate and supporting dimensions
- Update the RPD Physical and Logical (BMM) layers to include the newly built aggregates
- Populate the aggregate tables, from source via the BI Server to the aggregate target (TimesTen)
What we are going to do here is pick apart Aggregate Persistence and invoke just part of it. We don’t need to rebuild the physical tables each time we refresh the data, and we don’t need to touch the RPD. We can actually just tell the BI Server to load the aggregate table, using the results of a Logical SQL query. That is, pretty much the same SQL that would be executed if we ran the aggregate query from an analysis in the OBIEE front end.
The command to tell the BI Server to do this is the populate command, which can be found from close inspection of the nqquery.log during execution of normal Aggregate Persistence:
populate "ag_sales_month" mode ( append table connection pool "TimesTen aggregates"."TT_CP") as select_business_model "Sales"."Fact Sales"."Sale Amount" as "Sale_Amoun000000AD","Sales"."Dim Times"."Month YYYYMM" as "Month_YYYY000000D0" from "Sales";
This populate <table> command can be sent by us directly to the BI Server (exactly in the way that a standard create aggregate Aggregate Persistence script would be – with nqcmd etc) and causes it to load the specified table (using the specified connection pool) using the logical SQL given. The re-creation of the aggregate tables, and the RPD mapping, doesn’t get run.
The syntax of the populate command is undocumented, but from observing the nqquery.log file it follows a consistent pattern.
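Sketched out, with placeholders in angle brackets (this is inferred from the examples in this post, not official documented syntax), it looks like this:
populate "<aggregate table>" mode ( append table connection pool "<database>"."<connection pool>") as <select_business_model logical SQL> ;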
Looking at a very simple example, an aggregate holding a sales measure summarised by month, the populate statement for "ag_sales_month" shown above is exactly how such an aggregate gets loaded.
SELECT_BUSINESS_MODEL was written about by Venkat here, and is BI Server syntax allowing a query directly against the BMM, rather than the Presentation Layer which Logical SQL usually specifies. You can build and test the SELECT_BUSINESS_MODEL clause in OBIEE directly (from Administration -> Issue SQL), in nqcmd, or just by extracting it from the nqquery.log.
So, we have seen how we can take advantage of Aggregate Persistence to tell the BI Server to load an aggregate, from any source we’ve modelled in the RPD, without requiring it to delete the aggregate to start with or modify the RPD in any way.
Now, we need a bit of secret sauce to complete the picture and make this method a viable one.
In side-stepping the full Aggregate Persistence sequence, we have one problem. The Logical SQL that we use in the populate statement is going to be parsed by the BI Server to generate the select statement(s) against the source database. However, the BI Server uses its standard query parsing on it, using the metadata defined. Because the aggregates we are loading are already mapped into the RPD, by default the BI Server will probably try to use the aggregate to satisfy the aggregate populate request (because it will judge it the most efficient LTS) – thus loading data straight from the table that we are trying to populate!
The answer is the magical INACTIVE_SCHEMAS variable. What this does is tell OBIEE to ignore one or more Physical schemas in the RPD and, importantly, any associated Logical Table Sources. INACTIVE_SCHEMAS is documented as part of the Double Buffering feature. It can be used in any logical SQL statement, so is easily demonstrated in an analysis (using Advanced SQL Clauses -> Prefix):
Forcing an OBIEE query to avoid a particular LTS, using INACTIVE_SCHEMAS.
So when we specify the populate command to update the aggregate, we just include the necessary INACTIVE_SCHEMAS prefix:
SET VARIABLE INACTIVE_SCHEMAS='"TimesTen Aggregates".."EXALYTICS"': populate "ag_sales_month" mode ( append table connection pool "TimesTen aggregates"."TT_CP") as select_business_model "Sales"."Fact Sales"."Sale Amount" as "Sale_Amoun000000AD","Sales"."Dim Times"."Month YYYYMM" as "Month_YYYY000000D0" from "Sales";
Why, you could reasonably ask, is this not necessary in a normal OotB aggregate refresh? For the simple reason that in “vanilla” Aggregate Persistence usage the whole aggregate gets deleted from the RPD before it is rebuilt, and therefore when the aggregate query is executed only the base LTS is enabled in the RPD at that point in time.
The final part of the puzzle – Incremental refresh
So, we have a way of telling BI Server to populate a target aggregate without rebuilding it, and we have the workaround necessary to stop it trying to populate the aggregate from itself. The last bit is making sure that we only load the data we want to. If we execute the populate statement as it stands straight from the nqquery.log of the initial Aggregate Persistence run then we will end up with duplicate data in the target aggregate. So we need to do one of the following:
- Truncate the table contents before the populate
- Use a predicate in the populate Logical SQL so that only selected data gets loaded
To issue a truncate command, you can use the logical SQL command execute physical to get the BI Server to run a command against the target database, for example:
execute physical connection pool "TimesTen Aggregates"."TT_CP" truncate table ag_sales_month
This truncate/load method is appropriate for refreshing dimension aggregate tables, since there won’t usually be an update key as such. However, when refreshing a fact aggregate it is better for performance to use an incremental update and only load data that has changed. This assumes that you can identify the data and have an update key for it. In this example, I have an aggregate table at Month level, and each time I refresh the aggregate I want to load just data for the current month. In my repository I have a dynamic repository variable called THIS_MONTH. To implement the incremental refresh, I just add the appropriate predicate to the SELECT_BUSINESS_MODEL clause of the populate statement:
select_business_model "Sales"."Fact Sales"."Sale Amount" as "Sale_Amoun000000AD","Sales"."Dim Times"."Month YYYYMM" as "Month_YYYY000000D0" from "Sales" where "Dim Times"."Month YYYYMM" = VALUEOF("THIS_MONTH")
Making the completed aggregate refresh command to send to the BI Server:
SET VARIABLE DISABLE_CACHE_HIT=1, DISABLE_CACHE_SEED=1, DISABLE_SUMMARY_STATS_LOGGING=1, INACTIVE_SCHEMAS='"TimesTen Aggregates".."EXALYTICS"'; populate "ag_sales_month" mode ( append table connection pool "TimesTen aggregates"."TT_CP") as select_business_model "Sales"."Fact Sales"."Sale Amount" as "Sale_Amoun000000AD","Sales"."Dim Times"."Month YYYYMM" as "Month_YYYY000000D0" from "Sales" where "Dim Times"."Month YYYYMM" = VALUEOF("THIS_MONTH");
Since there will be data in the table for the current month, I delete this out first, using
execute physical connection pool "TimesTen Aggregates"."TT_CP" delete from ag_sales_month where Month_YYYY000000D0 = VALUEOF(THIS_MONTH);
The method I have described above is implemented in two parts:
- Initial build – only needs doing once
  - Create Aggregate Persistence scripts as normal (for example, with Summary Advisor)
  - Execute the Aggregate Persistence script to:
    - Build the aggregate tables in TimesTen
    - Map the aggregates in the RPD
  - Create custom populate scripts:
    - From the nqquery.log, extract the full populate statement for each aggregate (fact and associated dimensions)
    - Amend the INACTIVE_SCHEMAS setting into the populate script, specifying the target TimesTen database and schema.
    - For incremental refresh, add a WHERE clause to the populate logical SQL so that it only fetches the data that will have changed. Repository variables are useful here for holding date values such as current date, week, etc.
    - If necessary, build an execute physical script to clear down all or part of the aggregate table. This is run prior to the populate script to ensure you do not load duplicate data.
- Aggregate refresh – run whenever the base data changes
  - Optionally, execute the execute physical script to prepare the aggregate table (by deleting whatever data is about to be loaded)
  - Execute the custom populate script from above.
Because the aggregates are being built directly from the base data (as enforced by INACTIVE_SCHEMAS) the refresh scripts for multiple aggregates could potentially be run in parallel (eg using xargs). A corollary of this is that this method could put additional load on the source database, because it will be hitting it for every aggregate, whereas vanilla Aggregate Persistence will build aggregates from existing lower-level aggregates if it can.
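As a rough sketch of what that parallel invocation could look like (the DSN, credentials and script names here are hypothetical), each refresh script could be fired at the BI Server with nqcmd, fanned out by xargs:
# run up to four aggregate refresh scripts against the BI Server at once
ls refresh_scripts/*.lsql | xargs -P 4 -I {} nqcmd -d AnalyticsWeb -u weblogic -p Password01 -s {}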
This method is completely valid for use outside of Exalytics too, since only the Summary Advisor is licensed separately; Aggregate Persistence itself is standard OBIEE functionality. For an Exalytics deployment in an environment where aggregate definitions and requirements change rapidly, this method would be less appropriate, because of the additional work required to modify the scripts. However, for an Exalytics deployment where aggregates change less frequently, it could be very useful.
The approach is not without drawbacks. Maintaining a set of custom populate commands has an overhead (although arguably no more so than a set of Aggregate Persistence scripts), and the flexibility comes at the cost of putting the onus of data validity on the developer. If an aggregate table is omitted from the refresh (for example, a supporting aggregate dimension table) then reports will show erroneous data.
The benefit of this approach is that aggregates can be rapidly built and maintained in a sensible manner. The RPD is modified only in the first step, the initial build. It is then left entirely untouched. This makes refreshes faster, and safer; if it fails there is just the data to tidy up, not the RPD too.
GoldenGate and Oracle Data Integrator – A Perfect Match… Part 2: Replicate to Staging and Foundation
It has been quite a hectic month since my last post, GoldenGate and Oracle Data Integrator – A Perfect Match Part 1: Introduction. Plenty of client work and delivery of two ODI 11g Bootcamp training courses have kept me busy. In the previous post, I described how the Oracle Reference Data Warehouse Architecture could be used to support a real-time data warehouse implementation. I also introduced the integration between ODI 11g and Oracle GoldenGate. In this post, I will describe the implementation and configuration of GoldenGate via ODI 11g at a high level. The detailed implementation has already been covered in a previous post, Extreme BI: Capturing Change with Oracle GoldenGate and ODI: Part 2 (Implementation), so here I will just focus on the highlights. In the final post in the series, I will switch the focus to ODI Change Data Capture (CDC), walking through several options to using CDC in downstream ETL processes.
Oracle GoldenGate and Oracle Data Integrator will be used to load each layer of the Oracle Reference Data Warehouse Architecture. GoldenGate is implemented to capture changes on the source and replicate data to the Staging layer (fully replicated source data) database tables. GoldenGate will also load the Foundation layer (transactional history) directly from the source, as the product is able to capture changes once, and then distribute the change rows to multiple locations in parallel without any additional impact to the source or target systems. The need for ETL mappings to incrementally load Foundation from Staging is eliminated, along with the added maintenance. Finally, ODI Interfaces are built to execute ETL mappings that will load the Access and Performance layer.
The key to the ODI and GoldenGate integration is a Journalizing Knowledge Module in ODI named “JKM Oracle to Oracle Consistent (OGG)”. The JKM, which will be applied to the Staging and Foundation Models, must be slightly modified to allow the capture of historical transactional data in the Foundation schema. The addition of the parameter “INSERTALLRECORDS” to the replicat parameter file loading the Foundation schema will apply every transaction as an insert into the target, regardless of whether it was an insert, update, or delete on the source. The addition of data warehouse audit columns, described below, will allow the tracking of transactional history in each table.
- EDW_SCN (System Change Number)
- EDW_COMMIT_TIMESTAMP (when the change was committed)
- EDW_TRANS_TYPE (DML type for the transaction: insert, update, delete)
- EDW_ROW_RANK (Intra-commit ranking for each transaction)
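To make the INSERTALLRECORDS approach a little more concrete, a Foundation-layer replicat parameter file would look roughly like this (the process, schema and credential names are placeholders, and the audit columns are filled from GoldenGate environment values via COLMAP):
REPLICAT RFND001
USERID gg_admin, PASSWORD gg_password
-- apply every captured change as an insert, preserving the full transaction history
INSERTALLRECORDS
MAP SRC_SCHEMA.ORDERS, TARGET EDW_FOUNDATION.ORDERS, COLMAP (USEDEFAULTS, EDW_SCN = @GETENV('TRANSACTION','CSN'), EDW_COMMIT_TIMESTAMP = @GETENV('GGHEADER','COMMITTIMESTAMP'), EDW_TRANS_TYPE = @GETENV('GGHEADER','OPTYPE'));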
Journalizing within Oracle Data Integrator allows ETL developers to use the change records, stored in the J$ change tables, as the source of an ODI Interface. The JKM will set up GoldenGate processes to replicate the change data from the source to the change tables. When a change occurs in the source, the primary key and changed data are stored in the change table, along with the transaction DML type. Using the change data allows for near real-time processing of ETL mappings when loading downstream data warehouse tables.
To set up the integration between ODI and GoldenGate, begin by creating the ODI Models and Datastores for the Staging and Foundation tables. I will typically reverse-engineer the source tables first, then copy them to the Staging and Foundation Models. This approach ensures the column names and data types remain consistent with the source. I then execute a Groovy script to create the additional data warehouse audit columns in each of the Foundation Datastores. With the Models in place, apply the customized “JKM Oracle to Oracle Consistent (OGG)” to the Staging and Foundation Models and set the options appropriately, as described in the aforementioned “Extreme BI: Capturing Change with GoldenGate…” blog post. From the tree structure under the Model, right-click each Datastore and choose “Add to CDC”, then choose “Start Journal” at the Model level. Based on the options set in the JKM, the Start Journal process will create the ODI CDC framework (“J$” change tables, “JV$” change views, etc.) and generate the GoldenGate parameter files, configuration batch scripts (called Obey files), and instructions on how to complete the setup.
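For illustration only (the process names, trail paths, host name and table here are assumptions, not what the JKM will generate for your environment), the source-side extract and pump parameter files follow roughly this shape:

EXTRACT ESRC01
USERID ogg_user, PASSWORD ********
EXTTRAIL ./dirdat/s1
TABLE SRC.ORDERS;

EXTRACT PSRC01
PASSTHRU
RMTHOST dw-server, MGRPORT 7809
RMTTRAIL ./dirdat/t1
TABLE SRC.ORDERS;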
Browse to the temp directory indicated in the JKM options, open the Readme.txt file, and follow the instructions to complete the GoldenGate implementation. One important thing to note is that the Obey scripts generated by the JKM contain start commands for the extract, pump, and replicat processes. My recommendation is to comment out or remove these lines, as an initial load of source data to the target will need to be performed prior to starting replication.
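As a hedged example of what that looks like on the target side (the checkpoint table and trail path are illustrative; ODIT1A matches the replicat name used further down), the generated Obey file is along these lines, with the start command held back:

DBLOGIN USERID ogg_user, PASSWORD ********
ADD CHECKPOINTTABLE ogg_user.OGG_CHECKPOINT
ADD REPLICAT ODIT1A, EXTTRAIL ./dirdat/t1, CHECKPOINTTABLE ogg_user.OGG_CHECKPOINT
-- START REPLICAT ODIT1A   (commented out; start only after the initial load)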
For the initial load, first start the GoldenGate extract and pump processes on the source. Next, choose a batch load method such as Oracle Data Pump and move the source data to the target using the flashback query approach, as of a specific SCN. While GoldenGate does provide an initial load capability that can also be generated via the ODI “JKM Oracle to Oracle Consistent (OGG)”, it is very slow at moving the data from source to target and therefore not often used or recommended. Once the data has been loaded, start the replicat on the target after the SCN captured during the initial load process. This will ensure no transactions are duplicated or skipped.
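A hedged sketch of that sequence, with illustrative schema, directory and dump file names:

-- on the source: note the current SCN (123456 in this example)
SQL> SELECT CURRENT_SCN FROM V$DATABASE;
-- export as of that SCN, then import into the warehouse schemas on the target
$ expdp system SCHEMAS=SRC DIRECTORY=DP_DIR DUMPFILE=src_init.dmp FLASHBACK_SCN=123456
$ impdp system REMAP_SCHEMA=SRC:EDW_FOUNDATION DIRECTORY=DP_DIR DUMPFILE=src_init.dmp

With the data in place, the replicat is started after the same SCN: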
GGSCI> start replicat ODIT1A aftercsn 123456
GoldenGate has been set up via ODI and is now replicating source changes to the Staging and Foundation schema tables. It will also load the “J$” change tables for use with ODI change data capture. In Part 3, the final post in the series, I will discuss how to meet some real-time ETL challenges using various development methods and Journalizing within Oracle Data Integrator.
Since then, I've had an issue with my Nvidia drivers. Basically, I get video on a single monitor (in a dual-monitor setup) and that single monitor's resolution is something like 200 x 400 (no, it's not really that, but it is gigantic). Thank goodness for The Google Machine™. That originally led me here on StackOverflow. (Another reason to do things from the command line: you can recall what you did with history | grep nvidia.)
This is the fourth time I've been through this exercise; each time the kernel is updated, nvidia breaks. Fortunately for me, that guy on StackOverflow gave me all the information I needed. This time, after the reboot and the gigantic screen, I removed the nvidia drivers and reinstalled them. No go. uname -r gave me 3.5.0-26-generic, while dpkg -l | grep headers showed an older version of the kernel headers. So I updated those, reinstalled nvidia-current and rebooted. Yay.
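For anyone hitting the same wall, the fix boils down to something like this (assuming Ubuntu and the packaged nvidia-current driver; adjust the package names to your setup):

sudo apt-get install linux-headers-$(uname -r)      # headers matching the running kernel
sudo apt-get install --reinstall nvidia-current     # rebuild the Nvidia kernel module against them
sudo reboot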
Many "small" issues like this recently have me pondering a move back to, gasp, Windows or perhaps even a Mac. The Mac ecosystem scares me because it is expensive...but it's difficult to square when so many of my friends (technical and otherwise) swear by Macs. Something for another day I guess...