
Feed aggregator

Data Integration Tips: ODI – Use Scenarios in packages

Rittman Mead Consulting - Tue, 2015-02-10 07:16

 

Here is another question I often hear during ODI Bootcamps or often read in the ODI Space on OTN:
In a package, should I directly use mappings (or interfaces prior to 12c) or is it better to use the scenarios generated from these mappings?

An ODI package is a workflow used to sequence the different steps of execution of a data integration process. Some of these steps might change the value of a variable, evaluate a variable, or perform an administrative task like sending an email or retrieving a file from an FTP server. But the core of a package is the execution of mappings, and this can be done in two different ways.

 

Direct Use

It is possible to directly drag and drop a mapping in the package.

Mapping in a Package

In this example the mapping M_FACT_SALES_10 is directly executed within the package, in the same session. This means that the value of the variables will be the same as the ones in the package.

Execution of a package holding a mapping

 

If there are several mappings, they will be executed sequentially; there is no parallelism here. (Damn, I will never get the double L at the right place in that word… Thanks, autocorrect!)

The code for the mapping is generated using the current state of the mapping in the work repository, including any changes done since the package was created. But if a scenario is created for the package PCK_FACT_LOAD_MAP, it will need to be regenerated – or a new version should be generated – to take any change of M_FACT_SALES_10 into account.

One good point for this approach is that we can build a data integration process that will be entirely rolled back in case of failure. If one of the steps fails, the data will be exactly as it was before the execution started. This can be done by using the same transaction for all the mappings and disabling the commit option for all of them but the last one. So it's only when everything succeeds that the changes will be committed and available for everyone. Actually it's not even required to commit on the last step, as ODI will issue a commit anyway at the end of a successful session. Of course, this technique works only if you don't have any DDL statements on the target tables. You will find more information about it in this nice blog post from Bhabani Rajan: Transaction Control in ODI. Thanks to David Allan for reminding me of this!

 

Using Scenarios

But you can also generate a scenario and then drag and drop it in a package.

Scenario in a package

Sessions and variables

In this case, the execution of the mapping will not be done within the package session. Instead, one step of this session will use the OdiStartScen command to trigger a new session to execute the scenario.

Execution of a package holding a scenario

We can see here that the second step of the package session (401) has only one task, which runs the command to start a new session. The only step in the new session (402) is the mapping, and it has the same three tasks as in the previous example. Because this is a different session, you could choose to execute it using another context or agent. My colleague Michael brought an interesting use case to me. When one step of your package must extract data from a file on a file server that has its own standalone agent installed to avoid permission issues, you can specify that agent in the scenario step of your package. The session for that scenario will then be started on that standalone agent, while all the other sessions will use the execution agent of the parent session.

 

But what about variables configured to keep no history? As this is a different session, their values are lost outside the scope of one session. We therefore need to pass them as startup parameters to the scenario of the mapping.

Passing variables as startup parameters

To do this, I go to the Command tab on the scenario step and I add my variables there with the syntax

"-<PROJECT_CODE>.<VARIABLE_NAME>=<VALUE_TO_ASSIGN>"

where <VALUE_TO_ASSIGN> can be the value of a variable in the current session itself (401 in our case).

 Code executed

The scenario step name was originally "Execution of the Scenario M_FACT_SALES_10 version 001" and I slightly modified it to remove the mention of a version number. As I always want to execute the latest version of the scenario, I also changed the version in the parameters and set it to -1 instead of 001. There is a small bug in the user interface: if you click somewhere else, it will change back to 001. The workaround is to press the Return key (or Enter) after typing your value.
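
To make this more concrete, the command of the scenario step could end up looking something like the following. This is only a sketch: I assume the scenario keeps the mapping name, the GLOBAL context and the DWH.V_LOAD_DATE variable are made up, and the #PROJECT.VARIABLE syntax is the usual way to pass the value from the current session.

OdiStartScen -SCEN_NAME=M_FACT_SALES_10 -SCEN_VERSION=-1 -CONTEXT=GLOBAL -SYNC_MODE=1 "-DWH.V_LOAD_DATE=#DWH.V_LOAD_DATE"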

So by using a scenario, you can choose which version of the scenario you want to execute: either a specific version (e.g. 003) or the latest one, using the value -1. It also means that you won't break anything if the package is executed while you are working on the mapping. The package will still use the frozen code of the scenario even if you have changed the mapping.

If you create a scenario for the package PCK_FACT_LOAD_SCEN, there is no need to regenerate it when a new scenario is (re-)generated for the mapping. By using -1, it will reference the newly generated scenario.

 Asynchronous Execution

Another advantage of using scenarios is that they support parallelism (Yes, I got it right!!).

Asynchronous scenario executions

Here I set the "Synchronous / Asynchronous" option to "Asynchronous Mode" for my two scenarios, so the two sessions will start at the same time. By adding an OdiWaitForChildSessions step, I can wait for the end of all the sessions before doing anything else. This step also defines in which case you want to report an error. By default, it's when all the child sessions are in error, but I usually change that parameter to zero, so any failure will cause my package execution to fail as well.

getPrevStepLog()

Just a short note: be careful when using the method getPrevStepLog() from the substitution API in a step after executing a scenario. That method retrieves information about the previous step in the current session, so about the OdiStartScen command execution, and not about the execution of the scenario itself.

 

Summary

Here is a small recap table to compare both approaches:

comparison table

In conclusion, development is generally more robust when using scenarios in packages. There is more control over what is executed and over the parallelism. The only good reason to use mappings directly is to keep everything within the same transaction.

Also keep in mind that Load Plans might be a better alternative to packages, with better parallelism, error handling and restartability, unless you need loops or a persistent session. A comparison between Load Plans and Packages can be found in the Oracle blog post introducing Load Plans.

 

More tips and core principles demystification coming soon so stay tuned. And follow @mRainey, @markrittman and myself – @JeromeFr – on twitter to be informed of anything happening in the ODI world.

Categories: BI & Warehousing

Some changes to be aware of, as Oracle Application Express 5 nears...

Joel Kallman - Tue, 2015-02-10 06:25
As the release of Oracle Application Express 5 gets closer, I thought it's worth pointing out some changes that customers should be aware of, and how an upgrade to Oracle Application Express 5 could impact their existing applications.


  1. As Trent Schafer (@trentschafer) noted in his latest blog post, "Reset an Interactive Report (IR)", there have been numerous customer discussions and blog posts which show how to directly use the gReport JavaScript object to manipulate an Interactive Report.  The problem?  With the massive rewrite to support multiple Interactive Reports in Oracle Application Express 5, gReport no longer exists.  And as Trent astutely points out, gReport isn't documented.  And that's the cautionary tale here - if it's not documented, it's not considered supported or available for use and is subject to change, effectively without notice.  While I appreciate the inventiveness of others to do amazing things in their applications, and share that knowledge with the Oracle APEX community, you must be cautious in what you adopt.
  2. In the rewrite of Interactive Reports, the IR component was completely revamped from top to bottom.  The markup used for IRs in APEX 5 is dramatically improved:  fewer tables, much smaller and more efficient markup, better accessibility, etc.  However, if you've also followed this blog post from Shakeeb Rahman (@shakeeb) from 2010 and directly overrode the CSS classes used in Interactive Reports, that will no longer work in IRs in APEX 5.  Your custom styling using these classes will not have any effect.
  3. As the Oracle Application Express 5 Beta documentation enumerates, there is a modest list of deprecated features and a very small list of features which are no longer supported.  "Deprecated" means "will still work in APEX 5, but will go away in a future release of APEX, most likely the next major release of APEX".  In some cases, like the deprecated page attributes for example, if you have existing applications that use these attributes, they will still function as in earlier releases of APEX, but you won't have the ability to set them for new pages.  Personally, I'm most eager to get rid of all uses of APEX_PLSQL_JOB - customers should use SYS.DBMS_SCHEDULER - it's far richer in functionality.
Please understand that we have carefully considered all of these decisions - even labored for days, in some cases.  And while some of these changes could be disruptive for existing customers, especially if you've used something that is internal and not documented, we would rather have the APEX Community be made aware of these changes up front, rather than be silent about it and hope for the best.

Announcement: Best thesis award

Jean-Philippe Pinte - Tue, 2015-02-10 03:12
Oracle France awarded Mr. Gérald Patterson (ISEP / 2nd class of the Master Cloud Computing programme) the Oracle trophy for his thesis entitled Improving Cloud Computing availability with Openstack Enhanced Performance.
http://www.cloud-computing-formation.fr/rencontre/


It was also an opportunity to present the … to the ISEP students.

Exception from executeScript in Alfresco Share

Yann Neuhaus - Tue, 2015-02-10 03:00


I haven't had the opportunity to post a new entry about Alfresco on this blog for a long time, so I will fix that! In this blog entry, I will talk about a bug I encountered a few months ago. I resolved it but, so far, I have not had the time to share my knowledge with you.
 

I. Description of the issue


This bug appears no matter what version of Alfresco is used, regardless of the components that are installed, and so on. So what is this bug? In fact, this bug doesn't block anything; it has no impact on the daily work. However, it fills up the Alfresco log files very quickly, which can be problematic if you are an administrator searching for information in these log files! Indeed, each time a user accesses a page of Alfresco, between 10 and 50 Java exceptions are generated (always the same one), which can create gigabytes of log files in minutes or hours. Here is the exception I'm talking about:
 

...
Jul 08, 2014 10:42:16 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 95898 ms
2013-07-08 10:45:02,300 INFO [web.site.EditionInterceptor] [http-apr-8080-exec-1] Successfully retrieved license information from Alfresco.
2013-07-08 10:45:02,417 ERROR [extensions.webscripts.AbstractRuntime] [http-apr-8080-exec-3] Exception from executeScript - redirecting to status template error: 06080001 Unknown method specified to remote store API: has
  org.springframework.extensions.webscripts.WebScriptException: 06080001 Unknown method specified to remote store API: has
  at org.alfresco.repo.web.scripts.bean.BaseRemoteStore.execute(BaseRemoteStore.java:326)
  at org.alfresco.repo.web.scripts.RepositoryContainer$3.execute(RepositoryContainer.java:426)
  at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:433)
  at org.alfresco.repo.web.scripts.RepositoryContainer.transactionedExecute(RepositoryContainer.java:495)
  at org.alfresco.repo.web.scripts.RepositoryContainer.transactionedExecuteAs(RepositoryContainer.java:533)
  at org.alfresco.repo.web.scripts.RepositoryContainer.executeScript(RepositoryContainer.java:276)
  at org.springframework.extensions.webscripts.AbstractRuntime.executeScript(AbstractRuntime.java:377)
  at org.springframework.extensions.webscripts.AbstractRuntime.executeScript(AbstractRuntime.java:209)
  at org.springframework.extensions.webscripts.servlet.WebScriptServlet.service(WebScriptServlet.java:118)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at org.alfresco.web.app.servlet.GlobalLocalizationFilter.doFilter(GlobalLocalizationFilter.java:61)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
  at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
  at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:929)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1002)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:585)
  at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1813)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:722)
...

 

The first time I encountered this exception, it was on an Alfresco v4.x installation, up and running for some years with a lot of extensions/customizations (mostly AMP files). If you are not familiar with AMPs, AMP stands for Alfresco Module Package, and it is the best way to extend Alfresco. Please take a look at some of my older blogs to find information about how to create this kind of thing!
 

I had this exception on more than one Alfresco server and, because of that, I first thought that this exception came from an AMP... Therefore, I went through all these extensions but, despite hours of research, I found nothing.


II. How to replicate the issue


I tried to replicate this issue with a fresh installation of Alfresco: same version, same extensions, and so on. But at first I wasn't able to do so. Finally, one day, I found out that there was something strange about the Alfresco servers on which the Java exceptions appeared: the "Sites" folder wasn't there. Indeed, after the installation of a new Alfresco server, the deletion of the default site ("Sample: Web Site Design Project") and the deletion of the "Sites" folder from the Repository browser, the exception magically appeared in the Alfresco log files... Now that we know where this issue comes from, it's quite easy to replicate it:

  1. Install a new Alfresco server with the same version from the bundle executable/binaries (quicker)
  2. Start the Alfresco server and open the Alfresco Share UI (http://localhost:8080/share) using the admin account
  3. Navigate to the Sites finder (http://localhost:8080/share/page/site-finder)
  4. Click on "Search" to display all existing sites (only the default one is present: "Sample: Web Site Design Project")
  5. Click on "Delete" to delete the swsdp site
  6. Navigate to the Repository (http://localhost:8080/share/page/repository)
  7. Remove the "Sites" folder on the Repository page (/Company Home/Sites)
  8. Refresh the page and take a look at your logs


After doing that, you should be able to see a lot of exceptions like the one described above. Issue replicated!


III. How to solve the issue


Being able to replicate an issue is good, but knowing how to solve it is better!

If the "Sites" folder had been deleted in the first place, it was because the Alfresco Sites weren't used at all. Therefore, the simplest solution to resolve this issue was to get the "Sites" folder back. But it's not that easy, because this folder has a particular type, some particular attributes, and so on. You can't just create a new folder, rename it "Sites" and hope that it will work ;). Starting from here, what you can do to solve this issue is:

  1. Restore the "Sites" folder using a backup
  2. Replicate the "Sites" folder from another Alfresco server


If you don't have a way to restore the "Sites" folder, as was my case (after some months, no backup left), here is what you can do to fix this issue:
 

Let's say that the Alfresco server, where the "Sites" folder doesn't exist anymore, is named "A". Please take a look at the end of this blog entry for some screenshots that may help you.

  1. Install a new Alfresco server with the same version as "A" from the bundle executable/binaries. This can be on your local machine. Let's name this Alfresco server "B"
  2. Start the Alfresco server "B" and open the Alfresco Share UI (http://localhost:8080/share) using the admin account
  3. Navigate to the Sites finder (http://localhost:8080/share/page/site-finder)
  4. Click on "Search" to display all existing sites (only the default one is present: "Sample: Web Site Design Project")
  5. Click on "Delete" to delete the swsdp site
  6. Navigate to the Repository (http://localhost:8080/share/page/repository) (DON'T delete the "Sites" folder)
  7. Configure a replication target on "B" to point to "A" (take a look at the Alfresco doc: http://docs.alfresco.com/4.1/tasks/adminconsole-replication-transfertarget.html)
  8. Enable the replication:
    1. Add the Alfresco Share URL and the repository ID of "A" into the share-config-custom.xml file of "B" (take a look at the Alfresco doc: http://docs.alfresco.com/4.1/tasks/adminconsole-replication-lockedcontent.html; a sketch of this block follows the list)
    2. Add the "replication.enabled=true" into the alfresco-global.properties file of "B" (take a look at the Alfresco doc: http://docs.alfresco.com/4.1/tasks/replication-share.html)
    3. Restart "B" for the changes to be taken into account by Alfresco
  9. Configure a replication job on "B" to replicate the "Sites" folder from "B" to "A" (http://localhost:8080/share/page/console/admin-console/replication-jobs)
  10. Run the replication job on "B"
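
For step 8.1, the block added to the share-config-custom.xml file of "B" looks roughly like the following. This is a sketch based on the Alfresco 4.1 documentation linked above; the repository ID value and the host/port of "A" are placeholders you need to replace with your own values.

<config evaluator="string-compare" condition="Replication">
   <share-urls>
      <share-url repositoryId="REPOSITORY_ID_OF_A">http://serverA:8080/share/</share-url>
   </share-urls>
</config>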


Configure the Replication Target on B (step 7 - create a folder named "TransfertToA" and edit its permissions):

CreateReplicationTarget.png


Find the Repository ID of A (step 8.1):

FindRepositoryId.png


Configure the share-config-custom.xml file of B (step 8.1):

EnableTheReplication.png


Once the replication job has run on "B", the exceptions will disappear from the log files of "A". I didn't go deeper so I don't really know if you can create new sites using this newly imported "Sites" folder but if you removed this folder in the first place, I would guess that you don't really need it ;).
 

Thank you for reading this post and I hope it will help. If you need more information, don't hesitate to leave a little comment below. See you soon for more blogs!
 

 

A Sneak Preview of e-Literate TV at ELI

Michael Feldstein - Tue, 2015-02-10 00:58

By Michael Feldstein | More Posts (1013)

Phil and I will be chatting with Malcolm Brown and Veronica Diaz about our upcoming e-Literate TV series on personalized learning in a featured session at ELI tomorrow. We'll be previewing short segments of video case studies that we've done on an elite New England liberal arts college, an urban community college, and a large public university. Audience participation in the discussion is definitely encouraged. It will be tomorrow at 11:45 AM in California C for those of you who are here at the conference, and also webcast for those of you registered for the virtual conference.

We hope to see you there.

The post A Sneak Preview of e-Literate TV at ELI appeared first on e-Literate.

Fronting Oracle Maven Repository with Artifactory

Steve Button - Mon, 2015-02-09 22:44
The JFrog team announced this week the release of Artifactory 3.5.1, which is a minor update that now works with the Oracle Maven Repository.

http://www.jfrog.com/confluence/display/RTF/Artifactory+3.5.1

I spent a little while yesterday having a look at it, working through the configuration of a remote repository and testing it with a maven project to see how it worked.

Once I'd downloaded it and started it up -- much love for simple and obvious bin/*.sh scripts -- it was a very simple process:

1. Since we live behind a firewall, first add a proxy configuration to point at our proxy server.



2. Add a new remote repository and point it at the Oracle Maven Repository, specifying its URL and using my OTN credentials as username and password.


The Artifactory 3.5.1 documentation stated that the Advanced Settings >  Lenient host authentication and Enable cookie management options must be checked when accessing the Oracle Maven Repository.


The Test button is handy to verify the server settings have been entered correctly.

3. Use the Home tab > Client Settings > Maven Settings link to generate and save a settings.xml file that uses the artifactory server.



With the repository running, configured and the settings.xml saved, it's then possible to try it out with an existing maven project such as https://github.com/buttso/weblogic-with-arquillian.

I also nuked my local repository to force/verify that the dependencies were fetched through the specified Artifactory server.

$ rm -fr ~/.m2/repository/com/oracle
$ mvn -s artifactory-settings.xml test

Viewing the output of the mvn process and the running Artifactory server you can see that maven is downloading dependencies from http://localhost:8081/artifactory and correspondingly Artifactory is downloading the requested artifact from https://maven.oracle.com.


Once the maven process has completed and all the requested artifacts downloaded, Artifactory will have cached them locally for future use.
 
Using the Search functionality of the Artifactory Web UI you can search for weblogic artifacts.


Using the Repository Browser functionality of the Artifactory Web UI you can view and navigate around the contents of the remote Oracle Maven Repository.

Nice JFrog > Artifactory team - thanks for the quick support of our repository.

One further thing I'd look at doing is enabling the Configure Passwords Encryption option in the Security settings to encrypt your OTN password, so that it's not stored in cleartext in the etc/artifactory.config.latest.xml file.


Goldengate – start replicat ATSCN or AFTERSCN ?

Michael Dinh - Mon, 2015-02-09 21:36

When using GoldenGate to instantiate a target database from an Oracle source database, the replicat process can be started to coincide with the extract, based on the method used for instantiation, e.g. RMAN or Data Pump.

ATSCN is used to start the replicat if RMAN is used to instantiate the target.
From the Database Backup and Recovery Reference, UNTIL SCN specifies an SCN as an upper limit:
RMAN restores or recovers up to, but not including, the specified SCN.

AFTERSCN is used to start the replicat if Data Pump is used to instantiate the target.
The export operation performed is consistent as of FLASHBACK_SCN.
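
To make this concrete, here is a minimal GGSCI sketch (the replicat name and SCN are made up; note that GGSCI spells these options ATCSN and AFTERCSN):

START REPLICAT REP1, ATCSN 7268144
START REPLICAT REP1, AFTERCSN 7268144

The first form would follow an RMAN instantiation using UNTIL SCN 7268144; the second would follow a Data Pump export taken with FLASHBACK_SCN=7268144.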

Hope this helps to clear up when to use ATSCN versus AFTERSCN.

Reference:
Oracle GoldenGate Best Practices: Instantiation from an Oracle Source Database –  Doc ID 1276058.1


JavaScript Stored Procedures and Node.js Applications with Oracle Database 12c

Kuassi Mensah - Mon, 2015-02-09 20:44
JavaScript Stored Procedures and Node.js Applications with Oracle Database 12c
Kuassi Mensah
db360.blogspot.com | @kmensah | https://www.linkedin.com/in/kmensah
Keywords: JavaScript, Node.js, Java, JVM, Nashorn, Avatar.js

Introduction

Node.js and server-side JavaScript are hot and trendy; per the latest "RedMonk Programming Languages Rankings"[1], JavaScript and Java are the top two programming languages. For most developers building modern Web, mobile, and cloud based applications, the ability to use the same language across all tiers (client, middle, and database) feels like Nirvana, but the IT landscape is not a green field; enterprises have invested a lot in Java (or other platforms for that matter), therefore the integration of JavaScript with it becomes imperative. WebSockets and RESTful services enable loose integration; however, the advent of JavaScript engines on the JVM (Rhino, Nashorn, DynJS) and Node.js APIs on the JVM (Avatar.js, Nodyn, Trireme) makes it possible and very tempting to co-locate Java and Node applications on the same JVM.
This paper describes the steps for running JavaScript stored procedures[2] directly on the embedded JVM in Oracle database 12c, and the steps for running Node.js applications on the JVM against Oracle database 12c using Avatar.js, JDBC and UCP.
JavaScript and the Evolution of Web Applications Architecture

In the beginning, once upon a time, a long time ago, JavaScript was a browser-only thing, while business logic, back-end services and even presentation were handled/produced in middle tiers using Java or other platforms and frameworks. Then JavaScript engines (Google's V8, Rhino) left the browsers and gave birth to server-side JavaScript frameworks and Node.js.

Node Programming Model

Node.js and similar frameworks bring ease of development, rapid prototyping, and an event-driven, non-blocking programming model[3] to JavaScript. This model is praised for its scalability and good enough performance; however, unlike Java, Node lacks standardization in many areas such as database access (i.e., a JDBC equivalent), and may lead, without discipline, to the so-called "callback hell"[4].
Nonetheless, Node is popular and has a vibrant community and a large set of frameworks[5].

Node Impact on Web Applications Architecture

With the advent of Node, REST and WebSockets, the architecture of Web applications has evolved into (i) plain JavaScript on browsers (mobiles, tablets, and desktops); (ii) server-side JavaScript modules (i.e., Node.js, ORM frameworks) interacting with Java business logic and databases. The new proposal for Web applications architecture is the integration of Node.js and Java on the JVM. Let's discuss the enabling technologies: the JavaScript engine on the JVM and the Node API on the JVM, and describe typical use cases with Oracle database 12c.

JavaScript on the JVM

Why implement a JavaScript engine and run JavaScript on the JVM? For starters, I highly recommend Mark Swartz's http://moduscreate.com/javascript-and-the-jvm/ and Steve Yegge's http://steve-yegge.blogspot.com/2008/06/rhinos-and-tigers.html blog posts. In summary, the JVM brings (i) portability; (ii) manageability; (iii) Java tools; (iv) Java libraries/technologies such as JDBC and Hadoop; and (v) the preservation of investments in Java.
There are several implementations/projects of Java based JavaScript engines, including Rhino, DynJS and Nashorn.

Rhino

The first JavaScript engine entirely written in Java; started at Netscape in 1997, then became an open-source Mozilla project[6]. It was for quite some time the default JavaScript engine in Java SE, and is now replaced by Nashorn in Java SE 8.

DynJS

DynJS is another open-source JavaScript engine for the JVM. Here is the project homepage: http://dynjs.org/.

Nashorn

Introduced in Java 7 but "production" in Java 8[7], the goal of project Nashorn (JEP 174) is to enhance the performance and security of the Rhino JavaScript engine on the JVM. It integrates with the javax.script API (JSR 223) and allows seamless interaction between Java and JavaScript (i.e., invoking Nashorn from Java and invoking Java from Nashorn).

To illustrate the reach of Nashorn on the JVM and the interaction between Java and JavaScript, let's run some JavaScript directly on the database-embedded JVM in Oracle database 12c.

JavaScript Stored Procedures with Oracle Database 12c Using Nashorn
Why would anyone run JavaScript in the database? For the same reasons you'd run Java in Oracle database. Then you might ask: why run Java in the database in the first place? As discussed in my book[8], the primary motivations are: (i) reuse skills and code, i.e., which programming languages are your new hires knowledgeable of or willing to learn; (ii) avoid data shipping[9], i.e., in-place processing of billions of data items/documents; (iii) combine SQL with foreign libraries to achieve new database capability, thereby extending SQL and the reach of the RDBMS, e.g., Web Services callout, in-database container for Hadoop[10]. Some developers/architects prefer a tight separation between the RDBMS and applications and therefore no programming language in the database[11], but there are many pragmatic developers/architects who run code near data whenever it is more efficient than shipping data to external infrastructure.
Co-locating functions with data on the same compute engine is shared by many programming models such as Hadoop. With the surge and prevalence of Cloud computing, RESTful service based architecture is the new norm. Data-bound services can be secured and protected by the REST infrastructure, running outside the RDBMS. Typical use case: a JavaScript stored procedures service would process millions/billions of JSON documents in the Oracle database and would return the result sets to the service invoker.
To conclude, running Java, JRuby, Python, JavaScript, Scala, or another programming language on the JVM in the database is a sound architectural choice. The best practices consist in: (i) partitioning applications into data-bound and compute-bound modules or services; (ii) keeping in mind that data-bound services are good candidates for running in the database; (iii) understanding DEFINER's vs INVOKER's rights[12] and granting only the necessary privileges and/or permissions.

The Steps
The following steps allow implementing a JavaScript stored procedure running in Oracle database; these steps represent an enhancement over the ones presented at JavaOne and OOW 2014, which consisted in reading the JavaScript from the database file system; such an approach required granting extra privileges to the database schema for reading from the RDBMS file system, something not recommended from a security perspective. Here is a safer approach:

1. Nashorn is part of Java 8 but early editions can be built for Java 7; the embedded Java VM in Oracle database 12c supports Java 6 (the default) or Java 7. For this proof of concept, install Oracle database 12c with Java SE 7[13].

2. (i) Build a standard Nashorn.jar[14]; (ii) modify the Shell code to interpret the given script name as an OJVM resource; this consists mainly in invoking getResourceAsStream() on the current thread's context class loader; (iii) rebuild Nashorn.jar with the modified Shell.

3. Load the modified Nashorn jar into an Oracle database schema, e.g., HR:

loadjava -v -r -u hr/ nashorn.jar

4. Create a new dbms_javascript package for invoking Nashorn's Shell with a script name as parameter:
create or replace package dbms_javascript as
  procedure run(script varchar2);
end;
/
create or replace package body dbms_javascript as
  procedure run(script varchar2) as
  language java name 'com.oracle.nashorn.tools.Shell.main(java.lang.String[])';
end;
/
Then call dbms_javascript.run('myscript.js') from SQL, which will invoke the Nashorn Shell to execute the previously loaded myscript.js.

5. Create a custom role, which we will name NASHORN, as follows, connected as SYSTEM:
SQL> create role nashorn;
SQL> call dbms_java.grant_permission('NASHORN', 'SYS:java.lang.RuntimePermission', 'createClassLoader', '' );
SQL> call dbms_java.grant_permission('NASHORN', 'SYS:java.lang.RuntimePermission', 'getClassLoader', '' );
SQL> call dbms_java.grant_permission('NASHORN', 'SYS:java.util.logging.LoggingPermission', 'control', '' );

Best practice: insert those statements in a nash-role.sql file and run the script as SYSTEM.

6. Grant the NASHORN role created above to the HR schema as follows (connected as SYSTEM):

SQL> grant NASHORN to HR;

7. Insert the following JavaScript code in a file, e.g., database.js, stored on your client machine (i.e., the machine from which you will invoke loadjava as explained in the next step). This script illustrates using JavaScript and Java together, as it uses the server-side JDBC driver to execute a PreparedStatement to retrieve the first and last names from the EMPLOYEES table.

var Driver = Packages.oracle.jdbc.OracleDriver;
var oracleDriver = new Driver();
var url = "jdbc:default:connection:";   // server-side JDBC driver
var query ="SELECT first_name, last_name from employees";
// Establish a JDBC connection
var connection = oracleDriver.defaultConnection();
// Prepare statement
var preparedStatement = connection.prepareStatement(query);
// execute Query
var resultSet = preparedStatement.executeQuery();
// display results
     while(resultSet.next()) {
     print(resultSet.getString(1) + "== " + resultSet.getString(2) + " " );
     }
// cleanup
resultSet.close();
preparedStatement.close();
connection.close();

8. Load database.js in the database as a Java resource (not a vanilla class):

loadjava -v -r -u hr/ database.js

9. To run the loaded script:

sqlplus hr/
SQL> set serveroutput on
SQL> call dbms_java.set_output(80000);
SQL> call dbms_javascript.run('database.js');

The Nashorn Shell reads the 'database.js' script stored as a Java resource from an internal table; the JavaScript in its turn invokes JDBC to execute a PreparedStatement and the result set is displayed on the console. The message "ORA-29515: exit called from Java code with status 0" is due to the invocation of java.lang.Runtime.exitInternal; status 0 means normal exit (i.e., no error). The fix is to remove that call from Nashorn.

Node.js on the JVM

As discussed earlier, Node.js is becoming the man-in-the-middle between Web application front ends and back-end legacy components, and since companies have invested a lot in Java, it is highly desirable to co-locate Node.js and Java components on the same JVM for better integration, thereby eliminating the communication overhead. There are several projects re-implementing the Node.js APIs on the JVM, including Avatar.js, Nodyn, and Trireme. This paper will only discuss Oracle's Avatar.js.

Project Avatar.js[15]

The goal of project Avatar.js is to furnish "Node.js on the JVM"; in other words, an implementation of the Node.js APIs which runs on top of Nashorn and enables the co-location of Node.js programs and Java components. It has been open-sourced by Oracle under a GPL license[16]. Many Node frameworks and/or applications have been certified to run unchanged, or slightly patched, on Avatar.js.
There are binary distributions for Oracle Enterprise Linux, Windows and MacOS (64-bit). These builds can be downloaded from https://maven.java.net/index.html#welcome. Search for avatar-js.jar and the platform-specific libavatar-js libraries (.dll, .so, .dylib). Get the latest and rename the jar and the specific native library accordingly. For example: on Linux, rename the library to avatar-js.so; on Windows, rename the dll to avatar-js.dll and add its location to your PATH (or use -Djava.library.path=).
RDBMSes in general and Oracle database in particular remain the most popular persistence engines, and there are RDBMS-specific Node drivers[17] as well as ORM frameworks. However, as we will demonstrate in the following section, with Avatar.js we can simply reuse existing Java APIs, including JDBC and UCP, for database access.
Node Programming with Oracle Database using Avatar.js, JDBC and UCP

The goal of this proof of concept is to illustrate the co-location of a Node.js application, the Avatar.js library, the Oracle JDBC driver and the Oracle Universal Connection Pool (UCP) on the same Java 8 VM. The sample application consists of a Node.js application which performs the following actions: (i) request a JDBC-Thin connection from the Java pool (UCP); (ii) create a PreparedStatement object for "SELECT FIRST_NAME, LAST_NAME FROM EMPLOYEES"; (iii) execute the statement and return the ResultSet in a callback; (iv) retrieve the rows and display them in a browser on port 4000; (v) perform all steps above in a non-blocking fashion – this is Node.js's raison d'être. The demo also uses the Apache ab load generator to simulate concurrent users running the same application in the same/single JVM instance.

For the Node application to scale in the absence of asynchronous JDBC APIs, we need to turn synchronous calls into non-blocking ones and retrieve the result set via a callback.

Turning Synchronous JDBC Calls into Non-Blocking Calls

We will use the following wrapper functions to turn any JDBC call into a non-blocking call, i.e., put the JDBC call into a thread pool and free up the Node event loop thread.

var makeExecutecallback = function(userCallback) {
  return function(name, args) {
      ...
      userCallback(undefined, args[1]);
  }
}

function submit(task, callback, msg) {
    var handle = evtloop.acquire();
    try {
        var ret = task();
        evtloop.post(new EventType(msg, callback, null, ret));
    } catch (e) {
    }
    evtloop.submit(r);
}
Let's apply these wrapper functions to the executeQuery JDBC call, to illustrate the concept:

exports.connect = function(userCallback) {..}  // JDBC and UCP settings

Statement.prototype.executeQuery = function(query, userCallback) {
    var statement = this._statement;
    var task = function() {
        return statement.executeQuery(query);
    }
    submit(task, makeExecutecallback(userCallback), "jdbc.executeQuery");
}

Similarly, the same technique will be applied to the other JDBC statement APIs:

Connection.prototype.getConnection = function() {…}
Connection.prototype.createStatement = function() {..}
Connection.prototype.prepareCall = function(storedprocedure) {..}
Statement.prototype.executeUpdate = function(query, userCallback) {..}

Returning the Query ResultSet through a Callback

The application code fragment hereafter shows how, for every HTTP request: (i) a connection is requested, (ii) the PreparedStatement is executed, and (iii) the result set is printed on port 4000.

...
var ConnProvider = require('./connprovider').ConnProvider;
var connProvider = new ConnProvider(function(err, connection){.. });

var server = http.createServer(function(request, response) {
  connProvider.getConn(function(name, data){..});
  connProvider.prepStat(function(resultset) {
    while (resultset.next()) {
      response.write(resultset.getString(1) + " --" + resultset.getString(2));
      response.write('');
    }
    response.write('');
    response.end();
  });
});

server.listen(4000, '127.0.0.1');
Using Apache ab, we were able to scale to hundreds of simultaneous invocations of the Node application. Each instance grabs a Java connection from the Universal Connection Pool (UCP), executes the SQL statement through JDBC, then returns the result set via a callback on port 4000.

Conclusions

Through this paper, I discussed the rise of JavaScript for server-side programming and how Java is supporting such an evolution; then – something we set out to demonstrate – furnished step by step details for implementing and running JavaScript stored procedures in Oracle database 12c using Nashorn, as well as running Node.js applications using Avatar.js, Oracle JDBC and UCP against Oracle database 12c. As server-side JavaScript (typified by Node.js) gains in popularity, it'll have to integrate with existing components (COBOL is still alive!!). Developers and architects will have to look into co-locating JavaScript with Java across middle and database tiers.



[1] http://redmonk.com/sogrady/2015/01/14/language-rankings-1-15/
[2] I'll discuss the rationale for running programming languages in the database later in this paper.
[3] Requests for I/O and resource intensive components run in a separate process, then invoke a callback in the main/single Node thread when done.
[4] http://callbackhell.com/
[5] Search the web for "Node.js frameworks".
[6] https://developer.mozilla.org/en-US/docs/Mozilla/Projects/Rhino
[7] Performance being one of the most important aspects.
[8] http://www.amazon.com/exec/obidos/ASIN/1555583296
[9] Rule of thumb: when processing more than ~20-25% of target data, do it in-place, where the data resides (i.e., function shipping).
[10] The in-database container for Hadoop is not available as of this writing.
[11] Other than the database's specific procedural language, e.g., Oracle's PL/SQL.
[12] I discuss this in chapter 2 of my book; see also the Oracle database docs.
[13] See Multiple JDK Support in http://docs.oracle.com/database/121/JJDEV/E50793-03.pdf
[14] Oracle does not furnish a public download of Nashorn.jar for Java 7; search "Nashorn.jar for Java 7".
[15] https://avatar-js.java.net/
[16] https://avatar-js.java.net/license.html
[17] The upcoming Oracle Node.js driver was presented at OOW 2014.

EMEA Exadata, Manageability & Hardware Partner Forum

Oracle EMEA Exadata, Manageability, Servers & Storage Partner Community Forum ...

We share our skills to maximize your revenue!
Categories: DBA Blogs

Oracle Maven Repository - Viewing Contents in Eclipse

Steve Button - Mon, 2015-02-09 16:18
With the Oracle Maven Repository now accessible, one way to explore its contents is to use the Maven Repositories viewer feature available in most development tools. I've seen the repository contents displayed easily in NetBeans, so I decided to take a look at what it looks like in Eclipse as well.

I had to make a few minor setting changes to get it to work so decided to document them here.  If you've gotten it to work with fewer setting changes, let me know!

As initial setup, I configured my local maven environment to support access to the Oracle Maven Repository.  This is documented here: https://maven.oracle.com/doc.html.  I also installed maven-3.2.5, which includes the updated Wagon module that supports authentication.

Next I downloaded and used the new network installer that the Oracle Eclipse team has published on OTN to install the latest version of Oracle Enterprise Pack for Eclipse.



This network installer lets developers select the version of Eclipse to install and the set of Oracle extensions --  Weblogic, GlassFish and other stuff -- to add in to it.

 Once Eclipse is installed, you can add the Maven Repository viewer by selecting   Window > Show View > Other > Maven Repositories from the Eclipse toolbar.



I also added a Console > Maven viewer to see what was happening under the covers and arranged them so they were visible at the same time:


With the Maven views ready to go, expand the Global Repositories node. This will show Maven Central (and any other repositories you may have configured) and the Oracle Maven Repository, if you have configured it correctly in the settings.xml file.

The initial state of the Oracle Maven Repository doesn't show any contents, indicating that its index hasn't been downloaded for display.

Right mouse clicking on it and selecting the Rebuild Index option causes an error to be shown in the console output indicating that the index could not be accessed.


To get it to work, I made the following changes to my environment.  
Configure Eclipse to Use Maven 3.2.5

Using the Eclipse > Preferences > Maven > Installation dialog, configure Eclipse to use Maven 3.2.5.  This is the preferred version of Maven to use to access the Oracle Maven Repository since it automatically includes the necessary version of the Wagon HTTP module that supports the required authentication configuration and request flow.


Configure Proxy Settings in Maven Settings File

** If you don't need a proxy to access the Internet then this step won't be needed **
 
If you sit behind a firewall and need to use a proxy server to access public repositories then you need to configure a proxy setting inside the maven settings file.

Interestingly, for command line maven use and for NetBeans, a single proxy configuration in settings.xml was enough to allow the Oracle Maven Repository to be successfully accessed and its index and artifacts used.

However with Eclipse, this setting alone didn't allow the Oracle Maven Repository to be accessed.  Looking at the repository URL for the Oracle Maven Repository you can see it's HTTPS based -- https://maven.oracle.com -- and it appears that a specific HTTPS based proxy setting is required for Eclipse to access HTTPS based repositories.
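
For reference, the proxy section of my settings.xml ended up looking roughly like the following (a sketch; the host and port are placeholders for your own proxy server, and the second entry is the HTTPS specific one that Eclipse appeared to need):

<proxies>
  <proxy>
    <id>http-proxy</id>
    <active>true</active>
    <protocol>http</protocol>
    <host>proxy.example.com</host>
    <port>80</port>
  </proxy>
  <proxy>
    <id>https-proxy</id>
    <active>true</active>
    <protocol>https</protocol>
    <host>proxy.example.com</host>
    <port>80</port>
  </proxy>
</proxies>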


Rebuild Index Success

With the settings in place, the Rebuild Index operation succeeds and the contents of the Oracle Maven Repository are displayed in the repository viewer.



Have your say ...

Tim Dexter - Mon, 2015-02-09 15:25

Another messaging exchange last week with Leslie ...

OK, so we practised it a bit after our first convo and things got a little cheesy but hopefully you get the message.

Hit this link and you too can give some constructive feedback on the Oracle doc for BI (not just BIP.) I took the survey; it's only eight questions, or more if you want to share more of your input. Please take a couple of minutes to help us shape the documentation of the future.

Categories: BI & Warehousing

Webcast: Delivering Next-Gen Digital Experiences - 2/12

WebCenter Team - Mon, 2015-02-09 15:06

Becoming a digital business is imperative for organizations to deliver the next wave of revenue growth, service excellence and business efficiency. And the stakes are high -- 94% of customers discontinue communications because of irrelevant messages and experiences.

Join this webcast for an in-depth look at technologies that enable IT leaders to connect digital customer experiences to business outcomes. Learn how to:
  • Deliver omni-channel experiences that are seamless, tailored and innovative across Paid, Owned and Earned media
  • Convert unknown audiences to known and involved customers
  • Extend reach and orchestrate engagement across all channels and devices
  • Move Marketing from silo’d technologies to a single Digital Experience Platform that also connects Marketing to the entire organization
Register now for this webcast.

Live Webcast: February 12, 2015
10 a.m. PT / 1 p.m. ET

Featured Speaker: Chris Preston, Sr. Director, Customer Strategies, Oracle

Video Tutorial: XPLAN_ASH Active Session History - Part 3

Randolf Geist - Mon, 2015-02-09 15:04
Here is the next part of the video tutorial explaining the XPLAN_ASH Active Session History functionality, continuing the actual walk-through of the script output.

More parts to follow.

Long parse time with interval partitioning and subpartitions

Bobby Durrett's DBA Blog - Mon, 2015-02-09 13:52

In an earlier post I described how some queries with long parse times were causing long library cache lock waits.  Friday I applied a patch from Oracle that resolved the long parse times.  Here are the conditions which may relate to this bug (a DDL sketch of a matching table follows the list):

  1. Interval partitioned table
  2. Partitioned by range
  3. Sub-partitioned by list
  4. Open ended range on partitioning column in where clause
  5. Tens of thousands of sub-partitions
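
As a rough illustration, a table matching these conditions might be created like this (a sketch only; the table, column and subpartition names are made up):

create table sales_fact (
  clndr_dt  date,
  region_cd varchar2(10),
  amount    number
)
partition by range (clndr_dt) interval (numtodsinterval(1,'DAY'))
subpartition by list (region_cd)
subpartition template (
  subpartition sp_east  values ('EAST'),
  subpartition sp_west  values ('WEST'),
  subpartition sp_other values (default)
)
(partition p0 values less than (date '2014-01-01'));

With a few dozen values in the subpartition template and daily interval partitions, the subpartition count quickly reaches the tens of thousands described above, and an open-ended predicate like CLNDR_DT >= some date spans most of them.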

Prior to applying the patch I did an explain plan on two versions of the problematic query.  One version specified an open-ended range on the partitioning column and the other closed off the range.

Slow version:

...
T392658.CLNDR_DT >= TO_DATE('2014-11-17' , 'YYYY-MM-DD') and 
...
Explained.

Elapsed: 00:00:46.20

Fast version:

...
T392658.CLNDR_DT >= TO_DATE('2014-11-17' , 'YYYY-MM-DD') and
T392658.CLNDR_DT <= TO_DATE('2014-11-26' , 'YYYY-MM-DD') and
...
Explained.

Elapsed: 00:00:00.09

The queries are the same except for the extra date condition which closes off the date range.  Note that the explain plan took 46 seconds with the open-ended range and less than a tenth of a second with the closed off range.

With the patch the slow version is just as fast as the fast one.

This is bug 20061582 in Oracle’s system.

From my conversations with Oracle it sounds like in certain cases the optimizer is reviewing information for many or all of the sub-partitions and since we have 20,000 for this table in production it can take minutes to parse.  I also messed with the dates on the open-ended condition and found that if I made the date early enough the parse time issue went away.  So, it seems that there is some set of conditions, which I don’t know how to easily reproduce, which lead the optimizer to look at sub-partition information and slows parse time.

This is on 11.2.0.4 on an Exadata system.  I was able to reproduce the long parse times on non-Exadata Linux 11.2.0.4 and 12.1.0.2 systems so it does not occur only on Exadata.

This issue surprised me because I thought that the optimizer would not look at partition or sub-partition statistics in a query that has a range condition which spanned more than one partition.  In the past I have always seen the global or table level stats used in these situations.  But, now I know that there are some cases where the optimizer will dig into the sub-partition stats even though the query covers more than one partition.

Looking at the plan of my slow parsing query before and after the patch I get a clue that the optimizer is looking at sub-partition stats:

Partition range without the patch:

KEY(SQ)|KEY(SQ)

Partition range with the patch:

     1 |   136

Evidently KEY(SQ) relates to some sort of filtering of the sub-partitions which causes the long parse time.  The manuals describe KEY(SQ) in a section titled "Dynamic Pruning with Subqueries", so maybe the problem queries have some sub-query that the optimizer was using to choose which sub-partitions the query needed.

If you have an interval partitioned table with tens of thousands of sub-partitions and parse times in the minutes with open-ended ranges specified on the partitioning column your system could be hitting this same issue.

– Bobby

Categories: DBA Blogs

MAF 2.1 Alta Mobile UI and Oracle Mobile Suite

Andrejus Baranovski - Mon, 2015-02-09 12:19
This post is about Alta Mobile UI and MAF 2.1. I was using Oracle Work Better mobile application as a reference, along with Alta Mobile UI Design Guidelines. This is based on my previous posts about Oracle Mobile Suite and MAF 2.1 integration, read more here - Oracle Mobile Suite - Web Service Performance Optimisation with Result Caching. You could also use Alta UI 12c with regular ADF Web applications, check more here - Custom ADF Application with New ADF 12c Alta UI.

Sample application - MobileServiceBusApp_v5.zip (contains ADF BC SOAP Web Service, Oracle Mobile Suite SOAP to REST transformation and a MAF 2.1 application with Alta Mobile UI) - implements a dashboard. The component to visualise amounts and values comes out of the box with MAF (iPad view):


The user can open a springboard and select the Employees option in the sample application:


Here we can search for employees by first name/last name and bring results as a list:


Optionally, the user can switch to the cards view to display the same results in a different way:


You can click on the employee icon - this will load the details screen and bring up additional info:


Steven King is a manager: he manages the Executive department and there are two team members reporting to him. A compensation graph can be displayed for the team members:


A list of team members is displayed in the third tab:


There is a MAF 2.1 Deck component in the Employees search page; it allows switching between the list and cards views:


The Employee details page is rendered with the help of a MAF 2.1 Deck component as well, along with a Select Button component to display a tab UI and navigate between the Detail, Compensation and Team sections:


You could check how these pages are implemented in the sample app provided above.

Flat World and CBE: Self-paced does not imply isolation

Michael Feldstein - Mon, 2015-02-09 07:48

By Phil Hill | More Posts (287)

As competency-based education (CBE) becomes more and more important to US higher education, it would be worth exploring the learning platforms in use. While there are cases of institutions using their traditional LMS to support a CBE program, there is a new market developing specifically around learning platforms that are designed specifically for self-paced, fully-online, competency-framework based approaches.

Several weeks ago Flat World announced their latest round of funding, raising $5 million of debt financing and bringing their total to $40.7 million. The company started out by offering full e-textbooks (and was previously named FlatWorld Knowledge), developing 110 titles that included 25 of the 50 most-used lecture courses. The e-textbook market was not working out, however, and the company pivoted to competency-based education around the time that Chris Etesse became CEO two years ago. Now the company is developing a combined CBE learning platform with integrated course content – much of it repurposing the pre-existing e-textbook materials. Their first academic partner for CBE is Brandman University, a non-traditional part of the Chapman University system and currently a member of the CBEN network.

One central tenet of the Flat World approach is based on their history and pivot – a tight integration of content and platform. As Etesse describes it, content is a 1st-class citizen in their system, whereas other loosely-coupled approaches that do not tie content and platform together can be difficult to navigate and make it harder to collect learning analytics. In other words, this is intentionally a walled-garden approach. For Brandman, approximately 70% of the content comes from the pre-existing FlatWorld texts, 25% comes from various OER sources, and about 5% has been custom-designed for Brandman.

In other words, this is very much a walled garden by design. While there is support for outside content, I believe this integration must be done by Flat World designers.

As was the case for the description of the Helix CBE-based learning platform, my interest here is not merely to review one company’s products, but rather to illustrate aspects of the growing CBE movement using the demo.

CBE programs by their very nature tend to be self-paced. One criticism or line of questions I’m seeing more often deals with the nature of self-paced learning itself. Are students just plugging through mindless e-text and multiple-choice assessments in isolation? What Flat World illustrates – as with other major CBE learning platforms – is that self-paced does not imply isolation, either from a student-teacher or from a student-student perspective. New approaches that are different than simple discussion forums are required, however.

FlatWorld shows several learning activities:

  • Reading text and viewing multi-media content adaptively presented based on a pretest and progress against competencies;
  • Taking formative assessments primarily through multiple-choice quizzes;
  • Interacting with students and with faculty;
  • Working through project-based assignments;
  • Taking summative assessments through proctored, webcam-streaming approach.

The activities and assessments do not have to involve students working in isolation and using multiple-choice. For example, project-based work can be included, and assignments can include submission of written reports or responses to short-form prompts. As can be seen below, the assessments can be based on submitted written work which faculty grade and use for feedback.

FWK_Demo_Submit_Assessment

For communication with others, students are tracked in how active they are in communicating with faculty and even with other students (called ‘social’), as seen below.

FWK_Demo_7_Day_Breakdown

One challenge of a self-paced program such as CBE approaches is figuring out how to encourage students to interact with others. There is not a simple cohort to work with – the interaction instead will often be based on content: who else is working through the same content in roughly the same time period?

FlatWorld uses an approach that is very similar to Stack Overflow, where students can ask and answer questions over time, and the answers are voted up or down to allow the best answers to rise to the top. This Stack Overflow-style forum is moderated by faculty at Brandman. It not only connects students working on the same competencies at roughly the same time, but also allows interaction with students who worked through similar competencies at different times.

FW_DiscussionBoards

 

FW_SocialInteraction

There certainly is a tendency in many CBE programs to stick to multiple-choice assignments and quizzes and to avoid much social interaction. That method is a whole lot easier to design, and with several hundred new programs under development, I think the overall quality can be quite low in many programs, particularly those looking for a quick-win CBE introduction, essentially trying to jump on the bandwagon. You can see the tendency towards multiple-choice in the FlatWorld system as well.

But self-paced does not imply isolation, and the Flat World implementation of the Brandman University program shows how CBE can support project-based work, written assignments and assessments, and interaction between students and faculty as well as between multiple students.

The post Flat World and CBE: Self-paced does not imply isolation appeared first on e-Literate.

Will Oracle Linux experts need a rundown of Docker?

Chris Foot - Mon, 2015-02-09 01:58

Container technology solution Docker was the talk of the industry last year, as the open source project partnered with IBM, Microsoft and a list of other tech companies in order to reduce server energy usage and drastically simplify software development.

Now, Oracle is hopping on the Docker bandwagon, making its Oracle Linux distribution available in a Docker container on the Docker Hub, according to InformationWeek's Charles Babcock. The goal is to make Oracle Linux more appealing to developers looking to transport apps across different environments. 

Ready for download 
This isn't the first Linux distro to be offered through Docker Hub. Babcock noted that Canonical's Ubuntu had integrated Docker support before Oracle (although Ubuntu is an operating system designed for desktops). In addition, server-based Linux OS distros SUSE, CentOS and Red Hat Enterprise Linux are now available via Docker Hub. 

This poses an interesting question as to how Oracle Linux support groups will integrate Docker's technology into client infrastructures. Think of the impact this development will have on Web-based businesses. Data Center Knowledge noted that more than 33 percent of the globe's websites operate on servers running off of Linux. Therefore, websites running on Oracle Linux may grow leaner, enabling developers to further test the limits of how many features and back-end functions they can add to websites. 

Not the only Oracle product 
Oracle isn't necessarily late to the game as far as Docker integration is concerned. Oracle also owns the open source database MySQL, which can be downloaded from Docker Hub. Oracle Senior Vice President of Linux and Virtualization Engineering Wim Coekaerts described how MySQL's Docker compatibility will impact DBAs who use the database engine.

"With Oracle Linux and MySQL images available on the Docker Hub Registry, users will be able to quickly create and publish custom Docker containers that layer applications on top of Oracle Linux and MySQL, which is a great time-save for both independent software vendors and IT departments," said Coekaertz, as quoted by DCK.

Coekaerts expanded on the issue, noting that although Docker promises to simplify virtualization, he doesn't believe the open source project will deliver "complete isolation" to those who download the Dockerized version of Oracle Linux.

As far as MySQL is concerned, he believes a Docker-compatible MySQL engine will help DevOps staff by allowing engineers to share the container across different teams. This is a role that will help Oracle experts accommodate companies looking to support hybrid environments.
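To see what that looks like in practice, pulling the images and starting containers takes only a couple of commands. This is a sketch only; the image tags and the environment variable value are illustrative rather than anything prescribed by Oracle:

# pull the Oracle Linux image from Docker Hub and start a throwaway interactive container
docker pull oraclelinux
docker run -it --rm oraclelinux /bin/bash

# the MySQL image works the same way; the root password is passed in as an environment variable
docker run --name mysql-dev -e MYSQL_ROOT_PASSWORD=secret -d mysql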

The post Will Oracle Linux experts need a rundown of Docker? appeared first on Remote DBA Experts.

Never gather WORKLOAD stats on Exadata...

Yann Neuhaus - Mon, 2015-02-09 01:56

For Exadata, Oracle has introduced an 'EXADATA' mode which sets a high transfer rate (with IOTFRSPEED as in NOWORKLOAD statistics) and sets an MBRC (as in WORKLOAD statistics). Those values are set rather than gathered because all the SmartScan optimization done at storage cell level, which makes multiblock reads less expensive, is difficult to measure from the database.
Here I will explain what I stated in a previous blog: direct-path reads are not counted as multiblock reads for the MBRC system statistic. And direct-path reads should be the main I/O path on Exadata, as you probably bought that machine to benefit from SmartScan.
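For reference, since the 'EXADATA' mode is mentioned above without being shown, invoking it looks like the sketch below (assuming your version and patch level actually expose that mode, so check before relying on it):

-- set system statistics in 'EXADATA' mode rather than gathering them
exec dbms_stats.gather_system_stats('EXADATA');

-- check what has been set
select sname, pname, pval1 from sys.aux_stats$;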

With direct-path reads

On a test database that has no activity, I’m creating a 1000 blocks table. My goal is to gather WORKLOAD system statistics during a simple table full scan on that table, and see how it calculates SREADTIM, MREADTIM and MBRC.

SQL> connect demo/demo
Connected.

SQL> drop table DEMO;
Table dropped.

SQL> create table DEMO pctfree 99 as select rpad('x',1000,'x') n from dual connect by level <=1000;
Table created.
Then I run a simple select between the calls to ‘start’ and ‘stop’ procedures of the dbms_stats WORKLOAD system stats gathering.
SQL> exec dbms_stats.gather_system_stats('start');
PL/SQL procedure successfully completed.

SQL> connect demo/demo
Connected.

SQL> select count(*) from DEMO;

  COUNT(*)
----------
      1000

I check the physical read statistics (this is why I have reconnected my session, so that I can query v$mystat without computing the delta)
SQL> select name,value from v$mystat join v$statname using(statistic#) where (name like 'phy%' or name like 'cell%') and value>0;

NAME                                                              VALUE
------------------------------------------------------------ ----------
physical read total IO requests                                      22
physical read total multi block requests                              7
physical read total bytes                                       8306688
cell physical IO interconnect bytes                             8306688
physical reads                                                     1000
physical reads direct                                              1000
physical read IO requests                                            15
physical read bytes                                             8192000
cell scans                                                            1
cell blocks processed by cache layer                               1000
cell blocks processed by txn layer                                 1000
cell blocks processed by data layer                                1000
cell physical IO bytes eligible for predicate offload           8192000
cell physical IO interconnect bytes returned by smart scan       130760
cell IO uncompressed bytes                                      8192000
I’ve read 1000 blocks in 15 I/O calls, so I'm sure these are multiblock reads. All of them (1000 x 8k) were eligible for SmartScan and those 1000 blocks have been processed by the storage cell.

If you wonder why I have only 7 'physical read total multi block requests', it's because it counts only the 'full' multiblock reads - not those that are limited by an extent boundary. See here for that analysis.

If you wonder why I have only 22 'physical read total IO requests' then I don't have the answer. The sql_trace shows only the 15 'direct path read' calls. And dbms_stats counts only the 'physical read IO requests'. If you have any idea, please comment.

I stop my WORKLOAD statistics gathering:
SQL> exec dbms_stats.gather_system_stats('stop');
PL/SQL procedure successfully completed.
And check the system statistics that have been set:

SQL> select * from sys.aux_stats$;

SNAME           PNAME           PVAL1
--------------- ---------- ----------
SYSSTATS_INFO   STATUS
SYSSTATS_INFO   DSTART
SYSSTATS_INFO   DSTOP
SYSSTATS_INFO   FLAGS               1
SYSSTATS_MAIN   CPUSPEEDNW       2300
SYSSTATS_MAIN   IOSEEKTIM          10
SYSSTATS_MAIN   IOTFRSPEED       4096
SYSSTATS_MAIN   SREADTIM
SYSSTATS_MAIN   MREADTIM         .151
SYSSTATS_MAIN   CPUSPEED         2300
SYSSTATS_MAIN   MBRC
SYSSTATS_MAIN   MAXTHR
SYSSTATS_MAIN   SLAVETHR
I have no SREADTIM, which is expected as I've done only multiblock reads. I have an MREADTIM. But I don't have MBRC set.

With conventional (aka buffered) reads

Let's do the same after disabling serial direct-path reads:

SQL> alter session set "_serial_direct_read"=never;
Session altered.
I do the same as before, but now my session stats show only conventional reads:
NAME                                                              VALUE
------------------------------------------------------------ ----------
physical read total IO requests                                      44
physical read total multi block requests                             28
physical read total bytes                                       8192000
cell physical IO interconnect bytes                             8192000
physical reads                                                     1000
physical reads cache                                               1000
physical read IO requests                                            44
physical read bytes                                             8192000
physical reads cache prefetch                                       956

and here are the gathered stats:
SNAME           PNAME           PVAL1
--------------- ---------- ----------
SYSSTATS_INFO   STATUS
SYSSTATS_INFO   DSTART
SYSSTATS_INFO   DSTOP
SYSSTATS_INFO   FLAGS               1
SYSSTATS_MAIN   CPUSPEEDNW       2300
SYSSTATS_MAIN   IOSEEKTIM          10
SYSSTATS_MAIN   IOTFRSPEED       4096
SYSSTATS_MAIN   SREADTIM
SYSSTATS_MAIN   MREADTIM         .028
SYSSTATS_MAIN   CPUSPEED         2300
SYSSTATS_MAIN   MBRC               23
SYSSTATS_MAIN   MAXTHR
SYSSTATS_MAIN   SLAVETHR

Now the MBRC is set with the gathered value.

This proves that MBRC is set only for conventional multiblock reads. Direct-path reads are not accounted.

Conclusion

If you are on Exadata, you probably want to benefit from SmartScan. Then you probably want the CBO to choose FULL TABLE SCAN, which will do direct-path reads for large tables (provided they don't have a lot of updated buffers in the SGA). If you gather WORKLOAD statistics they will set MBRC without accounting for those direct-path reads, and it will probably be set lower than the average actual multiblock read (which - in direct-path reads - is close to the db_file_multiblock_read_count default or set value).
This is the reason why Oracle introduced the EXADATA mode: it sets the MBRC from the db_file_multiblock_read_count value. They also set the IOTFRSPEED to a high value because gathering MREADTIM will probably get a very low value - lower than SREADTIM - thanks to SmartScan. And the CBO ignores values where MREADTIM is less than SREADTIM.

An alternative to EXADATA mode can be setting those values as NOWORKLOAD statistics and keeping db_file_multiblock_read_count set. You will have the same behavior because the CBO uses db_file_multiblock_read_count when it is set and there are no MBRC system stats. But the danger is that if someone resets db_file_multiblock_read_count (and I often advise keeping defaults) then the CBO will use a value of 8 and that will probably increase the cost of full table scans too much.
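Here is a minimal sketch of that alternative, assuming you set the values manually with dbms_stats.set_system_stats; the figures are purely illustrative, not recommendations:

-- reset to NOWORKLOAD defaults, then set the transfer rate manually;
-- db_file_multiblock_read_count stays set at instance level
exec dbms_stats.delete_system_stats;
exec dbms_stats.set_system_stats('IOSEEKTIM',10);
exec dbms_stats.set_system_stats('IOTFRSPEED',200000);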

All formulas are here with a script that shows what is used by the CBO.

Never say never

Well, that blog post title is too extreme because:

@FranckPachot @kevinclosson @fritshoogland @moustafa_dba @JLOracle never say never ;-)

— Christian Antognini (@ChrisAntognini) February 8, 2015

So I should say:
Never gather WORKLOAD stats on Exadata... except if your workload is not an Exadata optimized one.
If you are using Exadata for OLTP, then yes, you can gather WORKLOAD statistics as they probably fit OLTP behaviour. But in any case, always check the gathered stats and see if they are relevant.

select * from table where rownum=1

XTended Oracle SQL - Sun, 2015-02-08 17:40

I never thought I would have to optimize such a simple query as

select col1, col2, col4, col7 from table where rownum=1

(even though I read recently “SELECT * FROM TABLE” Runs Out Of TEMP Space)
But a few days ago frequent executions of this query caused big problems on one of our databases (11.2.0.3) because of adaptive serial direct path reads.

I don’t know why, but I felt intuitively that full table scan with “First K rows” optimization (“_optimizer_rownum_pred_based_fkr“=true) should turn off adaptive serial direct path reads. It seems quite logical to me.

PS. Unfortunately I had little time, so I didn't investigate which process was doing that and why; I just created a SQL profile with "index full scan" access, and it completely solved the problem.
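For anyone wanting to reproduce that kind of fix, here is a sketch of one way to attach an index access path with a manually created SQL profile via dbms_sqltune.import_sql_profile. The hint, query block, table alias, index and profile names are hypothetical, and the post does not say which mechanism was actually used:

declare
  -- hypothetical hint text: adjust query block, alias and index name to your own SQL
  h sys.sqlprof_attr := sys.sqlprof_attr('INDEX(@"SEL$1" "T"@"SEL$1" "T_IDX")');
begin
  dbms_sqltune.import_sql_profile(
    sql_text    => 'select col1, col2, col4, col7 from t where rownum=1',
    profile     => h,
    name        => 'PROFILE_ROWNUM_FIRSTROW',
    force_match => true);
end;
/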

Categories: Development

OBIEE Monitoring and Diagnostics with InfluxDB and Grafana

Rittman Mead Consulting - Sun, 2015-02-08 15:21

In this article I’m going to look at collecting time-series metrics into the InfluxDB database and visualising them in snazzy Grafana dashboards. The datasets I’m going to use are OS metrics (CPU, Disk, etc) and the DMS metrics from OBIEE, both of which are collected using the support for a Carbon/Graphite listener in InfluxDB.

The Dynamic Monitoring System (DMS) in OBIEE is one of the best ways of being able to peer into the internals of the product and find out quite what’s going on. Whether performing diagnostics on a specific issue or just generally monitoring to make sure things are ticking over nicely, using the DMS metrics you can level-up your OBIEE sysadmin skills beyond what you’d get with Fusion Middleware Control out of the box. In fact, the DMS metrics are what you can get access to with Cloud Control 12c (EM12c) – but for that you need EM12c and the BI Management Pack. In this article we’re going to see how to easily set up our DMS dashboard.
N.B. if you’ve read my previous articles, what I write here (use InfluxDB/Grafana) supersedes what I wrote in those (use Graphite) as my recommended approach to working with arbitrary time-series metrics.

Overview

To get the DMS data out of OBIEE we’re going to use the obi-metrics-agent tool that Rittman Mead open-sourced last year. This connects to OPMN and pulls the data out. We’ll store the data in InfluxDB, and then visualise it in Grafana. Whilst not mandatory for the DMS stats, we’ll also set up collectl so that we can show OS stats alongside the DMS ones.

InfluxDB

InfluxDB is a database, but unlike an RDBMS such as Oracle – good for generally everything – it is what’s called a Time-Series Database (TSDB). This category of database focuses on storing data for a series, holding a given value for a point in time. Generally they’re optimised for handling large quantities of inbound metrics (think Internet of Things), rather than necessarily excelling at handling changes to the data (update/delete) – but that’s fine here since metric events in the past don’t generally change.

I’m using InfluxDB here for a few reasons:

  1. Grafana supports it as a source, with lots of active development for its specific features.
  2. It’s not Graphite. Whilst I have spent many a happy hour using Graphite I’ve spent many a frustrating day and night trying to install the damn thing – every time I want to use it on a new installation. It’s fundamentally long in the tooth, and whilst good for its time is now legacy in my mind. Graphite is also several things – a data store (whisper), a web application (graphite web), and a data collector (carbon). Since we’re using Grafana, the web front end that Graphite provides is redundant, and is where a lot of the installation problems come from.
  3. KISS! Yes I could store time series data in Oracle/mySQL/DB2/yadayada, but InfluxDB does one thing (storing time series metrics) and one thing only, very well and very easily with almost no setup.

For an eloquent discussion of Time-Series Databases read these couple of excellent articles by Baron Schwarz here and here.

Grafana

On the front-end we have Grafana which is a web application that is rapidly becoming accepted as one of the best time-series metric visualisation tools available. It is a fork of Kibana, and can work with data held in a variety of sources including Graphite and InfluxDB. To run Grafana you need to have a web server in place – I’m using Apache just because it’s familiar, but Grafana probably works with whatever your favourite is too.

OS

This article is based around the OBIEE SampleApp v406 VM, but should work without modification on any OL/CentOS/RHEL 6 environment.

InfluxDB and Grafana run on both RHEL and Debian based Linux distros, as well as Mac OS. The specific setup steps detailed here might need some changes depending on the OS.

Getting Started with InfluxDB

InfluxDB Installation and Configuration as a Graphite/Carbon Endpoint

InfluxDB is a doddle to install. Simply download the rpm, install it, and run. BOOM. Compared to Graphite, this makes it a massive winner already.

wget http://s3.amazonaws.com/influxdb/influxdb-latest-1.x86_64.rpm
sudo rpm -ivh influxdb-latest-1.x86_64.rpm

This downloads and installs InfluxDB into /opt/influxdb and configures it as a service that will start at boot time.

Before we go ahead and start it, let’s configure it to work with existing applications that are sending data to Graphite using the Carbon protocol. InfluxDB can support this and enables you to literally switch Graphite out in favour of InfluxDB with no changes required on the source.

Edit the configuration file that you’ll find at /opt/influxdb/shared/config.toml and locate the line that reads:

[input_plugins.graphite]

In v0.8.8 this is at line 41. In the following stanza set the plugin to enabled, specify the listener port, and give the name of the database that you want to store data in, so that it looks like this.

# Configure the graphite api
[input_plugins.graphite]
enabled = true
# address = "0.0.0.0" # If not set, is actually set to bind-address.
port = 2003
database = "carbon"  # store graphite data in this database
# udp_enabled = true # enable udp interface on the same port as the tcp interface

Note that the file is owned by a user created at installation time, influxdb, so you’ll need to use sudo to edit the file.

Now start up InfluxDB:

sudo service influxdb start

You should see it start up successfully:

[oracle@demo influxdb]$ sudo service influxdb start
Setting ulimit -n 65536
Starting the process influxdb [ OK ]
influxdb process was started [ OK ]

You can see the InfluxDB log file and confirm that the Graphite/Carbon listener has started:

[oracle@demo shared]$ tail -f /opt/influxdb/shared/log.txt
[2015/02/02 20:24:04 GMT] [INFO] (github.com/influxdb/influxdb/cluster.func·005:1187) Recovered local server
[2015/02/02 20:24:04 GMT] [INFO] (github.com/influxdb/influxdb/server.(*Server).ListenAndServe:133) recovered
[2015/02/02 20:24:04 GMT] [INFO] (github.com/influxdb/influxdb/coordinator.(*Coordinator).ConnectToProtobufServers:898) Connecting to other nodes in the cluster
[2015/02/02 20:24:04 GMT] [INFO] (github.com/influxdb/influxdb/server.(*Server).ListenAndServe:139) Starting admin interface on port 8083
[2015/02/02 20:24:04 GMT] [INFO] (github.com/influxdb/influxdb/server.(*Server).ListenAndServe:152) Starting Graphite Listener on 0.0.0.0:2003
[2015/02/02 20:24:04 GMT] [INFO] (github.com/influxdb/influxdb/server.(*Server).ListenAndServe:178) Collectd input plugins is disabled
[2015/02/02 20:24:04 GMT] [INFO] (github.com/influxdb/influxdb/server.(*Server).ListenAndServe:187) UDP server is disabled
[2015/02/02 20:24:04 GMT] [INFO] (github.com/influxdb/influxdb/server.(*Server).ListenAndServe:187) UDP server is disabled
[2015/02/02 20:24:04 GMT] [INFO] (github.com/influxdb/influxdb/server.(*Server).ListenAndServe:216) Starting Http Api server on port 8086
[2015/02/02 20:24:04 GMT] [INFO] (github.com/influxdb/influxdb/server.(*Server).reportStats:254) Reporting stats: &client.Series{Name:"reports", Columns:[]string{"os", "arch", "id", "version"}, Points:[][]interface {}{[]interface {}{"linux", "amd64", "e7d3d5cf69a4faf2", "0.8.8"}}}

At this point, if you’re using the stock SampleApp v406 image, or indeed on any machine with a firewall configured, you need to open up ports 8083 and 8086 for InfluxDB. Edit /etc/sysconfig/iptables (using sudo) and add:

-A INPUT -m state --state NEW -m tcp -p tcp --dport 8083 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 8086 -j ACCEPT

immediately after the existing ACCEPT rules. Restart iptables to pick up the change:

sudo service iptables restart

If you now go to http://localhost:8083/ (replace localhost with the hostname of the server on which you’ve installed InfluxDB), you’ll get the InfluxDB web interface. It’s fairly rudimentary, but suffices just fine:

Login as root/root, and you’ll see a list of nothing much, since we’ve not got any databases yet. You can create a database from here, but for repeatability and a general preference for using the command line here is how to create a database called carbon with the HTTP API called from curl (assuming you’re running it locally; change localhost if not):

curl -X POST 'http://localhost:8086/db?u=root&p=root' -d '{"name": "carbon"}'

Simple huh? Now hit refresh on the web UI and after logging back in again you’ll see the new database:

You can call the database anything you want, just make sure what you create in InfluxDB matches what you put in the configuration file for the graphite/carbon listener.

Now we’ll create a second database that we’ll need later on to hold the internal dashboard definitions from Grafana:

curl -X POST 'http://localhost:8086/db?u=root&p=root' -d '{"name": "grafana"}'

You should now have two InfluxDB databases, primed and ready for data:

Validating the InfluxDB Carbon Listener

To make sure that InfluxDB is accepting data on the carbon listener use the NetCat (nc) utility to send some dummy data to it:

echo "example.foo.bar 3 `date +%s`"|nc localhost 2003

Now go to the InfluxDB web interface and click Explore Data ». In the query field enter

list series

To see the first five rows of data itself use the query

select * from /.*/ limit 5

InfluxDB Queries

You’ll notice that what we’re doing here (“SELECT … FROM …”) looks pretty SQL-like. Indeed, InfluxDB supports a SQL-like query language, which if you’re coming from an RDBMS background is nicely comforting ;-)

The syntax is documented, but what I would point out is that the apparently odd /.*/ constructor for the “table” is in fact a regular expression (regex) to match the series for which to return values. We could have written select * from example.foo.bar but the .* wildcard enclosed in the / / regex delimiters is a quick way to check all the series we’ve got.

Going off on a bit of a tangent (but hey, why not), let’s write a quick Python script to stick some randomised data into InfluxDB. Paste the following into a terminal window to create the script and make it executable:

cat >~/test_carbon.py<<EOF
#!/usr/bin/env python
import socket
import time
import random
import sys

CARBON_SERVER = sys.argv[1]
CARBON_PORT = int(sys.argv[2])

while True:
        message = 'test.data.foo.bar %d %d\n' % (random.randint(1,20),int(time.time()))
        print 'sending message:\n%s' % message
        sock = socket.socket()
        sock.connect((CARBON_SERVER, CARBON_PORT))
        sock.sendall(message)
        time.sleep(1)
        sock.close()
EOF
chmod u+x ~/test_carbon.py

And run it: (hit Ctrl-C when you’ve had enough)

$ ~/test_carbon.py localhost 2003
sending message:
test.data.foo.bar 3 1422910401

sending message:
test.data.foo.bar 5 1422910402
[...]

Now we’ve got two series in InfluxDB:

  • example.foo.bar – that we sent using nc
  • test.data.foo.bar – using the python script

Let’s go back to the InfluxDB web UI and have a look at the new data, using the literal series name in the query:

select * from test.data.foo.bar

Well fancy that – InfluxDB has done us a nice little graph of the data. But more to the point, we can see all the values in the series.

And a regex shows us both series, matching on the ‘foo’ part of the name:

select * from /foo/ limit 3

Let’s take it a step further. InfluxDB supports aggregate functions, such as max, min, and so on:

select count(value), max(value),mean(value),min(value) from test.data.foo.bar

Whilst we’re at it, let’s bring in another way to get data out – with the HTTP API, just like we used for creating the database above. Given a query, it returns the data in json format. There’s a nice little utility called jq which we can use to pretty-print the json, so let’s install that first:

sudo yum install -y jq

and then call the InfluxDB API, piping the return into jq:

curl --silent --get 'http://localhost:8086/db/carbon/series?u=root&p=root' --data-urlencode "q=select count(value), max(value),mean(value),min(value) from test.data.foo.bar"|jq '.'

The result should look something like this:

[
  {
    "name": "test.data.foo.bar",
    "columns": [
      "time",
      "count",
      "max",
      "mean",
      "min"
    ],
    "points": [
      [
        0,
        12,
        14,
        5.666666666666665,
        1
      ]
    ]
  }
]

We could have used the Web UI for this, but to be honest the inclusion of the graphs just confuses things because there’s nothing to graph and the table of data that we want gets hidden lower down the page.

Setting up obi-metrics-agent to Send OBIEE DMS metrics to InfluxDB

obi-metrics-agent is an open-source tool from Rittman Mead that polls your OBIEE system to pull out all the lovely juicy DMS metrics from it. It can write them to file, insert them to an RDBMS, or as we’re using it here, send them to a carbon-compatible endpoint (such as Graphite, or in our case, InfluxDB).

To install it simply clone the git repository (I’m doing it to /opt but you can put it where you want)

# Install pre-requisite
sudo yum install -y libxml2-devel python-devel libxslt-devel python-pip
sudo pip install lxml
# Clone the git repository
git clone https://github.com/RittmanMead/obi-metrics-agent.git ~/obi-metrics-agent
# Move it to /opt folder
sudo mv ~/obi-metrics-agent /opt

and then run it:

cd /opt/obi-metrics-agent

./obi-metrics-agent.py \
--opmnbin /app/oracle/biee/instances/instance1/bin/opmnctl \
--output carbon \
--carbon-server localhost

I’ve used line continuation character \ here to make the statement clearer. Make sure you update opmnbin for the correct path of your OPMN binary as necessary, and localhost if your InfluxDB server is not local to where you are running obi-metrics-agent.

After running this you should be able to see the metrics in InfluxDB. For example:

select * from /Oracle_BI_DB_Connection_Pool\..+\.*Busy/ limit 5

Setting up collectl to Send OS metrics to InfluxDB

collectl is an excellent tool written by Mark Seger and reports on all sorts of OS-level metrics. It can run interactively, write metrics to file, and/or send them on to a carbon endpoint such as InfluxDB.

Installation is a piece of cake, using the EPEL yum repository:

# Install the EPEL yum repository
sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/`uname -p`/epel-release-6-8.noarch.rpm
# Install collectl
sudo yum install -y collectl
# Set it to start at boot
sudo chkconfig --level 35 collectl on

Configuration to enable logging to InfluxDB is a simple matter of modifying the /etc/collectl.conf configuration file either by hand or using this set of sed statements to do it automagically.
The localhost in the second sed command is the hostname of the server on which InfluxDB is running:

sudo sed -i.bak -e 's/^DaemonCommands/#DaemonCommands/g' /etc/collectl.conf
sudo sed -i -e '/^#DaemonCommands/a DaemonCommands = -f \/var\/log\/collectl -P -m -scdmnCDZ --export graphite,localhost:2003,p=.os,s=cdmnCDZ' /etc/collectl.conf

If you want to log more frequently than ten seconds, make this change (for 5 second intervals here):

sudo sed -i -e '/#Interval =     10/a Interval = 5' /etc/collectl.conf

Restart collectl for the changes to take effect:

sudo service collectl restart

As above, a quick check through the web UI should confirm we’re getting data through into InfluxDB:

Note the very handy regex lets us be lazy with the series naming. We know there is a metric called in part ‘cputotal’, so using /cputotal/ can match anything with it in.
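For example, a query along these lines in the Explore Data page (or via the HTTP API) should show the collectl CPU series coming through:

select * from /cputotal/ limit 5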

Installing and Configuring Grafana

Like InfluxDB, Grafana is also easy to install, although it does require a bit of setting up. It needs to be hooked into a web server, as well as configured to connect to a source for metrics and a store for its dashboard definitions.

First, download the binary (this is based on v1.9.1, but releases are frequent so check the downloads page for the latest):

cd ~
wget http://grafanarel.s3.amazonaws.com/grafana-1.9.1.zip

Unzip it and move it to /opt:

unzip grafana-1.9.1.zip
sudo mv grafana-1.9.1 /opt

Configuring Grafana to Connect to InfluxDB

We need to do a bit of configuration, so first create the configuration file based on the template given:

cd /opt/grafana-1.9.1
cp config.sample.js config.js

And now open the config.js file in your favourite text editor. Grafana supports various sources for metrics data, as well as various targets to which it can save the dashboard definitions. The configuration file helpfully comes with configuration elements for many of these, but all commented out. Uncomment the InfluxDB stanzas and amend them as follows:

datasources: {
    influxdb: {
        type: 'influxdb',
        url: "http://sampleapp:8086/db/carbon",
        username: 'root',
        password: 'root',
    },
    grafana: {
        type: 'influxdb',
        url: "http://sampleapp:8086/db/grafana",
        username: 'root',
        password: 'root',
        grafanaDB: true
    },
},

Points to note:

  1. The servername is the server host as you will be accessing it from your web browser. So whilst the configuration we did earlier was all based around ‘localhost’ – since it was just communication between components on the same server – the Grafana configuration is used by the web application running in your web browser. So unless you are using a web browser on the same machine as where InfluxDB is running, you must put in the server address of your InfluxDB machine here.
  2. The default InfluxDB username/password is root/root, not admin/admin
  3. Edit the database names in the url, either as shown if you’ve followed the same names used earlier in the article or your own versions of them if not.
Setting Grafana up in Apache

Grafana runs within a web server, such as Apache or nginx. Here I’m using Apache, so first off install it:

sudo yum install -y httpd

And then set up an entry for Grafana in the configuration folder by pasting the following to the command line:

cat > /tmp/grafana.conf <<EOF
Alias /grafana /opt/grafana-1.9.1

<Location /grafana>
Order deny,allow
Allow from 127.0.0.1
Allow from ::1
Allow from all
</Location>
EOF

sudo mv /tmp/grafana.conf /etc/httpd/conf.d/grafana.conf

Now restart Apache:

sudo service httpd restart

And if the gods of bits and bytes are smiling on you, when you go to http://yourserver/grafana you should see:

Note that as with InfluxDB, you may well need to open your firewall for Apache which is on port 80 by default. Follow the same iptables instructions as above to do this.
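For example, the rule to add to /etc/sysconfig/iptables for the default Apache port would be:

-A INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT

followed by a restart to pick it up:

sudo service iptables restart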

Building Grafana Dashboards on Metrics Held in InfluxDB

So now we’ve set up our metric collectors, sending data into InfluxDB.

Let’s see now how to produce some swanky dashboards in Grafana.

Grafana has a concept of Dashboards, which are made up of Rows and within those Panels. A Panel can have on it a metric graph (duh), but also static text or single-figure metrics.

To create a new dashboard click the folder icon and select New:

You get a fairly minimal blank dashboard. On the left you’ll notice a little green tab: hover over that and it pops out to form a menu box, from where you can choose the option to add a graph panel:

Grafana Graph Basics

On the blank graph that’s created click on the title (with the accurate text “click here”) and select edit from the options that appear. This takes you to the graph editing page, which looks equally blank but from here we can now start adding metrics:

In the box labelled series start typing Active_Sessions and notice that Grafana will autocomplete it to any available metrics matching this:

Select Oracle_BI_PS_Sessions.Active_Sessions and your graph should now display the metric.

To change the time period shown in the graph, use the time picker at the top of the screen. You can also click & drag (“brushing”) on any graph to select a particular slice of time.

So, set the time filter to 15 minutes ago and from the Auto-refresh submenu set it to refresh every 5 seconds. Now login to your OBIEE instance, and you should see the Active Sessions value increase (one per session login):

To add another metric to the graph you can click on Add query at the bottom right of the page, or if it’s closely related to the one you’ve defined already click on the cog next to it and select duplicate:

In the second query add Oracle_BI_General.Total_sessions (remember, you can just type part of the string and Grafana autocompletes based on the metric series stored in InfluxDB). Run a query in OBIEE to cause sessions to be created on the BI Server, and you should now see the Total sessions increase:

To save the graph, and the dashboard, click the Save icon. To return to the dashboard to see how your graph looks alongside others, or to add a new dashboard, click on Back to dashboard.

Grafana Graph Formatting

Let’s now take a look at the options we’ve got for modifying the styling of the graph. There are several tabs/sections to the graph editor – General, Metrics (the default), Axes & Grid, and Display Styles. The first obvious thing to change is the graph title, which can be changed on the General tab:

From here you can also change how the graph is sized on the dashboard using the Span and Height options. A new feature in recent versions of Grafana is the ability to link dashboards to help with analysis paths – guided navigation as we’d call it in OBIEE – and it’s from the General tab here that you can define this.

On the Metrics tab you can specify what text to use in the legend. By default you get the full series name, which is usually too big to be useful as well as containing a lot of redundant repeating text. You can either specify literal text in the alias field, or you can use segments of the series name identified by $x where x is the zero-based segment number. In the example I’ve hardcoded the literal value for the second metric query, and used a dynamic segment name for the first:

On the Axes & Grid tab you can specify the obvious stuff like min/max scales for the axes and the scale to use (bits, bytes, etc). To put metrics on the right axis (and to change the colour of the metric line too) click on the legend line, and from there select the axis/colour as required:

You can set thresholds to overlay on the graph (to highlight warning/critical values, for example), as well as customise the legend to show an aggregate value for each metric, show it in a table, or not at all:

The last tab, Display Styles, has even more goodies. One of my favourite new additions to Grafana is the Tooltip. Enabling this gives you a tooltip when you hover over the graph, displaying the value of all the series at that point in time:

You can change the presentation of the graph, which by default is a line, adding bars and/or points, as well as changing the line width and fill.

  • Solid Fill:

  • Bars only

  • Points and translucent fill:

Advanced InfluxDB Query Building in Grafana

Identifying Metric Series with RegEx

In the example above there were two fairly specific metrics that we wanted to report against. What you will find is much more common is wanting to graph out a set of metrics from the same ‘family’. For example, OBIEE DMS metrics include a great deal of information about each Connection Pool that’s defined. They’re all in a hierarchy that looks like this:

obi11-01.OBI.Oracle_BI_DB_Connection_Pool.Star_01_-_Sample_App_Data_ORCL_Sample_Relational_Connection

Under which you’ve got

Capacity
Current Connection Count
Current Queued Requests

and so on.

So rather than creating an individual metric query for each of these (similar to how we did for the two session metrics previously) we’ll use InfluxDB’s rather smart regex method for identifying metric series in a query. And because Grafana is awesome, writing the regex isn’t as painful as it could be because the autocomplete validates your expression in realtime. Let’s get started.

First up, let’s work out the root of the metric series that we want. In this case, it’s the orcl connection pool. So in the series box, enter /orcl/. The / delimiters indicate that it is a regex query. As soon as you enter the second / you’ll get the autocomplete showing you the matching series:

/orcl/

If you scroll down the list you’ll notice there are other metrics in there besides Connection Pool ones, so let’s refine our query a bit

/orcl_Connection_Pool/

That’s better, but we’ve now got all the Connection Pool metrics, which whilst fascinating to study (no, really) complicate our view of the data a bit, so let’s pick out just the ones we want. First up we’ll put in the dot that’s going to precede any of the final identifiers for the series (.Capacity, .Current Connection Count, etc). A dot is a special character in regex so we need to escape it \.

/orcl_Connection_Pool\./

And now let’s check we’re on the right lines by specifying just Capacity to match:

/orcl_Connection_Pool\.Capacity/

Excellent. So we can now add in more permutations, with a bracketed list of options separated with the pipe (regex OR) character:

/orcl_Connection_Pool\.(Capacity|Current)/

We can use a wildcard .* for expressions that are not directly after the dot that we specified in the match pattern. For example, let’s add any metric that includes Queued:

/orcl_Connection_Pool\.(Capacity|Current|.*Queued)/

But now we’ve a rather long list of matches, so let’s refine the regex to narrow it down:

/orcl_Connection_Pool\.(Capacity|Current|Peak.*Queued).+(Requests|Connection)/

(Something else I tried before this was regex negative look-behind, but it looks like Go (which InfluxDB is written in) doesn’t support it).

Setting the Alias to $4, and the legend to include values in a table format gives us this:

Now to be honest here, in this specific example, I could have created four separate metric queries in a fraction of the time it took to construct that regex. That doesn’t detract from the usefulness and power of regex though, it simply illustrates the point of using the right tool for the right job, and where there are a few easily identified and static metrics, a manual selection may be quicker.

Aggregates

By default Grafana will request the mean of a series at the defined grain of time from InfluxDB. The grain of time is calculated automatically based on the time window you’ve got shown in your graph. If you’re collecting data every five seconds, and build a graph to show a week’s worth of data, showing all 120960 data points will end up in a very indistinct line:

So instead Grafana generates an InfluxDB query that rolls the data up to more sensible intervals – in the case of a week’s worth of data, every 10 minutes:
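The generated query is along these lines (illustrative only, reusing the dummy series from earlier; the real series name and time window will depend on your graph):

select mean(value) from test.data.foo.bar where time > now() - 7d group by time(10m)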

You can see, and override, the time grouping in the metric panel. By default it’s dynamic and you can see the current value in use in lighter text, like this:

You can also set an optional minimal time grouping in the second of the “group by time” boxes (beneath the first). This is a time grouping under which Grafana will never go, so if you always want to roll up to, say, at least a minute (but higher if the duration of the graph requires it), you’d set that here.

So I’ve said that InfluxDB can roll up the figures – but how does it roll up multiple values into one? By default, it takes the mean of all the values. Depending on what you’re looking at, this can be less than desirable, because you may miss important spikes and troughs in your data. So you can change the aggregate rule to look at the maximum value, minimum, and so on. Do this by clicking on the aggregation in the metric panel:

This is the same series of data, but shown as 5 second samples rolled up to a minute, using the mean, max, and min aggregate rules:

For a look at how all three series can be better rendered together see the discussion of Series Specific Overrides later in this article.

You can also use aggregate functions with measures that may not be simple point-in-time values. For example, for an incrementing/accumulating measure (such as a counter like “number of requests since launch”) you actually want to graph the rate of change, the delta between each point. To do this, use the derivative function. In this graph you can see the default aggregation (mean, in green) against derivative, in yellow. One is in effect the “actual” value of the measure, the other is the rate of change, which is much more useful to see in a time series.

Note that if you are using derivative you may need to fix the group by time to the grain at which you are storing data. In my example I am storing data every 5 seconds, but if the default time grain on the graph is 1s then it won’t show the derivative data.
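As a raw query that combination would look something like this, again illustrative, using the dummy series from earlier and the 5 second collection interval:

select derivative(value) from test.data.foo.bar where time > now() - 1h group by time(5s)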

See more details about the aggregations available in InfluxDB in the docs here. If you want to use an aggregation (or any query) that isn’t supported in the Grafana interface simply click on the cog icon and select Raw query mode from where you can customise the query to your heart’s content.

Drawing inverse graphs

As mentioned just above, you can customise the query sent to InfluxDB, which means you can do this neat trick to render multiple related series that would otherwise overlap by inverting one of them. In this example I’ve got the network I/O drawn conventionally:

But since metrics like network I/O, disk I/O and so on have a concept of adding and taking, it feels much more natural to see the input as ‘positive’ and output as ‘negative’.

Which certainly for my money is easier to see at a glance whether we’ve got data coming or going, and at what volume. To implement this simply set up your series as usual, and then for the series you want to invert click on the cog icon and select Raw query mode. Then in place of

mean(value)

put

mean(value*-1)

Series Specific Overrides

The presentation options that you specify for a graph will by default apply to all series shown in the graph. As we saw previously you can change the colour, width, fill etc of a line, or render the graph as bars and/or points instead. This is all good stuff, but presumes that all measures are created equal – that every piece of data on the graph has the same meaning and importance. Often we’ll want to change how we display a particular set of data, and we can use Series Specific Overrides in Grafana to do that.

For example in this graph we can see the number of busy connections and the available capacity:

But the actual (Busy Connections) is the piece of data we want to see at a glance, against the context of the available Capacity. So by setting up a Series Specific Override we can change the formatting of each line individually – calling out the actual (thick green) and making the threshold more muted (purple):

To configure a Series Specific Override go to the Display Styles panel and click Add series override rule. Pick the specific series or use a regex to identify it, and then use the + button to add formatting options:

A very useful formatting option is Z-index, which enables you to define the layering on the graph so that a given series is rendered on top (or below) another. To bring something to the very front use a Z-index of 3; for the very back use -3. Series Specific Overrides are also a good way of dynamically assigning multiple Y axes.

Another great use of Series Specific Overrides is to show the min/max range for data as a shaded area behind the main line, thus providing more context for aggregate data. I discussed above how Grafana can get InfluxDB to roll up (aggregate) values across time periods to make graphs more readable when shown for long time frames – and how this can mask data exceptions. If you only show the mean, you miss small spikes and troughs; if you only show the max or min then you over or under count the actual impact of the measure. But, we can have the best of all worlds! The next two graphs show the starting point – showing just the mean (missing the subtleties of a data series) and showing all three versions of a measure (ugly and unusable):

Instead of this, let’s bring out the mean, but still show it in context of the range of the values within the aggregate:

I hope you’d agree that this a much cleaner and clearer way of presenting the data. To do it we need two steps:

  1. Make sure that each metric has an alias. This is used in the label but importantly is also used in the next step to identify each data series. You can skip this bit if you really want and regex the series to match directly in the next step, but setting an alias is much easier

  2. On the Display Styles tab click Add series override rule at the bottom of the page. In the alias or regex box you should see your aliases listed. Select the one which is the maximum series. Then choose the formatting option Fill below to and select the minimum series

    You’ll notice that Grafana automagically adds in a second rule to disable lines for the minimum series, as well as on the existing maximum series rule.

    Optionally, add another rule for your mean series, setting the Z-index to 3 to bring it right to the front.

    All pretty simple really, and a nice result:

Variables in Grafana (a.k.a. Templating)

In lots of metric series there are often going to be groups of measures that are associated with recurring instances of a parent. For example, CPU details for multiple servers, or in the OBIEE world connection pool details for multiple connection pools.

centos-base.os.cputotals.user
db12c-01.os.cputotals.user
gitserver.os.cputotals.user
media02.os.cputotals.user
monitoring-01.os.cputotals.user
etc

Instead of creating a graph for each permutation, or modifying the graph each time you want to see a different instance, you can instead use Templating, which is basically creating a variable that can be incorporated into query definitions.

To create a template you first need to enable it per dashboard, using the cog icon in the top-right of the dashboard:

Then open the Templating option from the menu opened by clicking on the cog on the left side of the screen

Now set up the name of the variable, and specify a full (not partial, as you would in the graph panel) InfluxDB query that will return all the values for the variable – or rather, the list of all series from which you’re going to take the variable name.

Let’s have a look at an example. Within the OBIEE DMS metrics you have details about the thread pools within the BI Server, and there are different thread pool types, and it is that type that I want to store. Here’s a snippet of the series:

[...]
obi11-01.OBI.Oracle_BI_Thread_Pool.DB_Gateway.Peak_Queued_Requests
obi11-01.OBI.Oracle_BI_Thread_Pool.DB_Gateway.Peak_Queued_Time_milliseconds
obi11-01.OBI.Oracle_BI_Thread_Pool.DB_Gateway.Peak_Thread_Count
obi11-01.OBI.Oracle_BI_Thread_Pool.Server.Accumulated_Requests
obi11-01.OBI.Oracle_BI_Thread_Pool.Server.Average_Execution_Time_milliseconds
obi11-01.OBI.Oracle_BI_Thread_Pool.Server.Average_Queued_Requests
obi11-01.OBI.Oracle_BI_Thread_Pool.Server.Average_Queued_Time_milliseconds
obi11-01.OBI.Oracle_BI_Thread_Pool.Server.Avg_Request_per_sec
[...]

Looking down the list, it’s the DB_Gateway and Server values that I want to extract. First up is some regex to return the series with the thread pool name in:

/.*Oracle_BI_Thread_Pool.*/

and now build it as part of an InfluxDB query:

list series /.*Oracle_BI_Thread_Pool.*/

You can validate this against InfluxDB directly using the web UI for InfluxDB or curl as described much earlier in this article. Put the query into the Grafana Template definition and hit the green play button. You’ll get a list back of all series returned by the query:

Now we want to extract out the threadpool names and we do this using the regex capture group ( ):

/.*Oracle_BI_Thread_Pool\.(.*)\./

Hit play again and the results from the first query are parsed through the regex and you should have just the values you need:

If the values are likely to change (for example, Connection Pool names will change in OBIEE depending on the RPD) then make sure you select Refresh on load. Click Add and you’re done.

You can also define variables with fixed values, which is good if they’re never going to change, or they are but you’ve not got your head around RegEx. Simply change the Type to Custom and enter comma-separated values.

To use the variable simply reference it prefixed with a dollar sign, in the metric definition:

or in the title:

To change the value selected just use the dropdown from the top of the screen:

Annotations

Another very nice feature of Grafana is Annotations. These are overlays on each graph at a given point in time to provide additional context to the data. How I use it is when analysing test data to be able to see what script I ran when:

There are two elements to Annotations – setting them up in Grafana, and getting the data into the backend (InfluxDB in this case, but they work with other data sources such as Graphite too).

Storing an Annotation

An annotation is nothing more than some time series data, but typically a string at a given point in time rather than a continually changing value (measure) over time.

To store it, just chuck the data at InfluxDB and it creates the necessary series. In this example I’m using one called events but it could be called foobar for all it matters. You can read more about putting data into InfluxDB here; choose the method most suitable to the event that you want to record and display as an annotation. I’m running some bash-based testing, so curl fits well here, but if you were using a python program you could use the python InfluxDB client, and so on.

Sending data with curl is easy, and looks like this:

curl -X POST -d '[{"name":"events","columns":["id","action"],"points":[["big load test","start"]]}]' 'http://monitoring-server.foo.com:8086/db/carbon/series?u=root&p=root'

The main bit of interest, other than the obvious server name and credentials, is the JSON payload that we’re sending. Pulling it out and formatting it a bit more nicely:

{  
   "name":"events",
   "columns":[  
      "test-id",
      "action"
   ],
   "points":[  
      [  
         "big load test",
         "start"
      ]
   ]
}

So the series (“table”) we’re loading is called events, and we’re going to store an entry for this point in time with two columns, id and action, storing the values big load test and start respectively. Interestingly (and something that’s very powerful) is that InfluxDB’s schema can evolve in a way that no traditional RDBMS could. Never mind that we’ve not had to define events before loading it, we could even load it at subsequent time points with more columns if we want to, simply by sending them in the data payload.
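For instance, a later event could carry an extra column with no prior schema change. Here is a sketch reusing the same series, where duration_secs is a made-up column name:

curl -X POST -d '[{"name":"events","columns":["id","action","duration_secs"],"points":[["big load test","end",312]]}]' 'http://monitoring-server.foo.com:8086/db/carbon/series?u=root&p=root'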

Coming back to real-world usage, we want to make the load as dynamic as possible, so with a few variables and a bit of bash magic we have something like this that will automatically load to InfluxDB the start and end time of every load test that gets run, along with the name of the script that ran it and the host on which it ran:

INFLUXDB_HOST=monitoring-server.foo.com
INFLUXDB_PORT=8086
INFLUXDB_USER=root
INFLUXDB_PW=root
HOSTNAME=$(hostname)
SCRIPT=`basename $0`
curl -X POST -d '[{"name":"events","columns":["host","id","action"],"points":[["'"$HOSTNAME"'","'"$SCRIPT"'","start"]]}]' "http://$INFLUXDB_HOST:$INFLUXDB_PORT/db/carbon/series?u=$INFLUXDB_USER&p=$INFLUXDB_PW"

echo 'Load testing bash code goes here. For now let us just go to sleep'
sleep 60

curl -X POST -d '[{"name":"events","columns":["host","id","action"],"points":[["'"$HOSTNAME"'","'"$SCRIPT"'","end"]]}]' "http://$INFLUXDB_HOST:$INFLUXDB_PORT/db/carbon/series?u=$INFLUXDB_USER&p=$INFLUXDB_PW"

Displaying annotations in Grafana

Once we’ve got a series (“table”) in InfluxDB with our events in, pulling them through into Grafana is pretty simple. Let’s first check the data we’ve got, by going to the InfluxDB web UI (http://influxdb:8083) and from the Explore Data » page running a query against the series we’ve loaded:

select * from events

The time value is in epoch milliseconds, and the remaining values are whatever you sent to it.

Now in Grafana enable Annotations for the dashboard (via the cog in the top-right corner)

Once enabled use the cog in the top-left corner to open the menu from which you select the Annotations dialog. Click on the Add tab. Give the event group a name, and then the InfluxDB query that pulls back the relevant data. All you need to do is take the above query that you used to test out the data and append the necessary time predicate where $timeFilter so that only events for the time window currently being shown are returned:

select * from events where $timeFilter

Click Add and then set your time window to include a period when an event was recorded. You should see a nice clear vertical line and a marker on the x-axis that when you hover over it gives you some more information:

You can use the Column Mapping options in the Annotations window to bring in additional information into the tooltip. For example, in my event series I have the id of the test, action (start/end), and the hostname. I can get this overlaid onto the tooltip by mapping the columns thus:

Which then looks like this on the graph tooltips:

N.B. currently (Grafana v1.9.1) when making changes to an Annotation definition you need to refresh the graph views after clicking Update on the annotation definition, otherwise you won’t see the change reflected in the annotations on the graphs.

Sparklines

Everything I’ve written about Grafana so far has revolved around the graphs that it creates, and unsurprisingly because this is the core feature of the tool, the bread and butter. But there are other visualisation options available – “Singlestat”, and “Text”. The latter is pretty obvious and I’m not going to discuss it here, but Singlestat, a.k.a. Sparkline and/or Performance Tiles, is awesome and well worth a look. First, an illustration of what I’m blathering about:

A nice headline figure of the current number of active sessions, along with a sparkline to show the trend of the metric.

To add one of these to your dashboard go to the green row menu icon on the left (it’s mostly hidden and will pop out when you hover over it) and select Add Panel -> singlestat.

On the panel that appears go to the edit screen as shown:

In the Metrics panel specify the series as you would with a graph, but remember you need to pull back just a single series – no point writing a regex to match multiple ones. Here I’m going to show the number of queued requests on a connection pool. Note that because I want to show the latest value I change the aggregation to last:

In the General tab set a title for the panel, as well as the width of it – unlike graphs you typically want these panels to be fairly narrow since the point is to show a figure not lots of detail. You’ll notice that I’ve also defined a Drilldown / detail link so that a user can click on the summary figure and go to another dashboard to see more detail.

The Options tab gives you the option to set font size, prefixes/suffixes, and is also where you set up sparkline and conditional formatting.

Tick the Spark line box to draw a sparkline within the panel – if you’ve not seen them before, sparklines are great visualisations for showing the trend of a metric without fussing with axes and specific values. Tick the Background mode to use the entire height of the panel for the graph and overlay the summary figure on top.

Now for the bit I think is particularly nice – conditional formatting of the singlestat panel. It’s dead easy and not a new concept, but it is a really great way to let a user see at a glance if there’s something that needs their attention. In the case of this example here, queueing connections, any queueing is dodgy and more than a few is bad (m’kay). So let’s colour code it:

You can even substitute values for words – maybe the difference between 61 queued sessions and 65 is fairly irrelevant, it’s the fact that there are that magnitude of queued sessions that is more the problem:

Note that the values are absolutes, not ranges. There is an open issue for this so hopefully that will change. The effect is nice though:

Conclusion

Hopefully this article has given you a good idea of what is possible with data stored in InfluxDB and visualised in Grafana, and how to go about doing it.

If you’re interested in OBIEE monitoring you might also be interested in the ELK suite of tools that complements what I have described here well, giving an overall setup like this:

You can read more about its use with OBIEE here, or indeed get in touch with us if you’d like to learn more or have us come and help with your OBIEE monitoring and diagnostics.

Categories: BI & Warehousing