Feed aggregator

Pre-digested authentication

Gary Myers - Sun, 2014-03-09 04:03
A bit of a follow-up to my previous post on Digest authentication.

The fun thing about doing the hard yards to code up the algorithm is that you get a deeper level of understanding about what's going on. Take these lines:

    v_in_raw := utl_raw.cast_to_raw(i_username||':'||i_realm||':'||i_password);
    v_ha1 := lower(DBMS_OBFUSCATION_TOOLKIT.md5(input => v_in_raw));

Every time we build the "who we are" component for this site, we start with exactly the same hash made up of the username, realm (site) and password. This is a batch routine, which means somewhere we would store the username and password for the site - whether that is a parameter in a scheduling tool, coded into a shell script or OS file, or somewhere in the database. If you've got the security option for Oracle, you can use the Wallet, with its own security layers.

But digest authentication gives us another option. Since we actually use the hashed value of the user/site/password, we can store that instead. The receiving site has no idea the code doesn't actually know the REAL password.
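To make that concrete, here is a minimal sketch (my addition, not from the original posts; the ws_credentials table is purely hypothetical) of computing that user/realm/password hash once and storing it, so the batch routine only ever handles the hash rather than the password itself:

    DECLARE
      v_ha1 varchar2(40);
    BEGIN
      -- HA1 = MD5(username:realm:password), exactly as in the digest routine
      v_ha1 := lower(DBMS_OBFUSCATION_TOOLKIT.md5(
                 input => utl_raw.cast_to_raw('batch_user:Example Realm:secret')));
      -- Store the hash; the clear-text password never needs to be kept anywhere
      INSERT INTO ws_credentials (site, username, ha1_hash)
      VALUES ('service.example.com', 'batch_user', v_ha1);
      COMMIT;
    END;
    /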

Now turn that over in your head. We can call the web service as this user WITHOUT knowing the password, just by knowing the hash. I don't know about you, but it makes me a little bit more worried when I hear of user details being leaked or hacked from sites. It's all very well reassuring us the passwords are hashed and can't be reverse engineered (assuming your own password can't be brute-forced). But depending on the security mechanism, a leak of those hashes can be dangerous. If a hacked provider advises people to change their passwords, take their advice. 

'Basic' authentication doesn't have the same weakness. In that environment the provider can store the password hash after applying their own 'secret sauce' mechanism (mostly a salt). When you authenticate, you send the password, they apply the secret sauce and compare the result. You can't get away without knowing the password, because all the work is done at their end.

There's no secret sauce for digest authentication, and there can't be. Even if the provider had the password in the clear, there's no way they can be sure the client has the password since all the client needs is the result of the hash. The provider must store, or be able to work out, the result of that hash because they need to replicate the final hash result using both the client and server nonces. They can store either that same user/realm/password hash as is, or they can encrypt it in a reversible manner, but a one-way hash wouldn't be usable.

In short, digest authentication means that our batch routine doesn't need to 'know' the actual password, just a hash. But it also makes those hashes a lot more dangerous.

I'm an amateur in this field. I checked around and it does seem this is a recognized limitation of digest authentication, e.g. this Q&A and this comparison of Digest and Basic.

PL/SQL, UTL_HTTP and Digest Authentication

Gary Myers - Fri, 2014-03-07 17:28
For the first time in what seems like ages, I've actually put together a piece of code worth sharing. It's not that I haven't been working, but just that it has all been very 'in-house' specific.

However I had a recent requirement to use a web service that makes use of Digest Authentication. If you have a look at the UTL_HTTP SET_AUTHENTICATION subprogram, it only addresses Basic authentication (and, apparently, Amazon S3, which looks intriguing).

In Basic authentication, the username and password get sent across as part of the request. Going through SSL, that doesn't seem too bad, as it is encrypted in transit and the certificates should ensure you are talking to the legitimate destination. However if that destination has been compromised, you've handed over your username and password. In an ideal world, the server shouldn't need to know your password, which is why databases should only store hashed versions of passwords.

Outside of SSL, you might as well just print the username and password on the back of a postcard.


In Digest authentication, you get a more complex interaction that keeps the password secret. You ask for a page, the server responds with an "Authentication Required" plus some bits of information including a nonce. You come up with a hashed value based on the server nonce, your own nonce and a hash of your username and password and send it back with the next request. The server has its own record of your username/password hash and can duplicate the calculations. If everyone is happy, the server can fulfill your request and nobody ever actually needs to know the password.

Our server used SSL, and thanks to Tim's article on SSL and UTL_HTTP, it was a simple setup. I've done it before, but that was in the days when it seemed a lot harder to get certificates OUT of a browser to put them in your Oracle Wallet.

The Interwebs were a lot less forthcoming on a PL/SQL implementation of Digest authentication though. The closest I got was this discussion, which can be summed up as "This may be complex, but I do not see these offhand as being impossible to do in PL/SQL....No Digest configured web server nearby or I would definitely have had a bash at this"

After a read through the Wikipedia article, I came up with the code below:

Firstly, after the initial request, go through the response headers to find the 'WWW-Authenticate' item. Take the value associated with that header, and pass it to the "auth_digest" procedure.


    l_max := UTL_HTTP.GET_HEADER_COUNT(l_http_response);
    l_ind := 1;
    l_name := '-';
    while l_ind <= l_max AND l_name != 'WWW-Authenticate' LOOP
      UTL_HTTP.GET_HEADER(l_http_response, l_ind, l_name, l_value);
      IF  l_name = 'WWW-Authenticate'
      AND l_http_response.status_code = UTL_HTTP.HTTP_UNAUTHORIZED THEN
        --
        -- Unauthorized. Using the Authorization response header, we can come up with the
        -- required values to allow a re-request with the authentication/authorisation details
        --
        dbms_application_info.set_action('auth:'||$$PLSQL_LINE);
        UTL_HTTP.END_RESPONSE(l_http_response);
        --
        dbms_application_info.set_action('auth_req:'||$$PLSQL_LINE);
        l_http_request := UTL_HTTP.BEGIN_REQUEST(l_server||l_method);
        auth_digest (io_http_request => l_http_request, i_auth_value => l_value,
          i_username => nvl(i_username,'xxxx'), i_password => nvl(i_password,'xxxx'), 
          i_req_path => l_method, i_client_nonce => null);
        dbms_output.put_line($$PLSQL_LINE||':Get Response from authenticated request');
        dbms_application_info.set_action('auth_resp:'||$$PLSQL_LINE);
        l_http_response := UTL_HTTP.GET_RESPONSE(l_http_request);
        dump_resp (l_http_response);
        dump_hdr (l_http_response);
      END IF;
      l_ind := l_ind + 1;

    END LOOP;

The auth_digest starts with an extraction of the 'valuables' from that value string. I've used regular expressions here. I spent time working with grep, awk and perl, and regexes are habit forming.

  procedure extract_auth_items
    (i_text in varchar2,
    o_realm out varchar2, o_qop out varchar2, o_nonce out varchar2, o_opaque out varchar2) is
  begin
    -- Each regexp grabs the attribute name and its value up to (but not
    -- including) the closing quote; the substr then strips the attribute name
    -- and opening quote, leaving just the value.
    o_realm   := substr(regexp_substr(i_text, 'realm="[^"]+' ),8);
    o_qop     := substr(regexp_substr(i_text, 'qop="[^"]+'   ),6);
    o_nonce   := substr(regexp_substr(i_text, 'nonce="[^"]+' ),8);
    o_opaque  := substr(regexp_substr(i_text, 'opaque="[^"]+'),9);

  end extract_auth_items;

Next is the 'meat', where the values are combined in the various hashes. Yes, there's a hard-coded default client nonce in there that, by a strange coincidence, matches the one in the Wikipedia article. That's how this stuff gets developed, by following through a worked example. Just like school.

  function digest_auth_md5_calcs
      (i_username     in varchar2, i_password     in varchar2, i_req_path      in varchar2,
      i_realm         in varchar2, i_server_nonce in varchar2,
      i_qop           in varchar2 default 'auth',
      i_client_nonce  in varchar2 default '0a4f113b',
      i_req_type      in varchar2 default 'GET',  i_req_cnt IN NUMBER default 1)
  return varchar2 is
    --
    v_in_str    varchar2(2000);
    v_in_raw    raw(2000);
    v_out       varchar2(60);
    --
    v_ha1       varchar2(40);
    v_ha2       varchar2(40);
    v_response  varchar2(40);
    --
  begin
    --
    -- HA1 = MD5(username:realm:password)
    v_in_str := i_username||':'||i_realm||':'||i_password;
    v_in_raw := utl_raw.cast_to_raw(v_in_str);
    v_out := DBMS_OBFUSCATION_TOOLKIT.md5(input => v_in_raw);
    v_ha1 := lower(v_out);
    --
    -- HA2 = MD5(request-method:request-uri)
    v_in_str := i_req_type||':'||i_req_path;
    v_in_raw := utl_raw.cast_to_raw(v_in_str);
    v_out := DBMS_OBFUSCATION_TOOLKIT.md5(input => v_in_raw);
    v_ha2 := lower(v_out);
    --
    -- response = MD5(HA1:server-nonce:nonce-count:client-nonce:qop:HA2)
    v_in_str := v_ha1||':'||i_server_nonce||':'||lpad(i_req_cnt,8,0)||':'||
                   i_client_nonce||':'||i_qop||':'||v_ha2;
    v_in_raw := utl_raw.cast_to_raw(v_in_str);
    v_out := DBMS_OBFUSCATION_TOOLKIT.md5(input => v_in_raw);
    v_response := lower(v_out);
    --
    return v_response;
  end digest_auth_md5_calcs;
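One nice side effect of following the worked example is that the function is easy to sanity-check. This quick test call is my addition rather than part of the original post; assuming the function is accessible to the calling session (e.g. via its package), feeding in the values from the Wikipedia example should print the response documented there, 6629fae49393a05397450978507c4ef1:

  set serveroutput on
  BEGIN
    -- qop ('auth'), client nonce ('0a4f113b'), request type ('GET') and
    -- request count (1) are all left at their defaults
    dbms_output.put_line(
      digest_auth_md5_calcs(
        i_username     => 'Mufasa',
        i_password     => 'Circle Of Life',
        i_req_path     => '/dir/index.html',
        i_realm        => 'testrealm@host.com',
        i_server_nonce => 'dcd98b7102dd2f0e8b11d0f600bfb0c093'));
  END;
  /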

And this is the full auth_digest bit

  procedure auth_digest
    (io_http_request  in out UTL_HTTP.REQ,  i_auth_value    in varchar2,
    i_username        in varchar2,          i_password      in varchar2,
    i_req_path        in varchar2,          i_qop           in varchar2 default 'auth',
    i_req_cnt         in number default 1,  i_client_nonce  in varchar2 default null)
  is
    l_realm         varchar2(400);
    l_qop           varchar2(30);
    l_server_nonce  VARCHAR2(400);
    l_opaque        varchar2(100);
    --
    l_response      varchar2(40);
    l_value         VARCHAR2(1024);
    --
    l_client_nonce  varchar2(40);  -- the generated nonce below is 32 hex characters
    --
  begin
    --
    -- Apply the username / password for Digest authentication
    --
    extract_auth_items (i_auth_value,
                    l_realm, l_qop, l_server_nonce, l_opaque);
    --
    IF i_client_nonce is not null then
      l_client_nonce := i_client_nonce;
    ELSE
      l_client_nonce := lower(utl_raw.cast_to_raw(DBMS_OBFUSCATION_TOOLKIT.md5(
                            input_string=>dbms_random.value)));
    END IF;
    --
    l_response := digest_auth_md5_calcs
      (i_username => i_username, i_password    => i_password,     i_req_path => i_req_path,
      i_realm     => l_realm,    i_server_nonce => l_server_nonce,
      i_client_nonce => l_client_nonce);
    --i_qop default to auth, i_req_type default to GET and i_req_cnt default to 1
    --
    l_value := 'Digest username="' ||i_username          ||'",'||
               ' realm="'          ||l_realm             ||'",'||
               ' nonce="'          ||l_server_nonce      ||'",'||
               ' uri="'            ||i_req_path          ||'",'||
               ' response="'       ||l_response          ||'",'||
               ' qop='             ||i_qop               ||',' ||
               ' nc='              ||lpad(i_req_cnt,8,0) ||',' ||
               ' cnonce="'         ||i_client_nonce      ||'"'
               ;
    --
    IF l_opaque is not null then
      l_value := l_value||',opaque="'||l_opaque||'"';
    END IF;
    dbms_output.put_line(l_value);
    UTL_HTTP.SET_HEADER(io_http_request, 'Authorization', l_value);
    --

  end auth_digest;

A package with the code is available from my CodeSpace page, or directly here. There's a lot of debug 'stuff' in there. The code I'm using is still tailored to my single specific need, and I've stripped specific values from this published variant. You'll need to hard-code or parameterize it for any real use. I may be able to do a 'cleaned-up' version in the future, but don't hold your breath.

BI change is coming, time to get over it and get on with the job

Steve Jones - Fri, 2014-03-07 11:15
One of the things that always stuns me in IT is how people don't appear to like change.  Whether it was the EAI folks pushing back on Web Services in 2000 in favour of their old-school approaches, the package guys pushing back against SaaS, or now the BI guys pushing back against the new wave of BI technologies and approaches, the message is always the same: we are happy doing what we are doing,
Categories: Fusion Middleware

Finally...the official sizing guide for Oracle Application Express

Joel Kallman - Thu, 2014-03-06 06:29
The following question was recently posted on an internal mailing list:
"Is there a sizing/capacity/scalability guide available for APEX?"
I'm always fascinated by this question.  I appreciate the fact that this is a standard, acceptable practice in the industry, and people come to expect it.  How else could architects and planners appropriately allocate resources without some form of estimate?  This impacts capital expenditures and budgets and rack space and energy costs and support costs and human capital.  People seem to be looking for some simple formula like:
(X number of pages in an APEX application) * (Y number of concurrent users) = (W number of processors) + (Z number of GB of RAM)
Voila!  Plug that formula into your favorite spreadsheet and away you go.  Well....if I lured you in with the title of this blog post, I have to be honest - it's all fiction.  There is no such thing.  But why not?  There are a number of reasons.

  1. There is no such thing as a representative, typical application.  As I've often bloviated in the past, Oracle Application Express is as fast or as slow as you, the developer, make it.  The overhead associated with the APEX engine itself is fairly static (measured in hundredths of a second). If you have a query that takes 30 seconds to execute and you put this query in a report in an APEX application, you can expect the execution of that page to take just over 30 seconds per page view.

  2. What does "concurrent" mean?  Is that the total number of users in an hour?  Total number of users in a 5-minute interval?  Or is that the high-water mark of number of users all clicking the mouse or hitting the Enter key, all at the same time?

  3. What is the typical "think time" of an end user?  Effectively, resources are only being consumed when there is a request actively being processed by the APEX engine.  So while the end user is interpreting the results of a report or keying in data in a form, they aren't (typically) making any requests to the APEX engine.

  4. How much memory will be consumed by the typical page view?  Does your application allocate GB's of in-memory LOBs, per user per page view?  This would have a definite impact on scalability.
The total number of pages in an application has close to zero correlation to scalability and throughput.  You can have a 1,000-page application, each page with sub-second performance, which will be far more scalable than a 1-page application that consumes 15 seconds per page view.

As the Oracle Database Performance and Tuning Guide states, there are many variables involved in workload estimation, and it's typically done via either benchmarking or extrapolation from a similar system.  But what is "a similar system" for an APEX application?  Does a call-center application at one enterprise approximate the back-office order processing system at another company?

I can understand how a formula can be prepared for a COTS application.  If you're deploying Fusion Applications or the eBusiness Suite or JD Edwards or SAP, those applications are created, the business logic is written, the queries and transactions are crafted, and concurrency has been measured on representative systems for a given workload.  But I don't understand how someone can produce a sizing guide for any application development framework - Application Express, ADF, .NET, Java.  It's like asking "how scalable is C?"

An application that our team wrote and runs for Oracle is quite scalable (the oft-mentioned Aria People employee directory).  Yesterday (05-MAR), there were 2.1M page views on this system with a median page rendering time of 0.03 seconds from 45,314 distinct users.  The busiest hour saw 129,284 page views through the APEX engine (35.9 page views/second).  If another team within Oracle wrote this same system but didn't tune the SQL like we did, is that a reflection on the scalability of APEX?  And if the answer to that question is "no", then is the hardware configuration all that relevant?

Back in 2007, my manager Mike Hichwa took a draft note that I wrote and published an article for  Oracle Magazine entitled "Sizing up Performance".  There is a very simple formula which can be used to estimate the throughput of an APEX application.  This isn't going to help you determine how much hardware to buy or how to estimate the size of your VM, but it will help estimate (in back-of-the-napkin form) how scalable an existing APEX application will be on an existing system.

With all this said, we, on the Oracle Application Express team, have been deficient.  At a minimum, we should have a list of systems developed by our customers, with specific information about the hardware configuration, purpose of the system, and number of end-users served.  Maybe we should also obtain the level of expertise of the developers.  We will gather this information and publish it online (without specific customer names).  If nothing else, this can serve as the foundation for extrapolation by architects and designers.


Unintended, but interesting consequences

Nuno Souto - Thu, 2014-03-06 02:56
It's interesting how from time to time something happens that makes sense and seems logical afterwards, but at the time it causes a bit of a surprise.  Part of the fun of working with this type of software! A few days ago we had an incident in an Oracle DW database when a developer tried to load an infinitely big file from a very large source.  Yeah, you got it: big-data-ish!  Suffice to say:

Internal Links

Tim Dexter - Wed, 2014-03-05 20:06

Another great question today, this time, from friend and colleague, Jerry the master house re-fitter. I think we are competing on who can completely rip and replace their entire house in the shortest time on their own. Every conversation we have starts with 'so what are you working on?' He's in the midst of a kitchen re-fit, I'm finishing off odds and ends before I re-build our stairwell and start work on my hidden man cave under said stairs. Anyhoo, his question!

Can you create a PDF document that shows a summary on the first page and provides links to more detailed sections further down in the document?

Why yes you can Jerry. Something like this? Click on the department names in the first table and the return to top links in the detail sections. Pretty neat huh? Dynamic internal links based on the data, in this case the department names.

It's not that hard to do either. Here's the template, RTF only right now.


The important fields in this case are the ones in red; here are their contents.

TopLink

<fo:block id="doctop" />

Just think of it as an anchor to the top of the page called doctop

Back to Top

<fo:basic-link internal-destination="doctop" text-decoration="underline">Back to Top</fo:basic-link>

Just a live link 'Back to Top' if you will, that takes the user to the doc top location i.e. to the top of the page.

DeptLink

<fo:block id="{DEPARTMENT_NAME}"/>

Just like the TopLink above, this just creates an anchor in the document. The neat thing here is that we dynamically name it the actual value of the DEPARTMENT_NAME. Note that this link is inside the for-each:G_DEPT loop so the {DEPARTMENT_NAME} is evaluated each time the loop iterates. The curly braces force the engine to fetch the DEPARTMENT_NAME value before creating the anchor.

DEPARTMENT_NAME

<fo:basic-link  internal-destination="{DEPARTMENT_NAME}" ><?DEPARTMENT_NAME?></fo:basic-link>

This is the link for the user to be able to navigate to the detail for that department. It does not use a regular MSWord URL; we have to create a field in the template to hold the department name value and apply the link. Note, no text decoration this time, i.e. no underline.

You can add a dynamic link on to anything in the summary section. You just need to remember to keep link 'names' as unique as needed for source and destination. You can combine multiple data values into the link name using the concat function.

Template and data available here. Tested with 10 and 11g, will work with all BIP flavors.

Categories: BI & Warehousing

Oracle Direct NFS and Infiniband: A Less-Than-Perfect Match

Don Seiler - Wed, 2014-03-05 19:54
Readers of an earlier post on this blog will know about my latest forays into the world of Direct NFS. Part of that means stumbling over configuration hiccups or slamming into brick walls when you find new bugs.

To quickly re-set the table, my organization purchased the Oracle ZFS Storage Appliance (ZFSSA) 7420. Oracle sold us on the Infiniband connectivity as a way to make a possible future transition to Exadata easier. However the pre-sales POC testing was done over 10gb Ethernet (10gigE). So it was that everything (including their Infiniband switches and cables) arrived at the datacenter and was installed and connected by the Oracle technicians. There were a few initial hiccups and frustrating inconsistencies with their installation and configuration, but those are outside the scope of this post.

We decided to put a copy of our standby database on the ZFSSA and have it run as a second standby. The performance problems were quick to appear, and they weren't pretty.



Configuring Shares
We configured the ZFS project shares by the common Oracle best practices in terms of ZFS recordsize and write bias. For example, datafile shares were set to an 8k recordsize (to match the db_block_size) and throughput write bias, whereas redo log shares were set to 128k recordsize and latency bias. Note that with Oracle Database 12c, Direct NFS over NFSv4, and the more recent ZFSSA firmware, you gain the benefit of Oracle Intelligent Storage Protocol (OISP), which will determine the recordsize and write bias automatically based on the type of file it recognizes.

Copying the Database
To start out we needed to get a copy of the database onto the ZFSSA shares. This was easily done with RMAN's backup as copy database command, specifying the ZFSSA mount as the format destination. We were fairly impressed with the Direct NFS transfer speed during the copy and so we were optimistic about how it would stand up with our production load.

Starting Recovery!
Once everything was set, we started managed recovery on the standby. Our earlier excitement gave way to a sort of soul-crushing disappointment as the recovery performance basically ground to a standstill and traffic to the ZFSSA went from hundreds of Mbps to barely a trickle. We could stop recovery and copy a big file with great speed, but something in managed recovery was not playing nicely.

We found that we could disable Direct NFS (requires a database restart and software relinking), and managed recovery would actually perform better over the kernel NFS, although still not nearly as well as we would need.

This started a blizzard of SR creations, including SRs being spawned from other SRs. We had SRs open for the ZFSSA team, the Direct NFS team, the Data Guard team, and even the Oracle Linux and Solaris teams, even though we were not on Oracle Linux or Solaris (we use RHEL). It came to a point where I had to tell our account manager to have support stop creating new SRs, since every new SR meant I had to explain the situation to a new technician all over again.

At this point we were having twice-daily conference calls with our account manager and technical leads from the various departments. Their minions were working hard on their end to replicate the problem and find a solution, but we were running into a 4th week of this craziness.

The Infiniband Bandit
After many frustrating weeks of changing configurations, cables, cards, and just generally grasping at straws, it was finally narrowed down to the Infiniband. Or rather, a bug in the OpenFabrics (OFA) Linux kernel module that dealt with Infiniband, triggered when Direct NFS would fire off a whole lot of connections, like when Data Guard managed recovery would fire up 80 parallel slaves. We tested out the 10gigE channel we had for the management UI and performance was like night and day with just the one channel.

Oracle Support suggested it might be related to bug 15824316, which also deals with dramatic performance loss with Direct NFS over Infiniband. The bug in the OFA kernel module was fixed in recent versions of Oracle Enterprise Linux (OEL) (specifically the UEK kernel), but Oracle is not sharing this fix with Red Hat (or anyone else, presumably). Since we're on RHEL, we had little choice but to send all the Infiniband networking hardware back and order up some 10gigE replacements.

We're still in the process of getting the 10gigE switches and cables all in place for the final production setup. If you're curious, it's 4 10gigE cards per server, bonded to a single IP to a 10gigE switch into the ZFSSA heads. This 10gigE network is dedicated exclusively to ZFSSA traffic.

So, in the end, if you're on (a recent version of) OEL/UEK, you should have nothing to worry about. But if you're on RHEL and planning to use Direct NFS, you're going to want to use 10gigE and NOT Infiniband.

Update - 6 Mar 2014
Some of you have asked, and I want to re-iterate: Oracle have claimed that the OFA module was entirely re-written, and their fix is specific to OEL and is not covered by GPL or any similar license. We were told that they have no plans to share their code with RHEL. Also there is no MOS bug number for the OFA issue; it was apparently re-written from scratch with no bug to track the issue. If this all sounds rather dubious to you, join the club. But it's what our account manager told us at the end of last year.

Another Update - 6 Mar 2014
Bjoern Rost and I discussed this privately and after quite a bit of research and analysis he shared this conclusion:

Oracle Support suggested that this issue would not occur with the OFA module used in Oracle Linux with the UEK kernel. RedHat changed their support in RHEL6 from shipping the whole openfabrics stack to just including the drivers that were also present in the upstream mainline kernel. This is RedHat’s policy to ensure stability in the version of the kernel they ship. Oracle offers an OFA package with some additional patches (all GPL/BSD license) for the UEKr1 and UEKr2 kernels. Unfortunately, these two different approaches make it very hard to pinpoint specific patches or create backports for the RedHat kernel version.
Categories: DBA Blogs

The OLAP Extension is now available in SQL Developer 4.0

Keith Laker - Tue, 2014-03-04 14:57


The OLAP Extension is now in SQL Developer 4.0.

See
http://www.oracle.com/technetwork/developer-tools/sql-developer/downloads/sqldev-releasenotes-v4-1925251.html for the details.

The OLAP functionality is mentioned toward the bottom of the web page.
You will still need AWM 12.1.0.1.0 to
  • Manage and enable cube and dimension MV's.
  • Manage data security.
  • Create and edit nested measure folders (i.e. measure folders that are children of other measure folders)
  • Create and edit Maintenance Scripts
  • Manage multilingual support for OLAP Metadata objects
  • Use the OBIEE plugin or the Data Validation plugin
What is new or improved:
  • New Calculation Expression editor for calculated measures.  This allows the user to nest different types of calculated measures easily.  For instance a user can now create a Moving Total of a Prior Period as one calculated measure.  In AWM, it would have required a user to create a Prior Period first and then create a Moving Total calculated measure which referred to the Prior Period measure.  Also the new Calculation Expression editor displays hypertext helper templates when the user selects the OLAP API syntax in the editor.
  • Support for OLAP DML command execution in the SQL Worksheet.  Simply prefix OLAP DML commands by a '~' and then select the execute button to execute them on the SQL Worksheet.  The output of the command will appear in the DBMS Output Window if it is opened, or the Script Output Window if the user has executed 'set serveroutput on' before executing the DML command.
  • Improved OLAP DML Program Editor integrated within the SQL Developer framework.
  • New diagnostic reports in the SQL Developer Report navigator.
  • Ability to create a fact view with a measure dimension (i.e. "pivot cube").  This functionality is accessible from the SQL Developer Tools-OLAP menu option.
  • Cube scripts have been renamed to Build Specifications and are now accessible within the Create/Edit Cube dialog.  The Build Specifications editor there is similar to the Calculation Expression editor in terms of functionality.
Categories: BI & Warehousing

Latest for the folks who have to deal with Peoplesoft

Nuno Souto - Tue, 2014-03-04 05:51
Dang, been a while since the last posts!  A lot of water under the bridge since then. We've ditched a few people that were not really helping anything, and are now actively looking at cloud solutions, "big data" use, etcetc. Meanwhile, there is the small detail that business as usual has to continue: it's very easy to parrot about the latest gimmick/feature/funtastic technology that will

The next big wave of IT is Software Development

Steve Jones - Mon, 2014-03-03 10:25
I can smell a change coming, the last few years have seen cloud and SaaS on the rise and seen a fragmentation in application development (thanks in a large part to the appalling stewardship of Java) and a real focus of budgets around BI and 'vanilla' package approaches.  Now this is a good thing, both because I jumped out of the Java boat onto the BI boat a few years ago but also because its
Categories: Fusion Middleware

Software Development Wave 4: back to the package

Steve Jones - Mon, 2014-03-03 10:20
The end of the next Software Development wave will be when software development again 'eats itself', as it did with technologies like Hadoop showing a new value in information, with platforms like SFDC showing new pre-built services, where people like GoodData have turned BI into SaaS.  So we will see the same evolution again and a new generation of commoditisation which drives
Categories: Fusion Middleware

Fun with global temporary tables in Oracle 12c

Mihajlo Tekic - Sun, 2014-03-02 22:07

A few months ago I wrote a post about 12c session specific statistics for global temporary tables (link). A long awaited feature, no matter what.

Recently I had some discussions on the same subject with members of my team.

One interesting observation was the behavior of transaction specific GTTs with session specific statistics enabled. What attracted our interest was the fact that data in global temporary tables is not deleted after the DBMS_STATS package is invoked.

Prior to 12c, a call to DBMS_STATS would result in an implicit commit. This would wipe out the content of a transaction specific global temporary table.

I’ll digress here a bit. Yes, I know, who would call DBMS_STATS to collect statistics on a transaction specific GTT knowing the data in the table will be lost. Well, things change a bit in 12c.

In Oracle 12c, no implicit commit is issued when DBMS_STATS.GATHER_TABLE_STATS is invoked on a transaction specific GTT with session specific statistics enabled, thus letting users take advantage of session specific statistics for this type of GTTs.

This behavior is documented in Oracle documentation.

I’ll try to shed some more light on this behavior through a couple of examples:

For this purpose I’ll start with three tables. T1 and T2 are transaction specific temporary tables. T3 is a regular table. By default, in 12c, session specific statistics are used.



CREATE GLOBAL TEMPORARY TABLE t1 (id NUMBER);

CREATE GLOBAL TEMPORARY TABLE t2 (id NUMBER);

CREATE TABLE t3 (id NUMBER);
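
As a quick aside (my addition, not part of the original post), the session-level behaviour is controlled by the DBMS_STATS preference GLOBAL_TEMP_TABLE_STATS, which defaults to SESSION in 12c. A small sketch of checking it, and of switching a table back to the pre-12c shared behaviour:

-- Check the preference for T1 (defaults to SESSION in 12c)
SELECT DBMS_STATS.GET_PREFS('GLOBAL_TEMP_TABLE_STATS', user, 'T1') AS gtt_stats
  FROM dual;

-- Revert T1 to the pre-12c behaviour (shared statistics) if desired
exec DBMS_STATS.SET_TABLE_PREFS(user, 'T1', 'GLOBAL_TEMP_TABLE_STATS', 'SHARED');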



Scenario #1 – Insert 5 rows into each of the three tables and observe the state of the data after DBMS_STATS is invoked on a transaction specific GTT.



SQL> INSERT INTO t1 (SELECT rownum FROM dual CONNECT BY rownum<=5);
5 rows created.

SQL> INSERT INTO t2 (SELECT rownum FROM dual CONNECT BY rownum<=5);
5 rows created.

SQL> INSERT INTO t3 (SELECT rownum FROM dual CONNECT BY rownum<=5);
5 rows created.

SQL> exec DBMS_STATS.GATHER_TABLE_STATS(user,'T1');
PL/SQL procedure successfully completed.

SQL> SELECT count(1) FROM t1;
COUNT(1)
----------
5


As you can see the data in T1 is still present. Furthermore if you open another session you can also see that T3 has no rows. This means commit was not invoked when session specific statistics were collected for T1.
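
A quick way to confirm that the statistics gathered here are private to the session (again my addition, not in the original post): in 12c the *_TAB_STATISTICS views carry a SCOPE column showing SHARED or SESSION.

SELECT table_name, num_rows, scope
  FROM user_tab_statistics
 WHERE table_name = 'T1';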

Scenario #2 – Insert 5 rows into each of the three tables and collect statistics only on the regular table, T3.


SQL> INSERT INTO t1 (SELECT rownum FROM dual CONNECT BY rownum<=5);
5 rows created.

SQL> INSERT INTO t2 (SELECT rownum FROM dual CONNECT BY rownum<=5);
5 rows created.

SQL> INSERT INTO t3 (SELECT rownum FROM dual CONNECT BY rownum<=5);
5 rows created.

SQL> exec DBMS_STATS.GATHER_TABLE_STATS(user,'T3');
PL/SQL procedure successfully completed.

SQL> SELECT count(1) FROM t1;
COUNT(1)
----------
0


As you can see, in this scenario an implicit commit was invoked, which resulted in the data in T1 being purged.

Hope this helps … :-)

Cheers!





“How did you learn so much stuff about Oracle?”

Cary Millsap - Fri, 2014-02-28 22:35
In LinkedIn, a new connection asked me a very nice question. He asked, “I know this might sound stupid, but how did you learn so much stuff about Oracle. :)”

Good one. I like the presumption that I know a lot of stuff about Oracle. I suppose that I do, at least about some aspects of it, although I often feel like I don’t know enough. It occurred to me that answering publicly might also be helpful to anyone trying to figure out how to prepare for a career. Here’s my answer.

I took a job with the young consulting division of Oracle Corporation in September 1989, about two weeks after the very first time I had heard the word “Oracle” used as the name of a company. My background had been mathematics and computer science in school. I had two post-graduate degrees: a Master of Science in Computer Science with a focus on language design and compilers, and a Master of Business Administration with a focus in finance.

My first “career job” was as a software engineer, which I started before the MBA. I designed languages and wrote compilers to implement those languages. Yes, people actually pay good money for that, and it’s possibly still the most fun I’ve ever had at work. I wrote software in C, lex, and yacc, and I taught my colleagues how to do it, too. In particular, I spent a lot of time teaching my colleagues how to make their C code faster and more portable (so it would run on more computers than just one on which you wrote it).

Even though I loved my job, I didn’t see a lot of future in it. At least not in Colorado Springs in the late 1980s. So I took a year off to get the MBA at SMU in Dallas. I went for the MBA because I thought I needed to learn more about money and business. It was the most difficult academic year of my life, because I was not particularly connected to or even interested in most of the subject matter. I hated a lot of my classes, which made it difficult to do as well as I had been accustomed. But I kept grinding away, and finished my degree in the year it was supposed to take. Of course I learned many, many things that year that have been vital to my career.

A couple of weeks after I got my MBA, I went to work for Oracle in Dallas, with a salary that was 168% of what it had been as a compiler designer. My job was to visit Oracle customers and help them with their problems.

It took a while for me to get into a good rhythm at Oracle. My boss was sending me to these local customers that were having problems with the Oracle Financial Applications (the “Finapps,” as we usually called them, which would many years later become the E-Business Suite) on version 6.0.26 of the ORACLE database (it was all caps back then). At first, I couldn’t help them near as much as I had wanted to. It was frustrating.

That actually became my rhythm: week after week, I visited these people who were having horrific problems with ORACLE and the Finapps. The database in 1990, although it had some pretty big bugs, was still pretty good. It was the applications that caused most of the problems I saw. There were a lot of problems, both with the software and with how it was sold. My job was to fix the problems. Some of those problems were technical. Many were not.

A lot of the problems were performance; problems of the software running “too slowly.” I found those problems particularly interesting. For those, I had some experience and tools at my disposal. I knew a good bit about operating systems and compilers and profilers and linkers and debuggers and all that, and so learning about Oracle indexes and rollback segments (two good examples, continual sources of customer frustration) wasn’t that scary of a step for me.

I hadn’t learned anything about Oracle or relational databases in school. I learned how the database worked at Oracle by reading the documentation, beginning with the excellent Oracle® Database Concepts. Oracle sped me along a bit with a couple of the standard DBA courses.

My real learning came from being in the field. The problems my customers had were immediately interesting by virtue of being important. The resources available to me for solving such problems back in the early 1990s were really just books, email, and the telephone. The Internet didn’t exist yet. (Can you imagine?) The Oracle books available back then, for the most part, were absolutely horrible. Just garbage. Just about the only thing they were good for was creating problems that you could bill lots of consulting hours to fix. The only thing that was left was email and the telephone.

The problem with email and telephones, however, is that there has to be someone on the other end. Fortunately, I had that. The people on the other end of my email and phone calls were my saviors and heroes. In my early Oracle years, those saviors and heroes included people like Darryl Presley, Laurel Jamtgaard, Tom Kemp, Charlene Feldkamp, David Ensor, Willis Ranney, Lyn Pratt, Lawrence To, Roderick Mañalac, Greg Doherty, Juan Loaiza, Bill Bridge, Brom Mahbod, Alex Ho, Jonathan Klein, Graham Wood, Mark Farnham (who didn’t even work for Oracle, but who could cheerfully introduce me to anyone I needed), Anjo Kolk, and Mogens Nørgaard. I could never repay these people, and many more, for what they did for me. ...In some cases, at all hours of the night.

So, how did I learn so much stuff about Oracle? It started by immersing myself into a universe where every working day I had to solve somebody’s real Oracle problems. Uncomfortable, but effective. I survived because I was persistent and because I had a great company behind me, filled with spectacularly intelligent people who loved helping each other. Could I have done that on my own, today, with the advent of the Internet and lots and lots of great and reliable books out there to draw upon? I doubt it. I sincerely do. But maybe if I were young again...

I tell my children, there’s only one place where money comes from: other people. Money comes only from other people. So many things in life are that way.

I’m a natural introvert. I naturally withdraw from group interactions whenever I don’t feel like I’m helping other people. Thankfully, my work and my family draw me out into the world. If you put me into a situation where I need to solve a technical problem that I can’t solve by myself, then I’ll seek help from the wonderful friends I’ve made.

I can never pay it back, but I can try to pay it forward.

(Oddly, as I’m writing this, I realize that I don’t take the same healthy approach to solving business problems. Perhaps it’s because I naturally assume that my friends would have fun helping solve a technical problem, but that solving a business problem would not be fun and therefore I would be imposing upon them if I were to ask for help solving one. I need to work on that.)

So, to my new LinkedIn friend, here’s my advice. Here’s what worked for me:
  • Educate yourself. Read, study, experiment. Educate yourself especially well in the fundamentals. So many people don’t. Being fantastic at the fundamentals is a competitive advantage, no matter what you do. If it’s Oracle you’re interested in learning about, that’s software, so learn about software: about operating systems, and C, and linkers, and profilers, and debuggers, .... Read the Oracle Database Concepts guide and all the other free Oracle documentation. Read every book there is by Tom Kyte and Christian Antognini and Jonathan Lewis and Tanel Põder and Kerry Osborne and Karen Morton and James Morle and all the other great authors out there today. And read their blogs.
  • Find a way to hook yourself into a network of people that are willing and able to help you. You can do that online these days. You can earn your way into a community by doing things like asking thoughtful questions, treating people respectfully (even the ones who don’t treat you respectfully), and finding ways to teach others what you’ve learned. Write. Write what you know, for other people to use and improve. And for God’s sake, if you don’t know something, don’t act like you do. That just makes everyone think you’re an asshole, which isn’t helpful.
  • Immerse yourself into some real problems. Read Scuttle Your Ships Before Advancing if you don’t understand why. You can solve real problems online these days, too (e.g., StackExchange and even Oracle.com), although I think that it’s better to work on real live problems at real live customer sites. Stick with it. Fix things. Help people.
Help people.

That’s my advice.

Waterfall Charts

Tim Dexter - Fri, 2014-02-28 19:35

Great question came through the ether from Holger on waterfall charts last night.

"I know that Answers supports waterfall charts and BI Publisher does not.
Do you have a different solution approach for waterfall charts with BI Publisher (perhaps stacked bars with white areas)?
Maybe you have already implemented something similar in the past and you can send me an example."

I didn't have one to hand, but I do now. Little known fact: the Publisher chart engine is based on the Oracle Reports chart engine. Therefore, this document came straight to mind. It's awesome for chart tips and tricks. Will you have to get your hands dirty in the chart code? Yep. Will you get the chart you want with a little effort? Yep. Now, I know, I know, in this day and age, you should get waterfalls with no effort but then you'd be bored right?

First things first, for the uninitiated, what is a waterfall chart? From some kind person at Wikipedia, "The waterfall chart is normally used for understanding how an initial value is affected by a series of intermediate positive or negative values. Usually the initial and the final values are represented by whole columns, while the intermediate values are denoted by floating columns. The columns are color-coded for distinguishing between positive and negative values."

We'll get back to that last sentence later; for now let's get the basic chart working.

Checking out the Oracle Reports charting doc, search for 'floating', their term for 'waterfall', and it will get you to the section on building a 'floating column chart', or in more modern parlance, a waterfall chart. If you have already got your feet wet in the dark arts world of Publisher chart XML, get on with it and get your waterfall working.

If not, read on.

When I first started looking at this chart, I decided to ignore the 'negative values' in the definition above. Being a glass half full kind of guy, I don't see negatives, right :)

Without them it's a pretty simple job of rendering a stacked bar chart with 4 series for the colors. One for the starting value, one for the ending value, one for the diffs (steps) and one for the base values. The base values color could be set to white but that obscures any tick lines in the chart. Better to use the transparency option from the Oracle Reports doc.

<Series id="0" borderTransparent="true" transparent="true"/> 

Pretty simple; even the data structure is reasonably easy to get working. But the negative values were nagging at me, and Holger, whom I had pointed at the Oracle Reports doc, had come back and could not get negative values to show correctly. So I took another look. What a pain in the butt!

In the chart above (that's my first BIP waterfall, maybe the first ever BIP waterfall) I have lime green start and finish bars, red for negative and green for positive values. Look a little closer at the hidden bar values where we transition from red to green; ah man, royal pain in the butt! Not because of anything tough in the chart definition, that's pretty straightforward. I just need the following columns: START, BASE, DOWN, UP and FINISH.

    Bar 1 (Start Value): START 200, BASE 0,   UP 0, DOWN 0,  FINISH 0
    Bar 2 (PROD1):       START 0,   BASE 180, UP 0, DOWN 20, FINISH 0
    Bar 3 (PROD2):       START 0,   BASE 150, UP 0, DOWN 30, FINISH 0

and so on. The start, up, down and finish values are reasonably easy to get. The real trick is calculating that hidden BASE value correctly for that transition from -ve to +ve and vice versa. Hitting Google, I found the key to that calculation in a great page on building a waterfall chart in Excel from the folks at Contextures.  Excel is great at referencing previous cell values to create complex calculations, and I guess I could have fudged this article and used an Excel sheet as my data source. I could even have used an Excel template against my database table to create the data for the chart and fed the resulting Excel output back into the report as the data source for the chart. But, I digress; that would be tres cool though, gotta look at that.
On that page is the formula to get the hidden base bar values and I adapted that into some sql to get the same result.

Let's assume I have the following data in a table:

PRODUCT_NAME   SALES
PROD1            -20
PROD2            -30
PROD3             50
PROD4             60

The sales values are versus the same period last year, i.e. a delta value.  I have a starting value of 200 total sales; let's assume this is pulled from another table.
I have spent the majority of my time on generating the data; the actual chart definition is pretty straightforward. Getting that BASE value has been most tricksy!

I need to generate the following columns for each bar:

PRODUCT_NAME   STRT   BASE_VAL   DOWN   UP   END_TOTAL
START           200          0      0    0           0
PROD1             0        180     20    0           0
PROD2             0        150     30    0           0
PROD3             0        150      0   50           0
PROD4             0        200      0   60           0
END               0          0      0    0         260

Ignoring the START and END values for a second, here's the query for the PRODx rows:

 SELECT 2 SORT_KEY 
, PRODUCT_NAME
, STRT
, SALES
, UP
, DOWN
, 0 END_TOTAL
, 200 + (SUM(LAG_UP - DOWN) OVER (ORDER BY PRODUCT_NAME)) AS BASE_VAL
FROM
(SELECT P.PRODUCT_NAME
,  0 AS STRT
, P.SALES
, CASE WHEN P.SALES > 0 THEN P.SALES ELSE 0 END AS UP  
, CASE WHEN P.SALES < 0 THEN ABS(P.SALES) ELSE 0 END AS DOWN
, LAG(CASE WHEN P.SALES > 0 THEN P.SALES ELSE 0 END,1,0) 
      OVER (ORDER BY P.PRODUCT_NAME) AS LAG_UP
FROM PRODUCTS P
)

The inner query is breaking the UP and DOWN values into their own columns based on the SALES value. The LAG function is the cool bit to fetch the UP value in the previous row. That column is the key to getting the BASE values correctly.

The outer query just has a calculation for the BASE_VAL.

200 + (SUM(LAG_UP - DOWN) OVER (ORDER BY PRODUCT_NAME))

The SUM..OVER allows me to iterate over the rows to get the calculation I need, i.e. the starting value (200) plus the running sum of LAG_UP - DOWN. Remember the LAG_UP value is fetching the value from the previous row.
Is there a neater way to do this? I'm sure there is; I could probably eliminate the inner query with a little effort, but for the purposes of this post it's quite handy to be able to break things down.
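
For what it's worth, here is one way the inner query could be folded away; this is a sketch of my own rather than anything from the original post. Since the base of an upward bar is just the running total minus that bar's own rise, the LAG can be replaced by a cumulative SUM of SALES (same 200 starting value, same column order as the query above):

SELECT 2 SORT_KEY
, PRODUCT_NAME
, 0 STRT
, SALES
, GREATEST(SALES, 0)  AS UP
, GREATEST(-SALES, 0) AS DOWN
, 0 END_TOTAL
, 200 + SUM(SALES) OVER (ORDER BY PRODUCT_NAME)
      - GREATEST(SALES, 0) AS BASE_VAL
FROM PRODUCTS

The START and END rows would still be UNIONed on exactly as below.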

For the start and end values I used more queries and then just UNIONed the three together. One note on that union: the sorting. For the chart to work, I need START, PRODx, FINISH, in that order. The easiest way to get that was to add a SORT_KEY value to each query and then sort by it. So my total query for the chart was:

SELECT 1 SORT_KEY
, 'START' PRODUCT_NAME
, 200 STRT
, 0 SALES
, 0 UP
, 0 DOWN
, 0 END_TOTAL
, 0 BASE_VAL
FROM PRODUCTS
UNION
SELECT 2 SORT_KEY 
, PRODUCT_NAME
, STRT
, SALES
, UP
, DOWN
, 0 END_TOTAL
, 200 + (SUM(LAG_UP - DOWN) 
      OVER (ORDER BY PRODUCT_NAME)) AS BASE_VAL
FROM
(SELECT P.PRODUCT_NAME
,  0 AS STRT
, P.SALES
, CASE WHEN P.SALES > 0 THEN P.SALES ELSE 0 END AS UP  
, CASE WHEN P.SALES < 0 THEN ABS(P.SALES) ELSE 0 END AS DOWN
, LAG(CASE WHEN P.SALES > 0 THEN P.SALES ELSE 0 END,1,0) 
       OVER (ORDER BY P.PRODUCT_NAME) AS LAG_UP
FROM PRODUCTS P
)
UNION
SELECT 3 SORT_KEY 
, 'END' PRODUCT_NAME
, 0 STRT
, 0 SALES
, 0 UP
, 0 DOWN
, SUM(SALES) + 200 END_TOTAL
, 0 BASE_VAL
FROM PRODUCTS
GROUP BY 1,2,3,4,6
ORDER BY 1 

A lot of effort for a dinky chart, but now it's done once, doing it again will be easier. Of course no one will want just a single chart in their report; there will be other data, tables, charts, etc. I think if I was doing this in anger I would just break out this query as a separate item in the data model, i.e. a query just for the chart. It will make life much simpler.
Another option that I considered was to build a sub template in XSL to generate the XML tree to support the chart and assign that to a variable. I'm sure it can be done with a little effort; I'll save it for another time.

On the last leg, we have the data; now to build the chart. This is actually the easy bit. Sadly I have found an issue in the online template builder that precludes using the chart builder in those templates. However, RTF templates to the rescue!

Insert a chart and in the dialog set up the data like this (click the image to see it full scale.)

It's just a vertical stacked bar with the BASE_VAL color set to white. You can still see the 'hidden' bars and they are overwriting the tick lines, but if you are happy with it, leave it as is. You can double click the chart and the dialog box can read it no problem. If however, you want those 'hidden' bars truly hidden then click on the Advanced tab of the chart dialog and replace:

<Series id="1" color="#FFFFFF" />

with

<Series id="1" borderTransparent="true" transparent="true" />

and the bars will become completely transparent. You can do the 3D and gradient thang if you want and play with colors and themes. You'll then be done with your waterfall masterpiece!

A lot of work? Not really, more than out of the box for sure, but hopefully I have given you enough to decipher the data needs and how to do it, at least with an Oracle db. If you need all my files, including table definition, sample XML, BIP DM, Report and templates, you can get them here.

Categories: BI & Warehousing

New APEX Certification Exam BETA Now Running

David Peake - Fri, 2014-02-28 18:40
How does a prospective employer  or customer know that you are any good at developing with Oracle Application Express?
One of the best credentials, to prove you have APEX chops, is to obtain Oracle Application Express Certification!

The APEX Certification Exam has undergone a large upgrade.
Much of the content has been rewritten or improved as part of this effort.
The new exam is currently running a BETA program until 10-May-2014.
Testing centers are available worldwide.



Help us by taking the BETA Exam and help yourself by saving significant cost over sitting for the exam once published.
The "published" exam will be the exact same questions you take in the BETA exam.
If the question ranks poorly during the beta it may be rewritten or removed.



The Danger of Moving Incrementally Updated Datafile Copies

Don Seiler - Fri, 2014-02-28 11:20
When I sat down at my desk yesterday morning I was greeted with some disturbing email alerts notifying me that one of the NFS mounts on my standby database host was full. This was the NFS mount that held an image copy of my database that is updated daily from an incremental backup. The concept and an example can be found in the documentation. With a 25Tb database, waiting to restore from backups is not as appealing as simply switching to the copies and getting back to business.

We quickly saw that the reason that this mount was full was that RMAN had tried to make another set of image copies in the latest backup run rather than take an incremental backup for recovery. It does this when it finds no valid copy of the datafiles to increment, and the logs confirmed this to be the reason:

Starting backup at 2014/02/26 13:30:16
no parent backup or copy of datafile 792 found
no parent backup or copy of datafile 513 found
no parent backup or copy of datafile 490 found
no parent backup or copy of datafile 399 found

... and so on, for every datafile. However I knew that the copies that had been there (and been updated) every day were still there. So what was different?

It was then that I remembered my work from the day before. Doing a bit of re-organization, I renamed the directory where the datafile copies lived. However I made sure to re-catalog them and double-checked to make sure the backup tag was still there, which it was. I also crosschecked the copies to mark the old entries as expired and then deleted them from the catalog. This turned out to be the cause of the problem.

When the original datafilecopy entries were removed from the catalog, RMAN didn't want to recognize the new entries as the right copies, even though they were literally the same file, with the same backup tag. And so RMAN printed the message you see above and dutifully began making new image copies until it filled up the mountpoint, which didn't have another spare 25 Tb handy.

Today I was able to duplicate the scenario on a (much smaller) sandbox with various sequences. Every time, once I crosschecked the original copies and deleted them as expired, RMAN would create a new copy on the next run. The sequence was basically this:
  1. Run the backup-for-recovery and recover commands (the basic pattern is sketched after this list). The first run will create the datafile copies as expected.
  2. Run it again, this time it will create an incremental backup and then apply it to the copies made in the previous step.
  3. Move or rename the directory holding the copies.
  4. CROSSCHECK COPY; and DELETE EXPIRED COPY;
  5. CATALOG START WITH '/path/to/new/location/';
  6. LIST DATAFILECOPY ALL; to verify that the copies are registered under the new location and the TAG is right.
  7. Run backup-for-recovery and recover commands (be sure to update the location). I would expect the same results as step 2, but instead new copies are created.
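For reference, the backup-for-recovery and recover commands in steps 1, 2 and 7 follow the standard incrementally updated image copy pattern from the RMAN documentation; a minimal sketch (the tag name here is just an example) looks like this:

RUN {
  RECOVER COPY OF DATABASE WITH TAG 'incr_copy';
  BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'incr_copy' DATABASE;
}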
One thing that was very interesting was that if I just cataloged the new location, but did not crosscheck or delete the old entries (i.e. skipped step 4), then I could run the script and it would take an incremental backup as planned and recover the copies in the new location. But then if I later did the crosscheck and delete, it would not accept those copies and create new copies. And all this time I can "list datafilecopy all;" and see both copies with the same tags. Changing the order of steps 4 and 5 made no difference.

I'd be interested to know what anyone else thinks about it. Personally it seems like a bug to me, so I've opened an SR. So far Oracle Support have confirmed what I've experienced, although they have said there is no bug on file. They suggested I use Doc ID 1578701.1 to make another copy of the datafile with a new tag and use that new tag. However if I wanted to do that I would just create a new database copy and keep using the original tag, which is exactly what I've done.

I will be sure to update this post with anything I find. Until then, I wanted to share this experience for anyone else that might need or want to move their datafile copies if they are part of an incrementally-updated-backup strategy.
Categories: DBA Blogs

Run Scala directly in Oracle database

Kuassi Mensah - Thu, 2014-02-27 06:15
A nice proof of concept for #Scala, #java, #Oracle, and #db12c aficionados http://bit.ly/1o8gejG

Clustering Events

Antony Reynolds - Wed, 2014-02-26 13:25
Setting up an Oracle Event Processing Cluster

Recently I was working with Oracle Event Processing (OEP) and needed to set it up as part  of a high availability cluster.  OEP uses Coherence for quorum membership in an OEP cluster.  Because the solution used caching it was also necessary to include access to external Coherence nodes.  Input messages need to be duplicated across multiple OEP streams and so a JMS Topic adapter needed to be configured.  Finally only one copy of each output event was desired, requiring the use of an HA adapter.  In this blog post I will go through the steps required to implement a true HA OEP cluster.

OEP High Availability Review

The diagram below shows a very simple non-HA OEP configuration:

Events are received from a source (JMS in this blog).  The events are processed by an event processing network which makes use of a cache (Coherence in this blog).  Finally any output events are emitted.  The output events could go to any destination but in this blog we will emit them to a JMS queue.

OEP provides high availability by having multiple event processing instances processing the same event stream in an OEP cluster.  One instance acts as the primary and the other instances act as secondary processors.  Usually only the primary will output events as shown in the diagram below (top stream is the primary):

The actual event processing is the same as in the previous non-HA example.  What is different is how input and output events are handled.  Because we want to minimize or avoid duplicate events we have added an HA output adapter to the event processing network.  This adapter acts as a filter, so that only the primary stream will emit events to our queue.  If the processing of events within the network depends on the time at which events are received then it is necessary to use an HA input adapter to synchronize the arrival timestamps of events across the cluster.

OEP Cluster Creation

Let's begin by setting up the base OEP cluster.  To do this we create new OEP configurations on each machine in the cluster.  The steps are outlined below.  Note that the same steps are performed on each machine for each server which will run on that machine:

  • Run ${MW_HOME}/ocep_11.1/common/bin/config.sh.
    • MW_HOME is the installation directory, note that multiple Fusion Middleware products may be installed in this directory.
  • When prompted, choose “Create a new OEP domain”.
  • Provide administrator credentials.
    • Make sure you provide the same credentials on all machines in the cluster.
  • Specify a  “Server name” and “Server listen port”.
    • Each OEP server must have a unique name.
    • Different servers can share the same “Server listen port” unless they are running on the same host.
  • Provide keystore credentials.
    • Make sure you provide the same credentials on all machines in the cluster.
  • Configure any required JDBC data source.
  • Provide the “Domain Name” and “Domain location”.
    • All servers must have the same “Domain name”.
    • The “Domain location” may be different on each server, but I would keep it the same to simplify administration.
    • Multiple servers on the same machine can share the “Domain location” because their configuration will be placed in the directory corresponding to their server name.
  • Create domain!
Configuring an OEP Cluster

Now that we have created our servers we need to configure them so that they can find each other.  OEP uses Oracle Coherence to determine cluster membership.  Coherence clusters can use either multicast or unicast to discover already running members of a cluster.  Multicast has the advantage that it is easy to set up and scales better (see http://www.ateam-oracle.com/using-wka-in-large-coherence-clusters-disabling-multicast/) but has a number of challenges, including failure to propagate by default through routers and accidentally joining the wrong cluster because someone else chose the same multicast settings.  We will show how to use both unicast and multicast to discover the cluster.

Multicast Discovery: Coherence multicast uses a class D multicast address that is shared by all servers in the cluster.  On startup a Coherence node broadcasts a message to the multicast address looking for an existing cluster.  If no-one responds then the node will start the cluster.

Unicast Discovery: Coherence unicast uses Well Known Addresses (WKAs).  Each server in the cluster needs a dedicated listen address/port combination.  A subset of these addresses are configured as WKAs and shared between all members of the cluster.  As long as at least one of the WKAs is up and running then servers can join the cluster.  If a server does not find any cluster members then it checks to see if its listen address and port are in the WKA list.  If it is then that server will start the cluster, otherwise it will wait for a WKA server to become available.

To configure a cluster the same steps need to be followed for each server in the cluster:
  • Set an event server address in the config.xml file.
    • Add the following to the <cluster> element:
      <cluster>
          <server-name>server1</server-name>
          <server-host-name>oep1.oracle.com</server-host-name>
      </cluster>
    • The “server-name” is displayed in the visualizer and should be unique to the server.

    • The “server-host-name” is used by the visualizer to access remote servers.

    • The “server-host-name” must be an IP address or it must resolve to an IP address that is accessible from all other servers in the cluster.

    • The listening port is configured in the <netio> section of the config.xml.

    • The server-host-name/listening port combination should be unique to each server.

 
  • Set a common cluster multicast listen address shared by all servers in the config.xml file.
    • Add the following to the <cluster> element:
      <cluster>
          …
          <!-- For use in Coherence multicast only! -->
          <multicast-address>239.255.200.200</multicast-address>
          <multicast-port>9200</multicast-port>
      </cluster>
    • The “multicast-address” must be able to be routed through any routers between servers in the cluster.

  • Optionally you can specify the bind address of the server; this allows you to control port usage and determine which network is used by Coherence.

    • Create a “tangosol-coherence-override.xml” file in the ${DOMAIN}/{SERVERNAME}/config directory for each server in the cluster.
      <?xml version='1.0'?>
      <coherence>
          <cluster-config>
              <unicast-listener>
                  <!-- This server Coherence address and port number -->
                  <address>192.168.56.91</address>
                  <port>9200</port>
              </unicast-listener>
          </cluster-config>
      </coherence>
  • Configure the Coherence WKA cluster discovery.

    • Create a “tangosol-coherence-override.xml” file in the ${DOMAIN}/{SERVERNAME}/config directory for each server in the cluster.
      <?xml version='1.0'?>
      <coherence>
          <cluster-config>
              <unicast-listener>
                  <!-- WKA Configuration -->
                  <well-known-addresses>
                      <socket-address id="1">
                          <address>192.168.56.91</address>
                          <port>9200</port>
                      </socket-address>
                      <socket-address id="2">
                          <address>192.168.56.92</address>
                          <port>9200</port>
                      </socket-address>
                  </well-known-addresses>
                  <!-- This server Coherence address and port number -->
                  <address>192.168.56.91</address>
                  <port>9200</port>
              </unicast-listener>
          </cluster-config>
      </coherence>

    • List at least two servers in the <socket-address> elements.

    • For each <socket-address> element there should be a server whose own <address> and <port> elements (directly under <unicast-listener>) match that address and port.

    • One of the servers listed in the <well-known-addresses> element must be the first server started.

    • Not all servers need to be listed in <well-known-addresses>, but see previous point.

 
  • Enable clustering using a Coherence cluster.
    • Add the following to the <cluster> element in config.xml.
      <cluster>
          …
          <enabled>true</enabled>
      </cluster>
    • The “enabled” element tells OEP that it will be using Coherence to establish cluster membership; this can also be achieved by setting the value to “coherence”.

 
  • The following shows the <cluster> config for another server in the cluster when using multicast discovery (the differences are the server name and host name):
    <cluster>
        <server-name>server2</server-name>
        <server-host-name>oep2.oracle.com</server-host-name>
        <!-- For use in Coherence multicast only! -->
        <multicast-address>239.255.200.200</multicast-address>
        <multicast-port>9200</multicast-port>
        <enabled>true</enabled>
    </cluster>

  • The following shows the <cluster> config for another server in the cluster when using unicast (WKA) discovery:
    <cluster>
        <server-name>server2</server-name>
        <server-host-name>oep2.oracle.com</server-host-name>
        <enabled>true</enabled>
    </cluster>

 
  • The following shows the “tangosol-coherence-override.xml” file for another server in the cluster (the difference is this server's own Coherence address):
    <?xml version='1.0'?>
    <coherence>
        <cluster-config>
            <unicast-listener>
                <!-- WKA Configuration -->
                <well-known-addresses>
                    <socket-address id="1">
                        <address>192.168.56.91</address>
                        <port>9200</port>
                    </socket-address>
                    <socket-address id="2">
                        <address>192.168.56.92</address>
                        <port>9200</port>
                    </socket-address>
                </well-known-addresses>
                <!-- This server Coherence address and port number -->
                <address>192.168.56.92</address>
                <port>9200</port>
            </unicast-listener>
        </cluster-config>
    </coherence>

You should now have a working OEP cluster.  Check the cluster by starting all the servers.

Look for a message like the following on the first server to start to indicate that another server has joined the cluster:

<Coherence> <BEA-2049108> <The domain membership has changed to [server2, server1], the new domain primary is "server1">

Log on to the Event Processing Visualizer of one of the servers – http://<hostname>:<port>/wlevs.  Select the cluster name on the left and then select group “AllDomainMembers”.  You should see a list of all the running servers in the “Servers of Group – AllDomainMembers” section.

Sample Application

Now that we have a working OEP cluster let us look at a simple application that can be used as an example of how to cluster enable an application.  This application models service request tracking for hardware products.  The application we will use performs the following checks:

  1. If a new service request (identified by SRID) arrives (indicated by status=RAISE) then we expect some sort of follow up in the next 10 seconds (seconds because I want to test this quickly).  If no follow up is seen then an alert should be raised.
    • For example if I receive an event (SRID=1, status=RAISE) and after 10 seconds I have not received a follow up message (SRID=1, status<>RAISE) then I need to raise an alert.
  2. If a service request (identified by SRID) arrives and there has been another service request (identified by a different SRID) for the same physical hardware (identified by TAG) then an alert should be raised.
    • For example if I receive an event (SRID=2, TAG=M1) and later I receive another event for the same hardware (SRID=3, TAG=M1) then an alert should be raised.

Note use case 1 is nicely time bounded – in this case the time window is 10 seconds.  Hence this is an ideal candidate to be implemented entirely in CQL.

Use case 2 has no time constraints, hence over time there could be a very large number of CQL queries running looking for a matching TAG but a different SRID.  In this case it is better to put the TAGs into a cache and search the cache for duplicate tags.  This reduces the amount of state information held in the OEP engine.

The sample application to implement this is shown below:

Messages are received from a JMS Topic (InboundTopicAdapter).  Test messages can be injected via a CSV adapter (RequestEventCSVAdapter).  Alerts are sent to a JMS Queue (OutboundQueueAdapter), and also printed to the server standard output (PrintBean).  Use case 1 is implemented by the MissingEventProcessor.  Use case 2 is implemented by inserting the TAG into a cache (InsertServiceTagCacheBean) using a Coherence event processor and then querying the cache for each new service request (DuplicateTagProcessor); if the same tag is already associated with an SR in the cache then an alert is raised.  The RaiseEventFilter is used to filter out existing service requests from the use case 2 stream.

The non-HA version of the application is available to download here.

We will use this application to demonstrate how to HA enable an application for deployment on our cluster.

A CSV file (TestData.csv) and a load generator properties file (HADemoTest.prop) are provided to test the application by injecting events using the CSV Adapter.
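To inject the test events I use the load generator utility that ships with OEP.  A rough sketch of the invocation is shown below; the utility path and script name are from memory of the OEP 11.1 install (treat them as assumptions) and the properties file location is a placeholder:

# Assumed load generator location under the OEP install; adjust to your environment
cd ${MW_HOME}/ocep_11.1/utils/load-generator
# HADemoTest.prop names the CSV file (TestData.csv) and where to send the events
./runloadgen.sh /path/to/HADemoTest.prop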

Note that the application reads a configuration file (System.properties) which should be placed in the domain directory of each event server.

Deploying an Application

Before deploying an application to a cluster it is a good idea to create a group in the cluster.  Multiple servers can be members of this group.  To add a group to an event server just add an entry to the <cluster> element in config.xml as shown below:

<cluster>
      …
      <groups>HAGroup</groups>
   </cluster>

Multiple servers can be members of a group and a server can be a member of multiple groups.  This allows you to have different levels of high availability in the same event processing cluster.

Deploy the application using the Visualizer.  Target the application at the group you created, or the AllDomainMembers group.

Test the application, typically using a CSV Adapter.  Note that using a CSV adapter sends all the events to a single event server.  To fix this we need to add a JMS output adapter (OutboundTopicAdapter) to our application and then send events from the CSV adapter to the outbound JMS adapter as shown below:

So now we are able to send events via CSV to an event processor that in turn sends the events to a JMS topic.  But we still have a few challenges.

Managing Input

First challenge is managing input.  Because OEP relies on the same event stream being processed by multiple servers we need to make sure that all our servers get the same message from the JMS Topic.  To do this we configure the JMS connection factory to have an Unrestricted Client ID.  This allows multiple clients (OEP servers in our case) to use the same connection factory.  Client IDs are mandatory when using durable topic subscriptions.  We also need each event server to have its own subscriber ID for the JMS Topic, this ensures that each server will get a copy of all the messages posted to the topic.  If we use the same subscriber ID for all the servers then the messages will be distributed across the servers, with each server seeing a completely disjoint set of messages from the other servers in the cluster.  This is not what we want because each server should see the same event stream.  We can use the server name as the subscriber ID as shown in the excerpt below from our application:

<wlevs:adapter id="InboundTopicAdapter" provider="jms-inbound">
    …
    <wlevs:instance-property name="durableSubscriptionName"
            value="${com_bea_wlevs_configuration_server_ClusterType.serverName}" />
</wlevs:adapter>

This works because I have placed a ConfigurationPropertyPlaceholderConfigurer bean in my application as shown below; this same bean is also used to access properties from a configuration file:

<bean id="ConfigBean"
      class="com.bea.wlevs.spring.support.ConfigurationPropertyPlaceholderConfigurer">
    <property name="location" value="file:../Server.properties"/>
</bean>

With this configuration each server will now get a copy of all the events.

As our application relies on elapsed time we should make sure that the timestamps of the received messages are the same on all servers.  We do this by adding an HA Input adapter to our application.

<wlevs:adapter id="HAInputAdapter" provider="ha-inbound">
    <wlevs:listener ref="RequestChannel" />
    <wlevs:instance-property name="keyProperties"
            value="EVID" />
    <wlevs:instance-property name="timeProperty" value="arrivalTime"/>
</wlevs:adapter>

The HA Adapter sets the given “timeProperty” in the input message to be the current system time.  This time is then communicated to other HAInputAdapters deployed to the same group.  This allows all servers in the group to have the same timestamp in their event.  The event is identified by the “keyProperties” key field.

To allow the downstream processing to treat the timestamp as an arrival time, the downstream channel is configured with an “application-timestamped” element to set the arrival time of the event.  This is shown below:

<wlevs:channel id="RequestChannel" event-type="ServiceRequestEvent">
    <wlevs:listener ref="MissingEventProcessor" />
    <wlevs:listener ref="RaiseEventFilterProcessor" />
    <wlevs:application-timestamped>
        <wlevs:expression>arrivalTime</wlevs:expression>
    </wlevs:application-timestamped>
</wlevs:channel>

Note the property set in the HAInputAdapter is used to set the arrival time of the event.

So now all servers in our cluster have the same events arriving from a topic, and each event arrival time is synchronized across the servers in the cluster.

Managing Output

Note that an OEP cluster has multiple servers processing the same input stream.  Obviously if we have the same inputs, synchronized to appear to arrive at the same time, then we will get the same outputs, which is central to OEP's promise of high availability.  So when an alert is raised by our application it will be raised by every server in the cluster.  If we have 3 servers in the cluster then we will get 3 copies of the same alert appearing on our alert queue.  This is probably not what we want.  To fix this we take advantage of an HA Output Adapter.  Unlike input, where there is a single HA Input Adapter, there are multiple HA Output Adapters, each with distinct performance and behavioral characteristics.  The table below is taken from the Oracle® Fusion Middleware Developer's Guide for Oracle Event Processing and shows the different levels of service and performance impact:

Table 24-1 Oracle Event Processing High Availability Quality of Service
  • Section 24.1.2.1, "Simple Failover": missed events: yes (many); duplicate events: yes (few); performance overhead: negligible.
  • Section 24.1.2.2, "Simple Failover with Buffering": missed events: yes (few); duplicate events: yes (many); performance overhead: low.
  • Section 24.1.2.3, "Light-Weight Queue Trimming": missed events: no; duplicate events: yes (few); performance overhead: low-medium.
  • Section 24.1.2.4, "Precise Recovery with JMS": missed events: no; duplicate events: no; performance overhead: high.

I decided to go for the lightweight queue trimming option.  This means I won’t lose any events, but I may emit a few duplicate events in the event of primary failure.  This setting causes all output events to be buffered by the secondaries until they are told by the primary that a particular event has been emitted.  To configure this option I add the following adapter to my EPN:

    <wlevs:adapter id="HAOutputAdapter" provider="ha-broadcast">
        <wlevs:listener ref="OutboundQueueAdapter" />
        <wlevs:listener ref="PrintBean" />
        <wlevs:instance-property name="keyProperties" value="timestamp"/>
        <wlevs:instance-property name="monotonic" value="true"/>
        <wlevs:instance-property name="totalOrder" value="false"/>
    </wlevs:adapter>

This uses the time of the alert (timestamp property) as the key used to identify events which have been trimmed.  This works in this application because the alert time is the time of the source event, and the times of the source events are synchronized using the HA Input Adapter.  Because this is a time value it will only increase, and so I set monotonic=”true”.  However I may get two alerts raised at the same timestamp, and in that case I set totalOrder=”false”.

I also added the additional configuration to config.xml for the application:

<ha:ha-broadcast-adapter>
    <name>HAOutputAdapter</name>
    <warm-up-window-length units="seconds">15</warm-up-window-length>
    <trimming-interval units="millis">1000</trimming-interval>
</ha:ha-broadcast-adapter>

This causes the primary to tell the secondaries its latest emitted alert every second.  The secondaries then trim from their buffers all alerts prior to and including that alert, so in the worst case I will get one second of duplicated alerts.  It is also possible to set a number of events rather than a time period.  The trade-off here is that longer intervals (or larger event counts) reduce synchronization overhead but cause more memory to be used by the secondaries, while more frequent synchronization uses less memory in the secondaries and generates fewer duplicate alerts at the cost of more communication between the primary and the secondaries to trim the buffers.

The warm-up window is used to stop a secondary joining the cluster before it has been running for that time period.  The window is based on the time that the EPN needs to be running to have the same state as the other servers.  In our example application we have a CQL query that runs over a window of 10 seconds, so I set the warm up window to be 15 seconds to ensure that a newly started server had the same state as all the other servers in the cluster.  The warm up window should be greater than the longest query window.

Adding an External Coherence Cluster

When we are running OEP as a cluster we have additional overhead in the servers.  The HA Input Adapter is synchronizing event time across the servers, and the HA Output Adapter is synchronizing output events across the servers.  The HA Output Adapter is also buffering output events in the secondaries.  We can’t do anything about this, but we can move the Coherence cache we are using outside of the OEP servers, reducing the memory pressure on those servers and also moving some of the processing outside of the server.  Making our Coherence caches external to our OEP cluster is a good idea for the following reasons:

  • Allows moving storage of cache entries outside of the OEP server JVMs hence freeing more memory for storing CQL state.
  • Allows storage of more entries in the cache by scaling cache independently of the OEP cluster.
  • Moves cache processing outside OEP servers.

To create the external Coherence cache do the following:

  • Create a new directory for our standalone Coherence servers, perhaps at the same level as the OEP domain directory.
  • Copy the tangosol-coherence-override.xml file previously created for the OEP cluster into a config directory under the Coherence directory created in the previous step.
  • Copy the coherence-cache-config.xml file from the application into a config directory under the Coherence directory created in the previous step.
  • Add the following to the tangosol-coherence-override.xml file in the Coherence config directory:
    • <coherence>
          <cluster-config>
              <member-identity>
                  <cluster-name>oep_cluster</cluster-name>
                  <member-name>Grid1</member-name>
              </member-identity>
              …
          </cluster-config>
      </coherence>
    • Important Note: The <cluster-name> must match the name of the OEP cluster as defined in the <domain><name> element in the event servers' config.xml.
    • The member name is used to help identify the server.
  • Disable storage for our caches in the event servers by editing the coherence-cache-config.xml file in the application and adding the following element to the caches:
    • <distributed-scheme>
          <scheme-name>DistributedCacheType</scheme-name>
          <service-name>DistributedCache</service-name>
          <backing-map-scheme>
              <local-scheme/>
          </backing-map-scheme>
          <local-storage>false</local-storage>
      </distributed-scheme>
    • The local-storage flag stops the OEP server from storing entries for caches using this cache schema.
    • Do not disable storage at the global level (-Dtangosol.coherence.distributed.localstorage=false) because this will disable storage on some OEP specific cache schemes as well as our application cache.  We don’t want to put those schemes into our cache servers because they are used by OEP to maintain cluster integrity and have only one entry per application per server, so are very small.  If we put those into our Coherence Cache servers we would have to add OEP specific libraries to our cache servers and enable them in our coherence-cache-config.xml, all of which is too much trouble for little or no benefit.
  • If using Unicast Discovery (this section is not required if using Multicast) then we want to make the Coherence Grid be the Well Known Address servers because we want to disable storage of entries on our OEP servers, and Coherence nodes with storage disabled cannot initialize a cluster.  To enable the Coherence servers to be primaries in the Coherence grid do the following:
    • Change the unicast-listener addresses in the Coherence servers tangosol-coherence-override.xml file to be suitable values for the machine they are running on – typically change the listen address.
    • Modify the WKA addresses in the OEP servers and the Coherence servers tangosol-coherence-override.xml file to match at least two of the Coherence servers listen addresses.
    • The following shows how this might be configured for 2 OEP servers and 2 Cache servers:

      OEP Server 1:
      <?xml version='1.0'?>
      <coherence>
        <cluster-config>
          <unicast-listener>
            <well-known-addresses>
              <socket-address id="1">
                <address>192.168.56.91</address>
                <port>9300</port>
              </socket-address>
              <socket-address id="2">
                <address>192.168.56.92</address>
                <port>9300</port>
              </socket-address>
            </well-known-addresses>
            <address>192.168.56.91</address>
            <port>9200</port>
          </unicast-listener>
        </cluster-config>
      </coherence>

      OEP Server 2:
      <?xml version='1.0'?>
      <coherence>
        <cluster-config>
          <unicast-listener>
            <well-known-addresses>
              <socket-address id="1">
                <address>192.168.56.91</address>
                <port>9300</port>
              </socket-address>
              <socket-address id="2">
                <address>192.168.56.92</address>
                <port>9300</port>
              </socket-address>
            </well-known-addresses>
            <address>192.168.56.92</address>
            <port>9200</port>
          </unicast-listener>
        </cluster-config>
      </coherence>

      Cache Server 1:
      <?xml version='1.0'?>
      <coherence>
        <cluster-config>
          <member-identity>
            <cluster-name>oep_cluster</cluster-name>
            <member-name>Grid1</member-name>
          </member-identity>
          <unicast-listener>
            <well-known-addresses>
              <socket-address id="1">
                <address>192.168.56.91</address>
                <port>9300</port>
              </socket-address>
              <socket-address id="2">
                <address>192.168.56.92</address>
                <port>9300</port>
              </socket-address>
            </well-known-addresses>
            <address>192.168.56.91</address>
            <port>9300</port>
          </unicast-listener>
        </cluster-config>
      </coherence>

      Cache Server 2:
      <?xml version='1.0'?>
      <coherence>
        <cluster-config>
          <member-identity>
            <cluster-name>oep_cluster</cluster-name>
            <member-name>Grid2</member-name>
          </member-identity>
          <unicast-listener>
            <well-known-addresses>
              <socket-address id="1">
                <address>192.168.56.91</address>
                <port>9300</port>
              </socket-address>
              <socket-address id="2">
                <address>192.168.56.92</address>
                <port>9300</port>
              </socket-address>
            </well-known-addresses>
            <address>192.168.56.92</address>
            <port>9300</port>
          </unicast-listener>
        </cluster-config>
      </coherence>

    • Note that the OEP servers do not listen on the WKA addresses; they use different port numbers even though they run on the same machines as the cache servers.
    • Also note that the Coherence servers are the ones that listen on the WKA addresses.
  • Now that the configuration is complete we can create a start script for the Coherence grid servers as follows:
    • #!/bin/sh
      MW_HOME=/home/oracle/fmw
      OEP_HOME=${MW_HOME}/ocep_11.1
      JAVA_HOME=${MW_HOME}/jrockit_160_33
      CACHE_SERVER_HOME=${MW_HOME}/user_projects/domains/oep_coherence
      CACHE_SERVER_CLASSPATH=${CACHE_SERVER_HOME}/HADemoCoherence.jar:${CACHE_SERVER_HOME}/config
      COHERENCE_JAR=${OEP_HOME}/modules/com.tangosol.coherence_3.7.1.6.jar
      JAVAEXEC=$JAVA_HOME/bin/java
      # specify the JVM heap size
      MEMORY=512m
      if [[ $1 == '-jmx' ]]; then
          JMXPROPERTIES="-Dcom.sun.management.jmxremote -Dtangosol.coherence.management=all -Dtangosol.coherence.management.remote=true"
          shift
      fi
      JAVA_OPTS="-Xms$MEMORY -Xmx$MEMORY $JMXPROPERTIES"
      $JAVAEXEC -server -showversion $JAVA_OPTS -cp "${CACHE_SERVER_CLASSPATH}:${COHERENCE_JAR}" com.tangosol.net.DefaultCacheServer $1
    • Note that I put the tangosol-coherence-override and the coherence-cache-config.xml files in a config directory and added that directory to my path (CACHE_SERVER_CLASSPATH=${CACHE_SERVER_HOME}/HADemoCoherence.jar:${CACHE_SERVER_HOME}/config) so that Coherence would find the override file.
    • Because my application uses in-cache processing (entry processors) I had to add a jar file containing the required classes for the entry processor to the classpath (CACHE_SERVER_CLASSPATH=${CACHE_SERVER_HOME}/HADemoCoherence.jar:${CACHE_SERVER_HOME}/config).
    • The classpath references the Coherence Jar shipped with OEP to avoid version mismatches (COHERENCE_JAR=${OEP_HOME}/modules/com.tangosol.coherence_3.7.1.6.jar).
    • This script is based on the standard cache-server.sh script that ships with standalone Coherence.
    • The –jmx flag can be passed to the script to enable Coherence JMX management beans.

We have now configured the application to use an external Coherence data grid for its caches.  When starting we should always start at least one of the grid servers before starting the OEP servers.  This will allow the OEP servers to find the grid.  If we do start things in the wrong order then the OEP servers will block waiting for a storage enabled node to start (one of the WKA servers if using Unicast).
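As a concrete reminder of that ordering, this is roughly how I start things; the domain and server directory names are placeholders for illustration, so substitute your own:

# Start at least one storage-enabled Coherence grid server first (assumed directory name)
cd ${MW_HOME}/user_projects/domains/oep_coherence
./cache-server.sh -jmx &

# Then start each OEP event server from its server directory (assumed domain/server names)
cd ${MW_HOME}/user_projects/domains/oep_domain/server1
./startwlevs.sh &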

Summary

We have now created an OEP cluster that makes use of an external Coherence grid for application caches.  The application has been modified to ensure that the timestamps of arriving events are synchronized and the output events are only output by one of the servers in the cluster.  In the event of failure we may get some duplicate events with our configuration (there are configurations that avoid duplicate events) but we will not lose any events.  The final version of the application with full HA capability is shown below:

Files

The following files are available for download:

  • Oracle Event Processing
    • Includes Coherence
  • Non-HA version of application
    • Includes test file TestData.csv and Load Test property file HADemoTest.prop
    • Includes Server.properties.Antony file to customize to point to your WLS installation
  • HA version of application
    • Includes test file TestData.csv and Load Test property file HADemoTest.prop
    • Includes Server.properties.Antony file to customize to point to your WLS installation
  • OEP Cluster Files
    • Includes config.xml
    • Includes tangosol-coherence-override.xml
    • Includes Server.properties that will need customizing for your WLS environment
  • Coherence Cluster Files
    • Includes tangosol-coherence-override.xml and coherence-cache-config.xml
    • includes cache-server.sh start script
    • Includes HADemoCoherence.jar with required classes for entry processor
References

The following references may be helpful:

Clustering Events

Antony Reynolds - Wed, 2014-02-26 13:25
Setting up an Oracle Event Processing Cluster

Recently I was working with Oracle Event Processing (OEP) and needed to set it up as part  of a high availability cluster.  OEP uses Coherence for quorum membership in an OEP cluster.  Because the solution used caching it was also necessary to include access to external Coherence nodes.  Input messages need to be duplicated across multiple OEP streams and so a JMS Topic adapter needed to be configured.  Finally only one copy of each output event was desired, requiring the use of an HA adapter.  In this blog post I will go through the steps required to implement a true HA OEP cluster.

OEP High Availability Review

The diagram below shows a very simple non-HA OEP configuration:

Events are received from a source (JMS in this blog).  The events are processed by an event processing network which makes use of a cache (Coherence in this blog).  Finally any output events are emitted.  The output events could go to any destination but in this blog we will emit them to a JMS queue.

OEP provides high availability by having multiple event processing instances processing the same event stream in an OEP cluster.  One instance acts as the primary and the other instances act as secondary processors.  Usually only the primary will output events as shown in the diagram below (top stream is the primary):

The actual event processing is the same as in the previous non-HA example.  What is different is how input and output events are handled.  Because we want to minimize or avoid duplicate events we have added an HA output adapter to the event processing network.  This adapter acts as a filter, so that only the primary stream will emit events to out queue.  If the processing of events within the network depends on how the time at which events are received then it is necessary to synchronize the event arrival time across the cluster by using an HA input adapter to synchronize the arrival timestamps of events across the cluster.

OEP Cluster Creation

Lets begin by setting up the base OEP cluster.  To do this we create new OEP configurations on each machine in the cluster.  The steps are outlined below.  Note that the same steps are performed on each machine for each server which will run on that machine:

  • Run ${MW_HOME}/ocep_11.1/common/bin/config.sh.
    • MW_HOME is the installation directory, note that multiple Fusion Middleware products may be installed in this directory.
  • When prompted “Create a new OEP domain”.
  • Provide administrator credentials.
    • Make sure you provide the same credentials on all machines in the cluster.
  • Specify a  “Server name” and “Server listen port”.
    • Each OEP server must have a unique name.
    • Different servers can share the same “Server listen port” unless they are running on the same host.
  • Provide keystore credentials.
    • Make sure you provide the same credentials on all machines in the cluster.
  • Configure any required JDBC data source.
  • Provide the “Domain Name” and “Domain location”.
    • All servers must have the same “Domain name”.
    • The “Domain location” may be different on each server, but I would keep it the same to simplify administration.
    • Multiple servers on the same machine can share the “Domain location” because their configuration will be placed in the directory corresponding to their server name.
  • Create domain!
Configuring an OEP Cluster

Now that we have created our servers we need to configure them so that they can find each other.  OEP uses Oracle Coherence to determine cluster membership.  Coherence clusters can use either multicast or unicast to discover already running members of a cluster.  Multicast has the advantage that it is easy to set up and scales better (see http://www.ateam-oracle.com/using-wka-in-large-coherence-clusters-disabling-multicast/) but has a number of challenges, including failure to propagate by default through routers and accidently joining the wrong cluster because someone else chose the same multicast settings.  We will show how to use both unicast and multicast to discover the cluster. 

Multicast Discovery Unicast Discovery Coherence multicast uses a class D multicast address that is shared by all servers in the cluster.  On startup a Coherence node broadcasts a message to the multicast address looking for an existing cluster.  If no-one responds then the node will start the cluster. Coherence unicast uses Well Known Addresses (WKAs). Each server in the cluster needs a dedicated listen address/port combination. A subset of these addresses are configured as WKAs and shared between all members of the cluster. As long as at least one of the WKAs is up and running then servers can join the cluster. If a server does not find any cluster members then it checks to see if its listen address and port are in the WKA list. If it is then that server will start the cluster, otherwise it will wait for a WKA server to become available.   To configure a cluster the same steps need to be followed for each server in the cluster:
  • Set an event server address in the config.xml file.
    • Add the following to the <cluster> element:
      <cluster>
          <server-name>server1</server-name>
          <server-host-name>oep1.oracle.com</server-host-name>
      </cluster>
    • The “server-name” is displayed in the visualizer and should be unique to the server.

    • The “server-host-name” is used by the visualizer to access remote servers.

    • The “server-host-name” must be an IP address or it must resolve to an IP address that is accessible from all other servers in the cluster.

    • The listening port is configured in the <netio> section of the config.xml.

    • The server-host-name/listening port combination should be unique to each server.

 
  • Set a common cluster multicast listen address shared by all servers in the config.xml file.
    • Add the following to the <cluster> element:
      <cluster>
          …
          <!—For us in Coherence multicast only! –>
          <multicast-address>239.255.200.200</multicast-address>
          <multicast-port>9200</multicast-port>
      </cluster>
    • The “multicast-address” must be able to be routed through any routers between servers in the cluster.

  • Optionally you can specify the bind address of the server, this allows you to control port usage and determine which network is used by Coherence

    • Create a “tangosol-coherence-override.xml” file in the ${DOMAIN}/{SERVERNAME}/config directory for each server in the cluster.
      <?xml version='1.0'?>
      <coherence>
          <cluster-config>
              <unicast-listener>
                  <!—This server Coherence address and port number –>
                  <address>192.168.56.91</address>
                  <port>9200</port>
              </unicast-listener>
          </cluster-config>
      </coherence>
  • Configure the Coherence WKA cluster discovery.

    • Create a “tangosol-coherence-override.xml” file in the ${DOMAIN}/{SERVERNAME}/config directory for each server in the cluster.
      <?xml version='1.0'?>
      <coherence>
          <cluster-config>
              <unicast-listener>
                  <!—WKA Configuration –>
                  <well-known-addresses>
                      <socket-address id="1">
                          <address>192.168.56.91</address>
                          <port>9200</port>
                      </socket-address>
                      <socket-address id="2">
                          <address>192.168.56.92</address>
                          <port>9200</port>
                      </socket-address>
                  </well-known-addresses>
                  <!—This server Coherence address and port number –>
                  <address>192.168.56.91</address>
                  <port>9200</port>
              </unicast-listener>
          </cluster-config>
      </coherence>

    • List at least two servers in the <socket-address> elements.

    • For each <socket-address> element there should be a server that has corresponding <address> and <port> elements directly under <well-known-addresses>.

    • One of the servers listed in the <well-known-addresses> element must be the first server started.

    • Not all servers need to be listed in <well-known-addresses>, but see previous point.

 
  • Enable clustering using a Coherence cluster.
    • Add the following to the <cluster> element in config.xml.
      <cluster>
          …
          <enabled>true</enabled>
      </cluster>
    • The “enabled” element tells OEP that it will be using Coherence to establish cluster membership, this can also be achieved by setting the value to be “coherence”.

 
  • The following shows the <cluster> config for another server in the cluster with differences highlighted:
    <cluster>
        <server-name>server2</server-name>
        <server-host-name>oep2.oracle.com</server-host-name>
        <!—For us in Coherence multicast only! –>
        <multicast-address>239.255.200.200</multicast-address>
        <multicast-port>9200</multicast-port>
        <enabled>true</enabled>
    </cluster>

  • The following shows the <cluster> config for another server in the cluster with differences highlighted:
    <cluster>
        <server-name>server2</server-name>
        <server-host-name>oep2.oracle.com</server-host-name>
        <enabled>true</enabled>
    </cluster>

 
  • The following shows the “tangosol-coherence-override.xml” file for another server in the cluster with differences highlighted:
    <?xml version='1.0'?>
    <coherence>
        <cluster-config>
            <unicast-listener>
                <!—WKA Configuration –>
                <well-known-addresses>
                    <socket-address id="1">
                        <address>192.168.56.91</address>
                        <port>9200</port>
                    </socket-address>
                    <socket-address id="2">
                        <address>192.168.56.92</address>
                        <port>9200</port>
                    </socket-address>
                    <!—This server Coherence address and port number –>
                    <address>192.168.56.92</address>
                    <port>9200</port>
                </well-known-addresses>
            </unicast-listener>
        </cluster-config>
    </coherence>

You should now have a working OEP cluster.  Check the cluster by starting all the servers.

Look for a message like the following on the first server to start to indicate that another server has joined the cluster:

<Coherence> <BEA-2049108> <The domain membership has changed to [server2, server1], the new domain primary is "server1">

Log on to the Event Processing Visualizer of one of the servers – http://<hostname>:<port>/wlevs.  Select the cluster name on the left and then select group “AllDomainMembers”.  You should see a list of all the running servers in the “Servers of Group – AllDomainMembers” section.

Sample Application

Now that we have a working OEP cluster let us look at a simple application that can be used as an example of how to cluster enable an application.  This application models service request tracking for hardware products.  The application we will use performs the following checks:

  1. If a new service request (identified by SRID) arrives (indicated by status=RAISE) then we expect some sort of follow up in the next 10 seconds (seconds because I want to test this quickly).  If no follow up is seen then an alert should be raised.
    • For example if I receive an event (SRID=1, status=RAISE) and after 10 seconds I have not received a follow up message (SRID=1, status<>RAISE) then I need to raise an alert.
  2. If a service request (identified by SRID) arrives and there has been another service request (identified by a different SRID) for the same physcial hardware (identified by TAG) then an alert should be raised.
    • For example if I receive an event (SRID=2, TAG=M1) and later I receive another event for the same hardware (SRID=3, TAG=M1) then an alert should be raised.

Note use case 1 is nicely time bounded – in this case the time window is 10 seconds.  Hence this is an ideal candidate to be implemented entirely in CQL.

Use case 2 has no time constraints, hence over time there could be a very large number of CQL queries running looking for a matching TAG but a different SRID.  In this case it is better to put the TAGs into a cache and search the cache for duplicate tags.  This reduces the amount of state information held in the OEP engine.

The sample application to implement this is shown below:

Messages are received from a JMS Topic (InboundTopicAdapter).  Test messages can be injected via a CSV adapter (RequestEventCSVAdapter).  Alerts are sent to a JMS Queue (OutboundQueueAdapter), and also printed to the server standard output (PrintBean).  Use case 1 is implemented by the MissingEventProcessor.  Use case 2 is implemented by inserting the TAG into a cache (InsertServiceTagCacheBean) using a Coherence event processor and then querying the cache for each new service request (DuplicateTagProcessor), if the same tag is already associated with an SR in the cache then an alert is raised.  The RaiseEventFilter is used to filter out existing service requests from the use case 2 stream.

The non-HA version of the application is available to download here.

We will use this application to demonstrate how to HA enable an application for deployment on our cluster.

A CSV file (TestData.csv) and Load generator properties file (HADemoTest.prop) is provided to test the application by injecting events using the CSV Adapter.

Note that the application reads a configuration file (System.properties) which should be placed in the domain directory of each event server.

Deploying an Application

Before deploying an application to a cluster it is a good idea to create a group in the cluster.  Multiple servers can be members of this group.  To add a group to an event server just add an entry to the <cluster> element in config.xml as shown below:

<cluster>
      …
      <groups>HAGroup</groups>
   </cluster>

Multiple servers can be members of a group and a server can be a member of multiple groups.  This allows you to have different levels of high availability in the same event processing cluster.

Deploy the application using the Visualizer.  Target the application at the group you created, or the AllDomainMembers group.

Test the application, typically using a CSV Adapter.  Note that using a CSV adapter sends all the events to a single event server.  To fix this we need to add a JMS output adapter (OutboundTopicAdapter) to our application and then send events from the CSV adapter to the outbound JMS adapter as shown below:

So now we are able to send events via CSV to an event processor that in turn sends the events to a JMS topic.  But we still have a few challenges.

Managing Input

First challenge is managing input.  Because OEP relies on the same event stream being processed by multiple servers we need to make sure that all our servers get the same message from the JMS Topic.  To do this we configure the JMS connection factory to have an Unrestricted Client ID.  This allows multiple clients (OEP servers in our case) to use the same connection factory.  Client IDs are mandatory when using durable topic subscriptions.  We also need each event server to have its own subscriber ID for the JMS Topic, this ensures that each server will get a copy of all the messages posted to the topic.  If we use the same subscriber ID for all the servers then the messages will be distributed across the servers, with each server seeing a completely disjoint set of messages to the other servers in the cluster.  This is not what we want because each server should see the same event stream.  We can use the server name as the subscriber ID as shown in the below excerpt from our application:

<wlevs:adapter id="InboundTopicAdapter" provider="jms-inbound">
    …
    <wlevs:instance-property name="durableSubscriptionName"
            value="${com_bea_wlevs_configuration_server_ClusterType.serverName}" />
</wlevs:adapter>

This works because I have placed a ConfigurationPropertyPlaceholderConfigurer bean in my application as shown below, this same bean is also used to access properties from a configuration file:

<bean id="ConfigBean"
        class="com.bea.wlevs.spring.support.ConfigurationPropertyPlaceholderConfigurer">
        <property name="location" value="file:../Server.properties"/>
    </bean>

With this configuration each server will now get a copy of all the events.

As our application relies on elapsed time we should make sure that the timestamps of the received messages are the same on all servers.  We do this by adding an HA Input adapter to our application.

<wlevs:adapter id="HAInputAdapter" provider="ha-inbound">
    <wlevs:listener ref="RequestChannel" />
    <wlevs:instance-property name="keyProperties"
            value="EVID" />
    <wlevs:instance-property name="timeProperty" value="arrivalTime"/>
</wlevs:adapter>

The HA Adapter sets the given “timeProperty” in the input message to be the current system time.  This time is then communicated to other HAInputAdapters deployed to the same group.  This allows all servers in the group to have the same timestamp in their event.  The event is identified by the “keyProperties” key field.

To allow the downstream processing to treat the timestamp as an arrival time then the downstream channel is configured with an “application-timestamped” element to set the arrival time of the event.  This is shown below:

<wlevs:channel id="RequestChannel" event-type="ServiceRequestEvent">
    <wlevs:listener ref="MissingEventProcessor" />
    <wlevs:listener ref="RaiseEventFilterProcessor" />
    <wlevs:application-timestamped>
        <wlevs:expression>arrivalTime</wlevs:expression>
    </wlevs:application-timestamped>
</wlevs:channel>

Note the property set in the HAInputAdapter is used to set the arrival time of the event.

So now all servers in our cluster have the same events arriving from a topic, and each event arrival time is synchronized across the servers in the cluster.

Managing Output

Note that an OEP cluster has multiple servers processing the same input stream.  Obviously if we have the same inputs, synchronized to appear to arrive at the same time then we will get the same outputs, which is central to OEPs promise of high availability.  So when an alert is raised by our application it will be raised by every server in the cluster.  If we have 3 servers in the cluster then we will get 3 copies of the same alert appearing on our alert queue.  This is probably not what we want.  To fix this we take advantage of an HA Output Adapter.  unlike input where there is a single HA Input Adapter there are multiple HA Output Adapters, each with distinct performance and behavioral characteristics.  The table below is taken from the Oracle® Fusion Middleware Developer's Guide for Oracle Event Processing and shows the different levels of service and performance impact:

Table 24-1 Oracle Event Processing High Availability Quality of Service High Availability Option Missed Events? Duplicate Events? Performance Overhead Section 24.1.2.1, "Simple Failover" Yes (many) Yes (few) Negligible Section 24.1.2.2, "Simple Failover with Buffering" Yes (few)Foot 1 Yes (many) Low Section 24.1.2.3, "Light-Weight Queue Trimming" No Yes (few) Low-MediumFoot 2 Section 24.1.2.4, "Precise Recovery with JMS" No No High

I decided to go for the lightweight queue trimming option.  This means I won’t lose any events, but I may emit a few duplicate events in the event of primary failure.  This setting causes all output events to be buffered by secondary's until they are told by the primary that a particular event has been emitted.  To configure this option I add the following adapter to my EPN:

    <wlevs:adapter id="HAOutputAdapter" provider="ha-broadcast">
        <wlevs:listener ref="OutboundQueueAdapter" />
        <wlevs:listener ref="PrintBean" />
        <wlevs:instance-property name="keyProperties" value="timestamp"/>
        <wlevs:instance-property name="monotonic" value="true"/>
        <wlevs:instance-property name="totalOrder" value="false"/>
    </wlevs:adapter>

This uses the time of the alert (timestamp property) as the key to be used to identify events which have been trimmed.  This works in this application because the alert time is the time of the source event, and the time of the source events are synchronized using the HA Input Adapter.  Because this is a time value then it will increase, and so I set monotonic=”true”.  However I may get two alerts raised at the same timestamp and in that case I set totalOrder=”false”.

I also added the additional configuration to config.xml for the application:

<ha:ha-broadcast-adapter>
    <name>HAOutputAdapter</name>
    <warm-up-window-length units="seconds">15</warm-up-window-length>
    <trimming-interval units="millis">1000</trimming-interval>
</ha:ha-broadcast-adapter>

This causes the primary to tell the secondary's which is its latest emitted alert every 1 second.  This will cause the secondary's to trim from their buffers all alerts prior to and including the latest emitted alerts.  So in the worst case I will get one second of duplicated alerts.  It is also possible to set a number of events rather than a time period.  The trade off here is that I can reduce synchronization overhead by having longer time intervals or more events, causing more memory to be used by the secondary's or I can cause more frequent synchronization, using less memory in the secondary's and generating fewer duplicate alerts but there will be more communication between the primary and the secondary's to trim the buffer.

The warm-up window is used to stop a secondary joining the cluster before it has been running for that time period.  The window is based on the time that the EPN needs to be running to be have the same state as the other servers.  In our example application we have a CQL that runs for a period of 10 seconds, so I set the warm up window to be 15 seconds to ensure that a newly started server had the same state as all the other servers in the cluster.  The warm up window should be greater than the longest query window.

Adding an External Coherence Cluster

When we are running OEP as a cluster then we have additional overhead in the servers.  The HA Input Adapter is synchronizing event time across the servers, the HA Output adapter is synchronizing output events across the servers.  The HA Output adapter is also buffering output events in the secondary’s.  We can’t do anything about this but we can move the Coherence Cache we are using outside of the OEP servers, reducing the memory pressure on those servers and also moving some of the processing outside of the server.  Making our Coherence caches external to our OEP cluster is a good idea for the following reasons:

  • Allows moving storage of cache entries outside of the OEP server JVMs hence freeing more memory for storing CQL state.
  • Allows storage of more entries in the cache by scaling cache independently of the OEP cluster.
  • Moves cache processing outside OEP servers.

To create the external Coherence cache do the following:

  • Create a new directory for our standalone Coherence servers, perhaps at the same level as the OEP domain directory.
  • Copy the tangosol-coherence-override.xml file previously created for the OEP cluster into a config directory under the Coherence directory created in the previous step.
  • Copy the coherence-cache-config.xml file from the application into a config directory under the Coherence directory created in the previous step.
  • Add the following to the tangosol-coherence-override.xml file in the Coherence config directory:
    • <coherence>
          <cluster-config>
              <member-identity>
                  <cluster-name>oep_cluster</cluster-name>
                  <member-name>Grid1</member-name>
              </member-identity>
              …
          </cluster-config>
      </coherence>
    • Important Note: The <cluster-name> must match the name of the OEP cluster as defined in the <domain><name> element in the event server's config.xml.
    • The member name is used to help identify the server.
  • Disable storage for our caches in the event servers by editing the coherence-cache-config.xml file in the application and adding a local-storage element to the cache scheme:
    • <distributed-scheme>
          <scheme-name>DistributedCacheType</scheme-name>
          <service-name>DistributedCache</service-name>
          <local-storage>false</local-storage>
          <backing-map-scheme>
              <local-scheme/>
          </backing-map-scheme>
      </distributed-scheme>
    • The local-storage flag stops the OEP server from storing entries for caches that use this cache scheme.
    • Do not disable storage at the global level (-Dtangosol.coherence.distributed.localstorage=false) because this would disable storage on some OEP specific cache schemes as well as our application cache.  We don’t want to put those schemes into our cache servers: they are used by OEP to maintain cluster integrity, hold only one entry per application per server, and so are very small.  Putting them into our Coherence cache servers would mean adding OEP specific libraries to the cache servers and enabling the schemes in our coherence-cache-config.xml, all of which is too much trouble for little or no benefit.
  • If using Unicast Discovery (this section is not required if using Multicast), we want the Coherence grid servers to be the Well Known Address (WKA) servers: we want to disable storage of entries on our OEP servers, and Coherence nodes with storage disabled cannot initialize a cluster.  To enable the Coherence servers to be primaries in the Coherence grid, do the following:
    • Change the unicast-listener addresses in the Coherence servers' tangosol-coherence-override.xml files to suitable values for the machine each is running on (typically change the listen address).
    • Modify the WKA addresses in the OEP servers' and the Coherence servers' tangosol-coherence-override.xml files to match the listen addresses of at least two of the Coherence servers.
    • The following shows how this might be configured for two OEP servers and two cache servers:

      OEP Server 1:

      <?xml version='1.0'?>
      <coherence>
        <cluster-config>
          <unicast-listener>
            <well-known-addresses>
              <socket-address id="1">
                <address>192.168.56.91</address>
                <port>9300</port>
              </socket-address>
              <socket-address id="2">
                <address>192.168.56.92</address>
                <port>9300</port>
              </socket-address>
            </well-known-addresses>
            <address>192.168.56.91</address>
            <port>9200</port>
          </unicast-listener>
        </cluster-config>
      </coherence>

      OEP Server 2:

      <?xml version='1.0'?>
      <coherence>
        <cluster-config>
          <unicast-listener>
            <well-known-addresses>
              <socket-address id="1">
                <address>192.168.56.91</address>
                <port>9300</port>
              </socket-address>
              <socket-address id="2">
                <address>192.168.56.92</address>
                <port>9300</port>
              </socket-address>
            </well-known-addresses>
            <address>192.168.56.92</address>
            <port>9200</port>
          </unicast-listener>
        </cluster-config>
      </coherence>

      Cache Server 1:

      <?xml version='1.0'?>
      <coherence>
        <cluster-config>
          <member-identity>
            <cluster-name>oep_cluster</cluster-name>
            <member-name>Grid1</member-name>
          </member-identity>
          <unicast-listener>
            <well-known-addresses>
              <socket-address id="1">
                <address>192.168.56.91</address>
                <port>9300</port>
              </socket-address>
              <socket-address id="2">
                <address>192.168.56.92</address>
                <port>9300</port>
              </socket-address>
            </well-known-addresses>
            <address>192.168.56.91</address>
            <port>9300</port>
          </unicast-listener>
        </cluster-config>
      </coherence>

      Cache Server 2:

      <?xml version='1.0'?>
      <coherence>
        <cluster-config>
          <member-identity>
            <cluster-name>oep_cluster</cluster-name>
            <member-name>Grid2</member-name>
          </member-identity>
          <unicast-listener>
            <well-known-addresses>
              <socket-address id="1">
                <address>192.168.56.91</address>
                <port>9300</port>
              </socket-address>
              <socket-address id="2">
                <address>192.168.56.92</address>
                <port>9300</port>
              </socket-address>
            </well-known-addresses>
            <address>192.168.56.92</address>
            <port>9300</port>
          </unicast-listener>
        </cluster-config>
      </coherence>

    • Note that the OEP servers do not listen on the WKA addresses; they use a different port number (9200 rather than 9300) even though they run on the same machines as the cache servers.
    • Also note that the Coherence cache servers are the ones that listen on the WKA addresses.
  • Now that the configuration is complete we can create a start script for the Coherence grid servers as follows:
    • #!/bin/bash
      MW_HOME=/home/oracle/fmw
      OEP_HOME=${MW_HOME}/ocep_11.1
      JAVA_HOME=${MW_HOME}/jrockit_160_33
      CACHE_SERVER_HOME=${MW_HOME}/user_projects/domains/oep_coherence
      CACHE_SERVER_CLASSPATH=${CACHE_SERVER_HOME}/HADemoCoherence.jar:${CACHE_SERVER_HOME}/config
      COHERENCE_JAR=${OEP_HOME}/modules/com.tangosol.coherence_3.7.1.6.jar
      JAVAEXEC=$JAVA_HOME/bin/java
      # specify the JVM heap size
      MEMORY=512m
      if [[ $1 == '-jmx' ]]; then
          JMXPROPERTIES="-Dcom.sun.management.jmxremote -Dtangosol.coherence.management=all -Dtangosol.coherence.management.remote=true"
          shift
      fi
      JAVA_OPTS="-Xms$MEMORY -Xmx$MEMORY $JMXPROPERTIES"
      $JAVAEXEC -server -showversion $JAVA_OPTS -cp "${CACHE_SERVER_CLASSPATH}:${COHERENCE_JAR}" com.tangosol.net.DefaultCacheServer $1
    • Note that I put the tangosol-coherence-override.xml and the coherence-cache-config.xml files in a config directory and added that directory to the classpath (CACHE_SERVER_CLASSPATH=${CACHE_SERVER_HOME}/HADemoCoherence.jar:${CACHE_SERVER_HOME}/config) so that Coherence would find the override file.
    • Because my application uses in-cache processing (entry processors) I had to add a jar file containing the required classes for the entry processor to the classpath (CACHE_SERVER_CLASSPATH=${CACHE_SERVER_HOME}/HADemoCoherence.jar:${CACHE_SERVER_HOME}/config).
    • The classpath references the Coherence Jar shipped with OEP to avoid version mismatches (COHERENCE_JAR=${OEP_HOME}/modules/com.tangosol.coherence_3.7.1.6.jar).
    • This script is based on the standard cache-server.sh script that ships with standalone Coherence.
    • The -jmx flag can be passed to the script to enable the Coherence JMX management beans.
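    • For example, a grid server can be started from the Coherence domain directory with or without JMX enabled (a sketch, assuming the script above is saved there as cache-server.sh):
      # start a storage-enabled grid server with the Coherence JMX management beans enabled
      ./cache-server.sh -jmx
      # or start it without JMX
      ./cache-server.sh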

We have now configured OEP to use an external Coherence data grid for its application caches.  When starting, we should always start at least one of the grid servers before starting the OEP servers; this allows the OEP servers to find the grid.  If we start things in the wrong order, the OEP servers will block waiting for a storage-enabled node to start (one of the WKA servers if using Unicast).
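
As a rough sketch, the startup order across the two machines used in the WKA example above might look like the following.  The cache server path comes from the start script shown earlier; the OEP domain path and the use of the standard startwlevs.sh start script are assumptions and should be adjusted for your own installation.

    # On each cache machine (192.168.56.91 and 192.168.56.92):
    # start at least one storage-enabled grid server first so the Coherence cluster can form
    cd /home/oracle/fmw/user_projects/domains/oep_coherence
    ./cache-server.sh

    # Then on each OEP machine, start the event server; it joins the existing
    # Coherence cluster via the WKA addresses in its tangosol-coherence-override.xml
    cd /home/oracle/fmw/user_projects/domains/oep_domain/server1   # assumed domain/server path
    ./startwlevs.sh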

Summary

We have now created an OEP cluster that makes use of an external Coherence grid for application caches.  The application has been modified to ensure that the timestamps of arriving events are synchronized and that output events are only emitted by one of the servers in the cluster.  In the event of a failure we may get some duplicate events with our configuration (there are configurations that avoid duplicate events), but we will not lose any events.  The final version of the application with full HA capability is shown below:

Files

The following files are available for download:

  • Oracle Event Processing
    • Includes Coherence
  • Non-HA version of the application
    • Includes test file TestData.csv and Load Test property file HADemoTest.prop
    • Includes Server.properties.Antony file to customize to point to your WLS installation
  • HA version of application
    • Includes test file TestData.csv and Load Test property file HADemoTest.prop
    • Includes Server.properties.Antony file to customize to point to your WLS installation
  • OEP Cluster Files
    • Includes config.xml
    • Includes tangosol-coherence-override.xml
    • Includes Server.properties that will need customizing for your WLS environment
  • Coherence Cluster Files
    • Includes tangosol-coherence-override.xml and coherence-cache-config.xml
    • Includes the cache-server.sh start script
    • Includes HADemoCoherence.jar with required classes for entry processor

References

The following references may be helpful:

Telegram on two devices...

Dietrich Schroff - Tue, 2014-02-25 14:32
After trying WhatsApp on two devices without success, I tried the same thing with Telegram. Telegram says the following about multiple devices:


So let's see...
Registering the mobile phone was straightforward: an SMS with a verification code and that's it.
Registering the tablet was easy: just another SMS.
And after entering the code on the tablet I got the following message on my mobile phone:
After that every conversation (or more precisely: every conversation that is not a secret chat) is broadcast to all of your devices. That means you can switch over from phone to tablet without losing your context... really cool!



