Gaurav Verma


An Application Tuning Case Study: The Deadly Deadlock

Sat, 2008-03-01 01:36

Preface

Let's face it: deadlock and latching issues have always been hard to debug. We often don't focus enough on application tuning, even though it is always advisable to look at the application design and user access trends first. These things don't come to mind too readily.
We often face situations where the users are deadlocked, sitting and watching blue screens, or the application freezes up. While there are many reasons for this to happen (poor OS implementation with swapping/disk/memory issues, RDBMS bugs, RDBMS initialization parameter values, a lot of non-sharable SQL), a good share of cases is caused by poor application coding, written without understanding the sequence in which resources should be accessed or acted upon (updated/deleted etc.).
This is one such case study, in which a deadlock was traced back to code that was unaware of the deadlock situations it could create, and the solution was to enhance the code so it would not cause deadlocks in the future. While the change was really simple, picturing the deadlock in action was really difficult.
This article also goes into the mechanics of constructing the deadlock situation in slow motion, which makes it an instructive case study to read through for future reference.

Getting the Bad Press...
The client had a big OLTP implementation of Order Management (the Order-to-Cash business flow) on the 11i E-Business Suite, and their bread and butter was taking orders for their configurable products.
Many times during any business day, the customer service representatives (~200 people) would get blue screens or hang while trying to work on orders. Frantic helpdesk calls from angry customer service reps were normal, and obviously this was under the management's scanner since it was highly visible.
Customer service reps would keep making small talk to the clients who called in to order chairs and such, and eventually the clients would give up on the wait. As a fallout of this, the client was irate (they were losing their customers) and frustrated at having implemented Order Management.
Everybody in customer service would say: why is Oracle so slow?

Scratching the surface...
This continued for months, and it was normal for several myths about bad Oracle performance to float around. Worse, no one had any idea what was happening: someone blamed Workflow, someone blamed the network, and so on.
The customer had done a lot of customizations. Preliminary investigations revealed that often there was a deadlock detected between a concurrent program session and a forms user session.
As a corollary, the custom "Order Acknowledgment" program would sometimes hang inordinately for more than half an hour, whereas it would otherwise "normally" finish in 1-2 mins.
Analyzing wait events (using v$session_wait) and 10046 event session tracing didn't lead anywhere, and the pattern was not predictable either. It was really very puzzling behaviour.

Understanding the deadlock in slow motion...
The "XX Sales Order Acknowledgement XML publishing" program used to run every minute. There had to be some co-relation between the programs code (table/object access/locking pattern) and the activity being done on Order entry forms.
After analyzing the code and seeing how underlying tables were being accessed or updated, It became clear that there was a problem in the "XX Sales Order Acknowledgement XML publishing" code.
The complete picture of the deadlock pattern was available now. Lets see it in slow motion here: Deadlock Situation 1:
From the entire set of qualifying Order header records to be processed for Order Acknowledgements, a batch of 25 orders was updated at a time. After the update of the 25 orders, a commit was issued, which then freed the locks.
Now, out of these 25 records, if the "XX Sales Order Acknowledgement XML publishing" concurrent program (say A) was trying to update the first order, and a customer rep (say B) was working on the very same order in the OE screen, A HAD TO WAIT till B freed the record by either a commit or rollback action in the Order Entry form.
TILL THE TIME B freed that record, A was not able to proceed with the remaining 24 orders in the batch and kept waiting for the row-level exclusive lock.
It eventually either timed out or acquired the lock on the row. In other words, A was not able to proceed till B was done with the updates on the Order header in the OEXOETEL (Order Entry) form.
The same applied to any other order which A had to update. This explains the runs where a single execution of the "XX Sales Order Acknowledgement XML publishing" concurrent program would take 20 mins or so; during such runs, one would witness high Order Entry updating activity at the customer support site.
Now, let's reverse the situation.

Deadlock Situation 2:
At any time, there might be, say, 20 customer reps (say B) working on 25 or 30 orders. Therefore, in theory, around 25-30 Order Entry header rows might be locked in exclusive mode.
Every minute, the "XX Sales Order Acknowledgement XML publishing" concurrent program was trying to update these very Order Entry header rows and was having a hard time waiting for the row-level exclusive locks to be released.
When "B" saved its changes, the commit released the row level exclusive locks. Only then was "A" able to proceed with its job. The real time deadlocks seen during the day were caused by a mix of Deadlock situation 1 and 2.
Both the above situations were happening at the same time and vying for the same resource, the same row in the Order Entry headers table, to become free.
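To make the pattern concrete, here is a minimal two-session sketch of the contention at the SQL level (the header_id, column, and values are hypothetical, purely for illustration):

-- Session B (a customer service rep in the Order Entry form), change not yet committed:
UPDATE oe_order_headers_all
   SET attribute15 = 'EDITED IN FORM'   -- hypothetical column/value
 WHERE header_id = 12345;               -- hypothetical order

-- Session A (the acknowledgement program), hitting the same row an instant later:
UPDATE oe_order_headers_all
   SET attribute15 = 'ACK SENT'
 WHERE header_id = 12345;
-- Session A now blocks on B's row-level (TX) lock until B commits or rolls back,
-- while A may already hold locks on other headers of its 25-row batch, which in
-- turn block other reps: situations 1 and 2 feeding each other.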
This was a vicious, endless cycle and caused locking grief for people working on Order Entry.

Breaking the cycle
Once the nature of the application design and sequence of accessing resources were understood, it was clear that the vicious cycle had to be broken.
It was best to make the custom program more intelligent by first checking whether the Order header record was locked (which can be done using the SELECT ... FOR UPDATE NOWAIT clause).
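A minimal PL/SQL sketch of this enhanced logic, assuming a hypothetical driving predicate and acknowledgement column (the actual program's selection criteria are not shown in the article):

DECLARE
  -- ORA-00054 is what FOR UPDATE NOWAIT raises when the row is already locked
  row_locked EXCEPTION;
  PRAGMA EXCEPTION_INIT(row_locked, -54);
  l_id oe_order_headers_all.header_id%TYPE;
BEGIN
  FOR rec IN (SELECT header_id
                FROM oe_order_headers_all
               WHERE open_flag = 'Y')        -- hypothetical qualifying condition
  LOOP
    BEGIN
      -- probe the row lock without waiting
      SELECT header_id
        INTO l_id
        FROM oe_order_headers_all
       WHERE header_id = rec.header_id
         FOR UPDATE NOWAIT;

      -- lock acquired: safe to stamp the acknowledgement
      UPDATE oe_order_headers_all
         SET attribute15 = 'ACK SENT'        -- hypothetical acknowledgement column
       WHERE header_id = rec.header_id;
    EXCEPTION
      WHEN row_locked THEN
        NULL;  -- a form session holds this row; skip it, the next run will pick it up
    END;
  END LOOP;
  COMMIT;
END;
/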
Any record found to be already locked was skipped. The logic was that a skipped Order header record would be picked up for processing by the custom program on a later run, AFTER the customer service rep had finally committed or rolled back on the Order Entry form (the program was running every minute, remember).

Summary
After analyzing the table/object access/locking pattern and understanding the code, it was clear the problem lay in the custom application design. Essentially, what was happening in this case was that the custom order acknowledgment program used to update a batch of 25 Order header records in a sequential fashion, and only then issue a commit.
The update statement in the culprit concurrent program never checked whether the Order header record was already locked by another session, and hence, as per default behavior, the update statement waited for the lock to be released, which can be a very long time unless the NOWAIT clause is specified.
This could happen for any of the 25 Order header records it would try to update. When a customer service rep wanted to update the same order that was being processed by the order acknowledgment program, the person's session would hang, since the form session couldn't acquire a lock on the Order header record till the time a commit was issued by the order acknowledgment program.
On the other hand, when a customer service rep would update an order through the Order Entry screen, they would lock up that order for a good 10-20 mins (they kept doing things to it while talking to the customer on the phone), and the custom order acknowledgment program would keep waiting for the exclusive lock on the Order header record.
Again, this could happen for any of those 25 records in the batch, as per the design. So the deadlock was happening both ways.
And it was really subtle. No amount of tracing, latch analysis, or Metalink research would have brought the troubleshooter to this stage of understanding the real issue.
Related Links
Other Performance Related Articles

Yet Another Case Study: The over-commit trap!

Sat, 2008-03-01 01:33

Introduction

It's always rewarding to talk about lesser-known performance tuning scenarios, which are not all that well appreciated in the DBA or developer world. Ask anyone what performance tuning they have worked on, and quite likely you will hear: ah well, I made sure the latest stats were gathered, that the right index was being used, or I used a hint to force the right index. How often do we look at other aspects, like the frequency of commits?

An atypical performance problem scenario

At a particular client site, a custom concurrent program had a history of taking more than 8-10 hrs to complete.
The application load was a mixture of heavy OLTP during regular business-day hours and more than moderate batch processing from the evening to the early morning hours. Needless to say, a lot of long-running batch jobs were contending for shared resources like rollback and temp segments.
A lot of diagnosis was done as to why the custom concurrent program would take so long. Quite often, the program would terminate with the notorious ORA-1555 (snapshot too old) error, even though the data volume involved was quite small (~7,000 records) and rollback segment space was abundant (120 GB).
Even setting a high undo retention initialization parameter (undo_retention of 8 hours) did not help in ironing out the performance issue.

Examining the possibilities...

There could have been several reasons why the performance lag was manifesting:

  • The query had a bad plan, and the compile-time plan was different from the runtime plan. This happens many times, but it didn't seem to be the case here.
  • A hint forcing the unique index (instead of index range scans) to avoid the db file sequential read wait event seemed to make no difference.
  • The level of DML activity against the tables within the process could simply be too much for the batch load to complete successfully in a suitable time window. This theory did not sound right, since the rollback retention period was ~8 hours.
  • Was the program not committing or rolling back often enough to stay within the rollback retention period?
Foxy observations...

While SQL tracing (event 10046) identified that a simple select on a custom table holding not much data was having a hard time, the corresponding SQL plan was using a unique index (!!). What more could one ask for? The situation was almost weird.
At the same time, PL/SQL profiling (dbms_profiler) identified that the FETCH phase of that very simple select was taking long. The HTML output generated from the PL/SQL profiler runs, using the profiler.sql script of Metalink Note 243755.1, was especially helpful in getting to this stage of the analysis and determining where the maximum delay in the entire process was.

Getting to the bottom of it

One more thing identified in the tkprof'ed output of the raw SQL trace, which could easily be missed, was that the CR (consistent read) blocks read for the fetch phase of the cursor were high. The SQL engine had to read a huge number of blocks to form a read-consistent image for the FETCH phase. This aspect is not readily discernible, since the regular thought process is to look at the elapsed time and the SQL plan.
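For illustration only, the telltale sign in a tkprof summary looks something like this (the numbers below are invented, not from the actual trace): a huge "query" (consistent-read) block count for the FETCH call against comparatively few rows returned:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch       71      3.12    2895.40        118    9648204          0        7000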
What could be the reason for such a high number of block reads to form a read-consistent image? Too few commits/rollbacks, or too many? Overdoing anything is usually not a rewarding experience.

Looking closer into the code...

On re-examining the PL/SQL profiler HTML output, it was seen that the program logic was executing a commit statement within a loop, for every input record processed.
This would obviously overtax the rollback segments and force the server process to reconstruct a read-consistent image of the corresponding transaction data at every commit.
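A minimal sketch of the batched-commit shape the code was moved to (the staging table, procedure, and threshold here are hypothetical):

DECLARE
  CURSOR c_recs IS
    SELECT line_id FROM xx_custom_staging;        -- hypothetical driving table
  l_count            PLS_INTEGER := 0;
  c_commit_threshold CONSTANT PLS_INTEGER := 500; -- hypothetical batch size
BEGIN
  FOR rec IN c_recs LOOP
    xx_process_record(rec.line_id);               -- hypothetical per-record processing
    l_count := l_count + 1;
    -- commit once per batch instead of once per record
    IF MOD(l_count, c_commit_threshold) = 0 THEN
      COMMIT;
    END IF;
  END LOOP;
  COMMIT;  -- pick up the final partial batch
END;
/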
Taking a cue from this observation, commits were instead done in batch fashion at a threshold, and the entire load then ran in 4 hours (input volume: 7,000 lines).

Conclusion

It is not good to commit too infrequently, nor is it good to over-commit. Moderation is the key to a happy medium, and at times it can have a bearing on performance problems as well.
Who said software cannot learn from Philosophy?

A tuning case study: The goofy optimizer (9i.x RDBMS )

Sat, 2008-03-01 01:23

Preface

They say: tune the application first. And yet, many times the CBO's behaviour is not predictable or favourable, in spite of its best intentions, its internal heuristic rules, or the availability of recent statistics with a good sampling percentage.
This is one such example, showing how hints had to be used as one of the last options to improve the performance of a concurrent program.
Ideally speaking, the CBO chooses what is usually the best, or one of the best, available data access plans for executing the SQL and fetching its output. It should take into consideration all the underlying objects' statistics and use the least costly data access plan. Sometimes, that doesn't happen out of the box.
Note: Some seeded 11i Oracle Applications modules also depend heavily on hints to make the performance of their code acceptable. One such heavy user is the QP (Advanced Pricing) module.

The Problem Scenario...

At a client site, a shipping-related custom concurrent program started taking ~5 hrs to complete on average, rather than the usual ~3 hrs. Preliminary investigation using RDBMS trace (event 10046) and the PL/SQL dbms profiler showed that the maximum time (about ~4.5 hrs) was taken up by the following innocent-looking SQL itself!!
SELECT HP.CATEGORY_CODE CATEGORY_CODE ,
       HC.SALES_CHANNEL_CODE  SALES_CHANNEL_CODE ,
       HC.CUSTOMER_CLASS_CODE CUSTOMER_CLASS_CODE ,
       SUM(L.SHIPPED_QUANTITY * L.UNIT_SELLING_PRICE)
         ORDER_SHIP_TOT ,
       L.ACTUAL_SHIPMENT_DATE SHIP_DATE ,
       DECODE (H.ORG_ID, 362, 'US', 365, 'CAD',
                         5624, 'MEX', H.ORG_ID) OPERATING_UNIT ,
       TT.NAME ORDER_TYPE
FROM
  ONT.OE_ORDER_HEADERS_ALL H ,
  APPS.OE_ORDER_LINES_ALL L ,
  APPS.HZ_PARTIES HP ,
  APPS.HZ_CUST_ACCOUNTS HC ,
  APPS.HZ_CUST_SITE_USES_ALL HCSU ,
  APPS.HZ_CUST_ACCT_SITES_ALL HCAS ,
  APPS.OE_ORDER_SOURCES OS ,
  APPS.OE_TRANSACTION_TYPES_TL TT ,
  XXIW.XXIW_BS_OM_ORDER_TYPES XOT
WHERE
     H.INVOICE_TO_ORG_ID = HCSU.SITE_USE_ID(+)
AND H.ORDER_SOURCE_ID = OS.ORDER_SOURCE_ID
AND OS.NAME <> 'Satcom Closed'
AND H.ORDER_TYPE_ID = TT.TRANSACTION_TYPE_ID
AND TT.LANGUAGE = 'US'
AND TT.NAME NOT LIKE 'INTERNAL%'
AND TRUNC(L.ACTUAL_SHIPMENT_DATE)
     BETWEEN NVL(:B2 , TRUNC(L.ACTUAL_SHIPMENT_DATE))
         AND NVL(:B1 , TRUNC(L.ACTUAL_SHIPMENT_DATE))
AND HCSU.CUST_ACCT_SITE_ID = HCAS.CUST_ACCT_SITE_ID(+)
AND HCAS.CUST_ACCOUNT_ID = HC.CUST_ACCOUNT_ID(+)
AND HC.PARTY_ID = HP.PARTY_ID(+)
AND UPPER(HC.CUSTOMER_CLASS_CODE) <> 'INTERNAL'
AND UPPER(HC.SALES_CHANNEL_CODE) <> 'INTERNAL'
AND H.FLOW_STATUS_CODE NOT IN ('CANCELLED','ENTERED')
AND H.BOOKED_DATE IS NOT NULL AND L.FLOW_STATUS_CODE = 'CLOSED'
AND L.LINE_CATEGORY_CODE <>'RETURN'
AND L.HEADER_ID = H.HEADER_ID
AND TT.NAME = XOT.ORDER_TYPE
AND XOT.TYPE = 'ALL'
GROUP BY HP.PARTY_NAME, HP.CATEGORY_CODE, HC.SALES_CHANNEL_CODE,
         HC.CUSTOMER_CLASS_CODE, H.ORDER_NUMBER, H.HEADER_ID,   
         H.BOOKED_DATE, L.ACTUAL_SHIPMENT_DATE,
         H.FLOW_STATUS_CODE, H.ORG_ID, OS.NAME, TT.NAME

The PL/SQL DBMS profiler data for the top time-taking code section showed that the maximum time was spent in the fetch of the cursor executing this query.

Evaluating options

The first thing which comes to mind is: can we play with different sampling estimates, e.g. 10%, 50%, 70%? While the idea is appealing, it can destabilize many other SQL queries, and it is an uphill task to judge the impact of such a change on the rest of the application's performance. A heavy price to pay for tuning one lone SQL.
Again, the statistics on the objects were recent enough (stats were gathered on a weekly basis as usual), and there had been no big changes in the data volumes.

Getting into the optimizer's thought process

From the SQL plan being used, it can readily be seen that the optimizer acted greedily, like this: it saw that there were several small HZ* tables which could easily fit into PGA/SGA memory, and so it thought: cool, I can do a quick hash join of these folks (that would also take care of the outer join predicate evaluation). After that, it proceeded to retrofit the remaining tables into the plan it was developing.
A fine approach. The only problem is that the index range scan on the header_id column, using the non-unique index on the oe_order_lines_all table, simply kills the performance of the fetch phase. It is, quite simply, very inefficient.
19101503 INDEX RANGE SCAN OE_ORDER_LINES_N1 (object id 334899)
A much better approach would have been to first filter the header_id values from the OE_ORDER_HEADERS_ALL table, so that the query would use the unique primary key index of OE_ORDER_HEADERS_ALL and join to the OE_ORDER_LINES_ALL table using a nested loop operation.
This by itself would filter off a huge chunk of the data (which is mostly centered around the OE_ORDER_LINES_ALL table). The remaining HZ* and miscellaneous data could then easily be plugged in via hash joins. The idea was to drive the query from OE_ORDER_HEADERS_ALL so as to eliminate the biggest chunk of unrequired data first.

Also, the predicate AND h.booked_date IS NOT NULL was replaced with AND h.booked_flag = 'Y'. Why use a NOT NULL clause when a flag is available in OE_ORDER_HEADERS_ALL?

Also, a custom index xxoe_order_lines_n11 was found to exist on the oe_order_lines_all.actual_shipment_date column. We thought we would make use of it too. The more the merrier.

So, after grouping the related predicates together, formatting the query, and putting these ideas into practice, the original query was changed as follows:
SELECT /*+ index (h,oe_order_headers_u1)
index (l,xxoe_order_lines_n11)
           use_nl(h,l)
       */
      hp.category_code  category_code
     ,hc.sales_channel_code sales_channel_code
     ,hc.customer_class_code customer_class_code
      ,SUM(l.shipped_quantity * l.unit_selling_price) order_ship_tot
     ,l.actual_shipment_date ship_date
     ,DECODE (h.org_id,
        362,  'US',
        365,  'CAD',
        5624, 'MEX',
        h.org_id)  operating_unit
     ,tt.NAME  order_type
   FROM
         ont.oe_order_headers_all  h
        ,apps.oe_order_lines_all   l
        ,apps.hz_parties           hp
     ,apps.hz_cust_accounts        hc
     ,apps.hz_cust_site_uses_all   hcsu
     ,apps.hz_cust_acct_sites_all  hcas
     ,apps.oe_order_sources        os
     ,apps.oe_transaction_types_tl tt
     ,xxiw.xxiw_bs_om_order_types  xot
   WHERE
          h.header_id                    = l.header_id
      AND h.ORDER_SOURCE_ID              = os.ORDER_SOURCE_ID
      AND h.ORDER_TYPE_ID                = tt.TRANSACTION_TYPE_ID
      AND h.flow_status_code not in ('CANCELLED','ENTERED')
      -- AND h.booked_date IS NOT NULL
      -- Replaced with the following condition                 
      and h.booked_flag = 'Y'
      -- Replaced with the above condition
  
      AND tt.LANGUAGE                    = 'US'
      AND tt.NAME                 not like 'INTERNAL%'
      AND l.flow_status_code             = 'CLOSED'
      AND l.LINE_CATEGORY_CODE          <>'RETURN'
      AND TRUNC(L.ACTUAL_SHIPMENT_DATE)
          BETWEEN NVL(:B2 , TRUNC(L.ACTUAL_SHIPMENT_DATE))
              AND NVL(:B1 , TRUNC(L.ACTUAL_SHIPMENT_DATE))
      AND h.invoice_to_org_id             = hcsu.site_use_id (+)
      AND hcsu.cust_acct_site_id = hcas.cust_acct_site_id(+)
      AND hcas.cust_account_id           = hc.cust_account_id(+)
      AND hc.party_id                    = hp.party_id(+)
      AND UPPER(hc.customer_class_code) <> 'INTERNAL'
      AND UPPER(hc.sales_channel_code)  <> 'INTERNAL'
      and tt.name                        = xot.order_type
      and xot.type                       = 'ALL'
      AND os.name                       <> 'Satcom Closed'
   GROUP BY
      hp.party_name,
      hp.category_code,
      hc.sales_channel_code,
      hc.customer_class_code,
      h.order_number,
      h.header_id,
      h.booked_date,
      l.actual_shipment_date,
      h.flow_status_code,
      h.org_id,
      os.name,
      tt.name;

The Improvement

The same query now executed in under 5 minutes!!! From ~4.5 hrs to ~5 mins. That's a quantum leap.

The improved plan was as follows:

Conclusion

While it may not be ideal, at times it may be required to supplement or coach the CBO to achieve acceptable levels of application performance. The effect will last longer if the "relative" data volume pattern of the underlying objects is not expected to change drastically in the system.

Case study: How to leverage +LargePages option of J2SE1.5 on Linux hugemem kernel for 11i Oracle Applications java servlet based modules

Sat, 2008-03-01 00:54

Preface

This case study deals with the Oracle Configurator product (short code CZ) of 11i Oracle Applications, and how to make it more stable and scalable by leveraging the LargePages option provided in J2SE 1.5. This paper is a continuation of the earlier article Common Incorrect beliefs about process address limits on 32bit platforms.
It is assumed that the maintenance release of 11i Oracle Applications is 11.5.10, as per the certification specified in Note 304099.1 - Using J2SE Version 5.0 with Oracle E-Business Suite 11i. To find the latest version of Java certified with the E-Business Suite application tier, please see OracleMetaLink Note 300482.1, "Overview of Using Java with Oracle E-Business Suite Release 11i".
This study assumes RHAS3 as the running platform OS.

The general approach and the case for this study

To increase the stability and response time of a Java servlet based application (say, Oracle Configurator), at least these two conditions must be satisfied:

  • Garbage collection must be minimized - The larger the overall heap size, the less contention for space across different sessions, assuming the user load remains the same. The lower the demand for object or session allocation in the JVM heap, the less garbage collection there will be.
  • Response times must stay reasonable - This, again, depends on garbage collection. During full garbage collection (GC) cycles, the application cannot respond to user activity because it is doing background maintenance. Therefore, the less garbage collection, the better the response times.
There might also be a requirement to do considerable caching of Configurator models in the JVM heap, which adds to the memory overhead. Using the LargePages option can be useful in such a situation.

Installing the hugemem kernel

Based on the article Redhat Linux Kernels and process space limits using Java.. Some eye-openers, it is already established that the hugemem kernel allows a process to address up to 3.4G of address space.

Install the hugemem kernel rpm for the same Linux kernel level on the Configurator tier. For example, for the 2.4.21-40 level, kernel-hugemem-2.4.21-40.EL.i686.rpm is the corresponding rpm.

As root:

# rpm -Uvh kernel-hugemem-2.4.21-40.EL.i686.rpm

Since you will be persisting with the hugemem kernel across boots, make hugemem the default kernel by specifying the default=0 line in the boot loader configuration file. Assuming the boot loader is GRUB, the file would be /etc/grub.conf. default= specifies the position of the kernel in the list in the loader configuration file, starting with 0 as the first; it might need to be set to a different value (1, 2, 3) depending on how many kernels are installed on the machine.


Note: Installing the hugemem kernel from the rpm does put the hugemem kernel as the first entry in the boot loader configuration file (/etc/grub.conf); therefore default=0 will need to be specified.


#more /etc/grub.conf

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes
# to this file
# NOTICE: You have a /boot partition. This means that
# all kernel and initrd paths are relative to /boot/, eg.
# root (hd0,0)
# kernel /vmlinuz-version ro root=/dev/rootvg/lv1
# initrd /initrd-version.img
#boot=/dev/sda
default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title Red Hat Enterprise Linux ES (2.4.21-40.ELhugemem)
root (hd0,0)
kernel /vmlinuz-2.4.21-40.ELhugemem ro root=/dev/rootvg/lv1 hda=ide-scsi
initrd /initrd-2.4.21-40.ELhugemem.img
.....
.....


Create a symbolic link for System.map in /boot that points to the System.map file specific to the hugemem kernel.

#pwd
/boot
# ls -l System.map
lrwxrwxrwx 1 root root 30 May 24 20:55 System.map -> System.map-2.4.21-40.ELhugemem

This is also a good time to install jdk-1_5_0_06-linux-i586.rpm (J2SE(TM) Development Kit 5.0 Update 6, from http://java.sun.com).


As root:

# rpm -Uvh jdk-1_5_0_06-linux-i586.rpm

Warn the Operating System that the limits will be challenged:


  • Add the following in /etc/security/limits.conf:

applmgr soft nofile 65536
applmgr hard nofile 65536
applmgr soft nproc 16384
applmgr hard nproc 16384
applmgr soft stack unlimited
applmgr hard stack unlimited
applmgr soft memlock unlimited
applmgr hard memlock unlimited


  • Set the ulimits for the unix user running the JVM so that ulimit -a reports:

# ulimit -a
time(cpu-seconds) unlimited
file(blocks) unlimited
coredump(blocks) 0
data(kbytes) unlimited
stack(kbytes) unlimited
lockedmem(kbytes) unlimited
memory(kbytes) unlimited
nofiles(descriptors) 65536
processes 16384

Implement hugepages & supporting features in the OS

Make the kernel configuration changes needed to support hugepages and large shared memory segments. The example assumes that you will be locking 9GB of real memory on a machine with 12GB of RAM.


  • Add this to /etc/sysctl.conf:

##############################################################
# Added for huge page access
#
# This is worth 9GB locked in Real Memory
# This limit needs to be changed as per the available memory
# on the Box with 12g RAM. Change this as per your
# requirement
#
##############################################################
vm.hugetlb_pool=9216
##############################################################
# Added to keep maxsize of shared memory segment at 3.5g
#
# This is required to support LargePages feature of JDK1.5
# and to avoid the following message at JVM startup:
# Java HotSpot(TM) Server VM warning: Failed to reserve
# shared memory (errno = 22).
##############################################################
kernel.shmmax=3758096384
kernel.shmall=3758096384
##############################################################

  • Mount the hugepage filesystem as root:

# mkdir -p /mnt/hugepages

  • The following entries should be there in /etc/fstab:

# vi /etc/fstab
..
..
##############################################################
# These entries are needed to make hugepages work
##############################################################
none /dev/shm tmpfs defaults,size=9G 0 0
##############################################################
# In the above setting, size should match the real
# memory being locked in sysctl.conf for vm.hugetlb_pool
##############################################################
nodev /mnt/hugepages hugetlbfs rw 0 0
##############################################################

  • Reboot the machine to boot into the hugemem kernel and for the hugepage allocation to reflect in /proc/meminfo. After the reboot, it should look like this:

$ grep -i huge /proc/meminfo
HugePages_Total: 4608
HugePages_Free: 4608
Hugepagesize: 2048 kB

Note: For dual-core CPU Dell machines, it was found that even though Hugepagesize was 2MB, the number of pages specified in /etc/sysctl.conf had to be doubled to make it worth a particular memory limit, say 9G. You are advised to experiment and verify with the output of the previous command on your machine, to check whether this holds true for you too.
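As a cross-check, the arithmetic behind the /proc/meminfo output above: HugePages_Total x Hugepagesize = 4608 x 2048 kB = 9,437,184 kB ~ 9 GB, which matches the vm.hugetlb_pool=9216 setting (that value, per the sysctl.conf comment above, is worth 9 GB of locked memory, i.e. it is expressed in megabytes).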



  • Check that the hugetlbfs and tmpfs filesystems have been mounted with the required options:

# mount | grep huge
nodev on /mnt/hugepages type hugetlbfs (rw)

# mount | grep tmpfs
none on /dev/shm type tmpfs (rw,size=9G)

Prepare the Java servlet application for leveraging the LargePage JVM option

$ whoami
applmgr

$ vi $IAS_ORACLE_HOME/Apache/Apache/bin/java.sh

# ###############################################################
#
# This file is automatically generated by AutoConfig. It will be
# read and overwritten. If you were instructed to edit this file,
# or if you are not able to use the settings created by
# AutoConfig, refer to Metalink document 165195.1 for assistance.
#
# ###############################################################
# Added ulimit -s 8192 to make jdk1.5 work with a modified native
# thread stack size ONLY for this servlet. This will keep the
# effect local.
#################################################################
ulimit -s 8192
JSERVJAVA=/usr/java/jdk1.5.0_06/jre/bin/java
#################################################################
ARGV="$*"
JVMLOGDIR=/ORACLE/prod2/9iAS/Apache/Jserv/logs/jvm
STDOUTLOG=$JVMLOGDIR/jserv.stdout
STDERRLOG=$JVMLOGDIR/jserv.stderr
...
...
...
#################################################################


  • In $IAS_ORACLE_HOME/Apache/Jserv/etc/jserv.properties, add the JVM switches directive line in the # Begin Customizations / # End Customizations block at the very end of the file. Remember to put all these JVM switches on the same line:

# vi $IAS_ORACLE_HOME/Apache/Jserv/etc/jserv.properties
...
...
...
# Begin Customizations
################################################################
# Added to make JVM1.5 work with Oracle Configurator
################################################################
wrapper.bin=/ORACLE/prod3/9iAS/Apache/Apache/bin/java1.5.sh
################################################################
# the max heapsize specified by -Xmx will have to be changed as
# per the max available RAM on the config machine e.g. if the
# RAM=12g and we are reserving 9g for JVM heaps, then each JVM
# can have a max heapsize of 2.5g or so:
# (2.75g heap + ~300m for perm generation overhead) x 3 JVMs ~ 9g
# minus spare
################################################################
wrapper.bin.parameters=-verbose:gc -Xmx2750M -Xms2750m -Xss128k -XX:PermSize=128m -XX:MaxPermSize=256M -XX:NewRatio=3 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:ParallelGCThreads=4 -XX:+PrintGCTimeStamps -XX:+UseTLAB -XX:+UseLargePages -server
################################################################
# End Customizations
Note: It is important to keep these directives in the # Begin / # End Customizations section so as to preserve these settings across AutoConfig runs.


  • You will need to comment out the following 2 lines in the starting section of $IAS_ORACLE_HOME/Apache/Jserv/etc/jserv.properties:

#######################
# Execution parameters
#######################
# The Java Virtual Machine interpreter.
# Syntax: wrapper.bin=[filename] (String)
# Note: specify a full path if the interpreter is not visible in
# your path.
# wrapper.bin=/ORACLE/prod2/9iAS/Apache/Apache/bin/java.sh
# Arguments passed to Java interpreter (optional)
# Syntax: wrapper.bin.parameters=[parameters] (String)
# Default: NONE
################################################################
# comment out this line so that the java servlet will at least
# startup
################################################################
# wrapper.bin.parameters=-verbose:gc -Xmx1024M -Xms128M -XX:MaxPermSize=128M -XX:NewRatio=2 -XX:+PrintGCTimeStamps -XX:+UseTLAB
################################################################


Note: It should be remembered that the default wrapper.bin directive for java.sh and the default JVM options will get uncommented after running AutoConfig. So you will have to remember to comment them out again after running AutoConfig, or else the Apache servlet will not start.

Why don't we do it this way?

One good question would be: why don't we put the JVM switches in the AutoConfig token jvm_options in $CONTEXT_FILE? This would remove the need to manually update $IAS_ORACLE_HOME/Apache/Jserv/etc/jserv.properties each time after AutoConfig. For example:


<jvm_options oa_var="s_jvm_options" osd="Linux">-verbose:gc -Xmx1024M -Xms128M -XX:MaxPermSize=128M -XX:NewRatio=2 -XX:+PrintGCTimeStamps -XX:+UseTLAB</jvm_options>


The answer is simple. Setting the JVM options here would cause EACH servlet set up on the Configurator tier to use the same switches. Assume that there are 2 more servlets configured to run on the same machine (the iPayment and OXTA servlets); the total memory taken up by the servlets would then be 3 x 3g ~ 9G. If you were planning on running 3 Configurator JVMs with the LargePage option, all your real memory is gone already.


Hence the manual change step. At the same time, if your IAS_ORACLE_HOME is set up in such a way, through the *.properties files, that ONLY the Configurator servlet is supposed to start up, then this method would work as well. In that situation it is also a viable option, and probably a better one.

Reap the benefits

Now, when you start the Configurator servlet using $SCRIPTS/adapcctl.sh, you can easily verify that the servlet JVM process has ~3G of heap allocated, and that the time spent in garbage collection is considerably reduced by the parallel GC options used above.


........
........
45990.174: [ParNew 1640767K->1624634K(2457536K), 0.0106000 secs]
46718.218: [ParNew 1640888K->1624758K(2457536K), 0.0133860 secs]
46913.196: [ParNew 1641014K->1628789K(2457536K), 0.0291830 secs]
........
.........


The advantage of using large pages is that the heap memory is locked and is not eligible for being paged out to swap. Accessing heap objects in real memory is obviously faster.

Hence, the LargePages option is a good choice for achieving performance goals, especially when the caching demands in the servlet are considerable and there is a section of RAM on the machine running the servlet which is not otherwise being put to any use.

Related articles and links

HugePages on Linux: What It Is... and What It Is Not... (Metalink Note 361323.1)
Bigpages vs. Hugetlb on RedHat Linux (Metalink Note 261889.1)


Java support for Large Memory Pages


Java Tuning White Paper

Redhat Linux Kernels and process space limits using Java.. Some eye-openers

Sat, 2008-03-01 00:49

Different kernels available and their memory limits using Java

There are at least 3 different kinds of kernels available in Redhat Linux:
1) SMP kernel: This is the most commonly used variant and supports only up to 4GB of virtual memory (2**32 = 4GB linear addresses). It gives an address space of 3G virtual memory per user process and 1G for the kernel, but after deducting stack overhead, the maximum user addressable space comes to 1.7g.
The split is:
1.0 GB Text/code
1.7 GB available process memory for mmap, malloc / address space
     (e.g. usable by shared Pool)
0.3 GB Stack
1.0 GB Kernel
2) Enterprise kernel: With this kernel, the process address space can go up to 64GB VM (2**36 = 64GB linear addresses). This feature is called Physical Address Extension (PAE) and is also referred to as Very Large Memory (VLM) support.
The enterprise kernel provides VLM support out of the box, BUT it requires relinking of the kernel, which is a pretty unpopular option with Redhat and usually a last resort for system administrators.
Note: You should be aware that Oracle Corp. will stop supporting any Oracle installation on a recompiled kernel, particularly on Red Hat.
3) Hugemem kernel: Red Hat Enterprise Linux 3.0/4.0 includes a kernel known as the hugemem kernel. This kernel supports a 4GB per-process user space (versus 3GB for the other kernels) and a 4GB direct kernel space.
The hugemem kernel is usually required in order to use all the memory in system configurations containing more than 16GB of RAM. It can also benefit configurations running with less memory (if running an application that could benefit from the larger per-process user space).
The virtual address space is split as:
1.0 GB Text/code
2.7 GB available process memory for mmap, malloc / address
        space (e.g. usable by shared Pool)
0.3 GB Stack
4.0 GB Kernel

Understanding how one kernel is better than another...

It is important to understand the pros and cons of the SMP, Enterprise, and Hugemem kernels. For applications having large virtual memory requirements, even the enterprise kernel could be a solution, using the VLM feature in RHEL3/4, but it would entail creating a very large buffer cache, using the options mentioned in Note 262004.1 for Oracle databases (one kind of such application).
On top of the page addressing problem for 64GB linear addresses, there will be a problem with kernel memory if the machine has a lot of RAM: page table entries (PTEs) must be built for all available memory pages, and this restricts the memory we can address.
Typically, the 1GB of kernel memory is completely used up addressing just 24GB of RAM. To overcome this problem we need a larger kernel memory, and this is where the hugemem kernel beats the enterprise kernel, as it provides 4G of kernel memory.

Experimenting with process limits across different factors...

We can find out the maximum number of threads which can be spawned on a platform, under any JVM version, by running a Java program (ThreadTest.java); the listing is available in Appendix A.
This program just keeps spawning threads and keeps them alive almost forever. When it reaches a java.lang.OutOfMemoryError on spawning a thread, it outputs the maximum number of threads it was able to spawn.
Based on this information, the native Java thread stack size specified at JVM startup, and the golden memory rule, we can calculate the total virtual address space of the JVM process.
For a listing of the Java source code and the unix shell script check_thread_limits.sh, please refer to Appendix B-1. The results for maximum threads spawned and maximum user address space, varying the JDK version, kernel (SMP, Hugemem), ulimit, and stack size, are listed in Appendix B-2.
As can be seen from Appendix B-2:
For SMP kernels:
With ulimit -s unlimited:
(Total size of managed heap + native heap) + (number of threads X thread stack size) ~ 1.7G
With ulimit -s 10m or 8m or 6m or 4m or 2m:
(Total size of managed heap + native heap) + (number of threads X thread stack size) ~ 2.7G
For Hugemem kernels:
With ulimit -s unlimited:
(Total size of managed heap + native heap) + (number of threads X thread stack size) ~ 2.5G
With ulimit -s 10m or 8m or 6m or 4m or 2m:
(Total size of managed heap + native heap) + (number of threads X thread stack size) ~ 3.7G
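As a worked example (all numbers hypothetical): on a hugemem kernel with ulimit -s 2m and -Xss2m, a JVM holding 2.0G of managed plus native heap should max out at roughly (3.7G - 2.0G) / 2m ~ 870 threads before thread creation fails with java.lang.OutOfMemoryError.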
Note: This also implies that the "unlimited" native stack size is not the same as the 10m one might infer from the comments in /usr/src/linux/include/sched.h for the _STK_LIM constant; rather, it is some value which is very high compared to 10m. Sometimes, seeing and reading comments in source code files is not believing, as I found out.


Appendix A : Listing of ThreadTest.java

public class ThreadTest {

public static void main(String[] pArgs) throws Exception {

try {
// keep spawning new threads forever
      while (true) {
      new TestThread().start();
      }
   }
// when out of memory error is reached, print out the number of
// successful threads spawned and exit
   catch ( OutOfMemoryError e ) {
     System.out.println(TestThread.CREATE_COUNT);
     System.exit(-1);
   }
  }
  static class TestThread extends Thread {
    private static int CREATE_COUNT = 0;
    public TestThread() {
     CREATE_COUNT++;
    }
// make the thread wait for eternity after being spawned
    public void run() {
      try {
        sleep(Integer.MAX_VALUE);
      }
// even if there is an interruption, dont do anything
      catch (InterruptedException e) {
      }
    }
  }
}

Appendix B-1 : Listing of check_thread_limits.sh

Appendix B-2 : Results of check_threads_limits.sh across different Kernel, Java Version and native Java stack size

SMP Kernel results:

Hugemem Kernel results:

Common Incorrect beliefs about process address limits on 32bit platforms

Sat, 2008-03-01 00:08

Preface

Most people believe that one cannot go beyond 1.2g-1.7g or so as a maximum heap size for Java servlet based applications deployed with 9iAS/10gAS. The ignorant cite this as a limitation of the 32bit Intel architecture.

In reality, this assumption is only partially true: it applies only to SMP kernels, and then only with the default unlimited native stack size per process. People might say "your results may vary", but the truth is that there are several factors (process address space and stack limits, kernel limits, shared memory support, etc.) which need to be understood before one can make an educated comment on the question: what is the best limit I can push the JVM heap to for my application, and how can I do it without compromising stability?

Kernel Limitations and known issues of JDK1.4 Heaps on 32-bit platform vis-a-vis 64-bit platforms

While it is generally believed that the maximum memory addressable space for a process on 32-bit platforms is 1.7g, this is only true for the SMP kernel.

Right? Well, only if an unlimited native stack size is used (usually the default, shown by the $ ulimit -s command). If any lesser value like 10m, 8m, 6m, 4m, or 2m is used, the user address space increases to 2.7g.

There are white papers on Metalink detailing how the SGA (implemented as an OS shared memory area) of an Oracle database on a 32-bit platform can be pushed to 3.5g. This is achieved by lowering the starting address of the memory map for the oracle executable and relinking it, so that the addressable space increases when a process is spawned from the oracle executable.

For JVMs, the application-specific golden limit really has to be found by trial and error, by tweaking ulimit -s and some other factors.

The maximum heap size limit can be found easily by raising the values of the -Xmx and -Xms switches at JVM startup:

# e.g. if you want to see if a heap of 1400m can be allocated and committed at startup

$ java -Xmx1400m -Xms1400m -version

If this doesn't fail and you get the version of Java in the output, keep pushing the OS with higher heap sizes and see when the segmentation fault (Signal 11) errors start coming. At that point you have reached the forbidden limit, and would do well to keep ~300-400m below it to accommodate native stack space for the Java threads. The native stack space requirement will vary as per the concurrent user load of the application.

The golden memory rule:
The golden rule of thumb for max address space per process to be kept in mind is:


(Total size of managed heap + native heap) + (number of threads X thread stack size) = max process address space allowed by Kernel


Viewing it in a more holistic way,


Data segment (=Total size of managed heap + native heap) + stack segment(= # of threads X thread stack size) = max user address space allowed by Kernel
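As a quick worked example (hypothetical numbers): on an SMP kernel with a finite ulimit -s, the user address space is ~2.7g; with -Xmx1500m, ~200m of native heap, and -Xss2m, the process could sustain roughly (2.7g - 1.7g) / 2m ~ 500 threads before thread creation fails with java.lang.OutOfMemoryError.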

More possibilities with JDK1.5

There are some exciting things which can be done using JDK1.5, like addressing 3G and more of heap size and using the LargePages feature, with which it is possible to eliminate heavy swapping, iowait, and excessive garbage collection, since the heap is kept in real memory.

It should be noted that JDK1.5 is only supported at this time with Oracle Applications 11.5.10. It is not yet certified with 11.5.9, mostly due to logistical reasons and to avoid repeating the regression testing effort for JDK1.6, which Oracle supposedly plans on certifying.

Related Links and Articles

Case study: How to leverage LargePages option of J2SE1.5 on Linux hugemem kernel for 11i Oracle Apps

Misleading adcfgclone.pl ERROR: OUI files not found at their expected location...please make sure that the pre-requisite OUI patch was applied correctly...

Thu, 2008-02-14 21:04

If you ever get this ERROR while running adcfgclone.pl, it would do you good to check whether the directories named in the context file actually exist.

As misleading as the error message below is, if you debug the perl scripts and check the code, the underlying problem is that a base path cannot be constructed by adcfgclone.pl.

To make a long story short, if you just double-check all the directory paths that are mentioned in the context file, you can get around this problem easily.



This misleading error message is fixed in R12.

Here is an example of such an instance:

        perl /apsd01/apps/apsdcomn/clone/bin/adclone.pl
java=/apsd01/apps/apsdcomn/clone/bin/../jre mode=apply stage=/apsd01/apps/apsdcomn/clone/bin/.. component=appsTier method=CUSTOM appctxtg=/apsd01/apps/apsdappl/admin/APSD_apsd.xml  showProgress contextValidated=false

ERROR: OUI files not found at their expected location.
Setting OUI path to: /appsoui

ERROR: OUI files not found at their expected location. Please make sure that the pre-requisite OUI patch was applied correctly: patch should be unzipped directly at the ORACLE_HOME root level so that OraInstaller.jar exists in either
1/oui/jlib/
or
1/oui/lib/

Restart adclone.pl after performing required corrections.
ERROR: Failed to execute /apsd01/apps/apsdcomn/clone/bin/adclone.pl


Java memory leak in XML Parser 2.0.2.9 leads to thousands of workflow notification emails with ORA-29532: Java call terminated by uncaught Java exception: java.lang.NullPointerException

Thu, 2008-01-17 07:26

Introduction

In a particular customer's production instance (11.5.9, post ATG RUP 5), the WF administrator email was being bombarded with thousands of emails like these:

==================================================
An Error occurred in the following Workflow.

Item Type = XITEMIFS
Item Key = 982893-5825-38602875
User Key =

Error Name = ECX_PROGRAM_EXIT_EMBD
Error Message = 6012: XML Gateway process aborted : 'ORA-29532: Java
call terminated by uncaught Java exception: java.lang.OutOfMemoryError -
ecx_utils.INITIALIZE: '. For further details, see the log corresponding to
FND-Logging AFLOG MODULE Name: ''.
Error Stack =
Wf_Engine.GetItemAttrEvent(XITEMIFS, 982893-5825-38602875,
ECX_EVENT_MESSAGE)
ECX_STANDARD.sendDirectCover(XXINV, SYNC_ITEM)
ECX_STANDARD.Send(XITEMIFS, 982893-5825-38602875, 113178, RUN)
Wf_Engine_Util.Function_Call(ECX_STANDARD.SEND, XITEMIFS,
982893-5825-38602875, 113178, RUN)

Activity Id = 113178
Activity Label = XXINV_RECEIVE_SYNC_EVENT:DOCUMENT_SEND_XMLGATEWAY
Result Code = #EXCEPTION
Notification Id =
Assigned User =

Workflow monitor

Action History

=========================

An Error occurred in the following Workflow.

Item Type = XITEMIFS
Item Key = 986029-5704-38881957
User Key =

Error Name = ECX_PROGRAM_EXIT_EMBD
Error Message = 6012: XML Gateway process aborted : 'ORA-29532: Java call terminated by uncaught Java exception: java.lang.NullPointerException - ecx_utils.INITIALIZE: '. For further details, see the log corresponding to FND-Logging AFLOG MODULE Name: ''.
Error Stack =
Wf_Engine.GetItemAttrEvent(XITEMIFS, 986029-5704-38881957, ECX_EVENT_MESSAGE)
ECX_STANDARD.sendDirectCover(XXINV, SYNC_ITEM)
ECX_STANDARD.Send(XITEMIFS, 986029-5704-38881957, 113178, RUN)
Wf_Engine_Util.Function_Call(ECX_STANDARD.SEND, XITEMIFS, 986029-5704-38881957, 113178, RUN)

Activity Id = 113178
Activity Label = XXINV_RECEIVE_SYNC_EVENT:DOCUMENT_SEND_XMLGATEWAY
Result Code = #EXCEPTION
Notification Id =
Assigned User =

==================================================
Another variation...
An Error occurred in the following Workflow.

Item Type = XITEMIFS
Item Key = 982893-5825-38602875
User Key =

Error Name = ECX_PROGRAM_EXIT_EMBD
Error Message = 6012: XML Gateway process aborted : 'ORA-29532: Java call
terminated by uncaught Java exception: java.lang.OutOfMemoryError -
ecx_utils.INITIALIZE: '. For further details, see the log corresponding to
FND-Logging AFLOG MODULE Name: ''.
Error Stack =
Wf_Engine.GetItemAttrEvent(XITEMIFS, 982893-5825-38602875,
ECX_EVENT_MESSAGE)
ECX_STANDARD.sendDirectCover(XXINV, SYNC_ITEM)
ECX_STANDARD.Send(XITEMIFS, 982893-5825-38602875, 113178, RUN)
Wf_Engine_Util.Function_Call(ECX_STANDARD.SEND, XITEMIFS,
982893-5825-38602875, 113178, RUN)

Activity Id = 113178
Activity Label = XXINV_RECEIVE_SYNC_EVENT:DOCUMENT_SEND_XMLGATEWAY
Result Code = #EXCEPTION
Notification Id =
Assigned User =

Workflow monitor

Action History
==================================================
Initial investigation...

Initial investigation revealed that it might be beneficial to apply patch 5194357 - REHOST XML PARSER FOR PL/SQL v1.0.2+ for Oracle EBS 11i (as per Metalink note 356141.1). Patch 5356822 (PATCH FOR XMLGATEWAY MEMORY ISSUES) was also mentioned, but it was not clear whether it would solve the problem.

After studying the bugs, another patch came up: 5209280 (APPST: ORA-29532: JAVA CALL TERMINATED BY JAVA EXCEPTION:JAVA.LANG.OUTOFMEMORYER). The product mentioned in the patch information is CORE, which can easily be confused with the CORE RDBMS product (e.g. 9i, 8i etc.). CORE was actually referring to the CORE component of the XDK xmlparser product. Apparently, the XDK XML parser was broken in version 2.0.2.9 and fixed in 8.1.7.2.99.

Later on, we came to understand that patch 5194357 was an APPS rehosting of core patch 5209280.

Rehosting?

"Rehosting" is an interesting patch terminology. Essentially, it means that a patch for a core XML sub-component was re-certified to be applied in an Oracle Applications 11i environment.

For example, while running the patch.sh supplied in patch 5209280, it failed to copy files to $ORACLE_HOME/xdk/plsql/parser/bin. Also, since load.sql was not present in $ORACLE_HOME/xdk/plsql/parser/bin as mentioned in the README, it could not be executed. It was clear that patch 5209280 made a lot of assumptions about the environment in which it was to be executed.

That's where patch 5194357 (REHOST XML PARSER FOR PL/SQL v1.0.2+ for Oracle EBS 11i) was useful; it was fully compliant with the adpatch utility. It was the same fix, but packaged to be applied in an 11i environment.

Workaround

The workaround was to bounce the Workflow Agent Listener Service GSM container from the Oracle Applications Manager GUI.

Solution

After applying patch 5194357 (REHOST XML PARSER FOR PL/SQL v1.0.2+ for Oracle EBS 11i) and doing heavy regression testing (marking the status of SENT or ERRORed notifications in WF_NOTIFICATIONS as MAIL), the memory stamina of the Workflow Agent Listener Service GSM container increased considerably.

Essentially, a memory leak had been introduced in the previous XML parser used by the PL/SQL sessions of the Workflow Agent Listener Service.


11i Post cloning issues: APPSLOCALLOGIN.JSP gives JAVA.LANG.NOCLASSDEFFOUNDERROR - how is it related to JTF Distributed caching..

Tue, 2008-01-15 05:10

Preface

This situation was faced after cloning: in spite of repeated Apache bouncing, we could not make the self service login page work in a multi-tier (load balanced) application architecture for an 11.5.10 instance. A generic error like java.lang.NoClassDefFoundError was coming up.

Often this can be resolved by making sure that your CLASSPATH or AF_CLASSPATH has all the right .zip/.jar files, $JAVA_TOP, etc. in $IAS_ORACLE_HOME/Apache/Jserv/etc or adovars.env, BUT in this case even those checks did not work.

Debugging and trying all sorts of things first...

Well, we did not come to an understanding of the actual problem till much later. Initially, we tried many things, like recompiling the JSP cache in $COMMON_TOP/_pages.

sandbox:web_dev> perl ojspCompile.pl --compile --fast
identifying apache_top.../ORACLE/dev/9iAS
identifying apache_config_top.../ORACLE/dev/9iAS
identifying java_home.../usr/java/j2sdk1.4.2_07
identifying jsp_dir.../ORACLE/apps/dev/common/html
identifying pages_dir.../ORACLE/apps/dev/common
identifying classpath...file:///ORACLE/dev/9iAS/Apache/Jserv/etc/jserv.properties
auto-logging to /ORACLE/apps/dev/common/_pages/ojsp_error.log
starting...(compiling delta)
using 8i internal ojsp ver: 1.1.3.5.2
including compatibility flag -whiteSpaceBetweenScriptlet
synchronizing dependency file:
  enumerating jsps...14016
  parsing jsp...14016
  writing deplist...14016
initializing compilation:
  eliminating children...10724 (-3292)
  searching uncompiled...10692
translating and compiling:
  searching untranslated...10692
  translating jsps...10692/10692 [failed: 204] in 3m58s             
  compiling jsps...10488/10488 [failed: 75] in 5m44s               
Finished!


Now, when we tried accessing the AppsLocalLogin.jsp page, the following contents could be seen in the _pages directory and the error was still very much present:

sandbox:web_dev> pwd
/ORACLE/apps/dev/common/_pages/_oa__html


sandbox:web_dev> ls -l *Apps*
-rw-r--r--    1 applmgr  oaa         11356 Nov 14 09:23 _AppsChangePassword.class
-rw-r--r--    1 applmgr  oaa         11494 Nov 14 09:23 _AppsChangePassword$__jsp_StaticText.class
-rw-r--r--    1 applmgr  oaa          2037 Nov 14 09:23 _AppsCustomizedSSOLogin.class
-rw-r--r--    1 applmgr  oaa           852 Nov 14 09:23 _AppsCustomizedSSOLogin$__jsp_StaticText.class
-rw-r--r--    1 applmgr  oaa          4626 Nov 14 09:23 _AppsLocalLogout.class
-rw-r--r--    1 applmgr  oaa           888 Nov 14 09:23 _AppsLocalLogout$__jsp_StaticText.class
-rw-r--r--    1 applmgr  oaa         25964 Nov 14 09:28 _qotSZzpAppsLink.class
-rw-r--r--    1 applmgr  oaa          3714 Nov 14 09:28 _qotSZzpAppsLink$__jsp_StaticText.class

Since the .class files were very much present, it meant that the .jsp files had been converted into runtime .class files.

Snippets from relevant log files

mod_jserv.log

[12/11/2007 14:33:53:558] (ERROR) ajp12: Servlet Error: OracleJSP: java.io.FileNotFoundException
[12/11/2007 14:34:57:972] (ERROR) ajp12: Servlet Error: OracleJSP: java.io.FileNotFoundException
[12/11/2007 14:39:43:301] (ERROR) ajp12: Servlet Error: OracleJSP: oracle.jsp.provider.JspCompileException: <H3>Errors compiling:/ORACLE/apps/dev/common/_pages/_oa__html//_ahlBrowserTitleIncl.java</H3><TABLE BORDER=1 WIDTH=100%><TR><TH ALIGN=CENTER>Line #</TH><TH ALIGN=CENTER>Error</TH></TR><TR><TD WIDTH=7% VALIGN=TOP><P ALIGN=CENTER>63</TD><TD>[jsp src:line #:26]<br> cannot resolve symbol symbol  : class FrameworkException  location: class _oa__html._ahlBrowserTitleIncl         catch(FrameworkException fe)               </TD></TR><TR><TD WIDTH=7% VALIGN=TOP><P ALIGN=CENTER>68</TD><TD>[jsp src:line #:31]<br> cannot resolve symbol symbol  : variable csiPageContext  location: class _oa__html._ahlBrowserTitleIncl         csiPageContext.setAppId(AhlcmAppConst.APP_ID);         </TD></TR><TR><TD WIDTH=7% VALIGN=TOP><P ALIGN=CENTER>69</TD><TD>[jsp src:line #:32]<br> cannot resolve symbol symbol  : variable csiPageContext  location: class _oa__html._ahlBrowserTitleIncl         Hashtable browserTitlePrmpts = CsifutRegion.getPrompts(csiPageContext, "AHL_BROWSER_TITLES");                                                                </TD></TR><TR><TD WIDTH=7% VALIGN=TOP><P ALIGN=CENTER>69</TD><TD>[jsp src:line #:32]<br> cannot resolve symbol symbol  : variable CsifutRegion  location: class _oa__html._ahlBrowserTitleIncl         Hashtable browserTitlePrmpts = CsifutRegion.getPrompts(csiPageContext, "AHL_BROWSER_TITLES");                                        </TD></TR><TR><TD WIDTH=7% VALIGN=TOP><P ALIGN=CENTER>73</TD><TD>[jsp src:line #:36]<br> cannot resolve symbol symbol  : variable AHL_BROWSER_TITLE_PRMT  location: class _oa__html._ahlBrowserTitleIncl           browserTitle = (String)browserTitlePrmpts.get(AHL_BROWSER_TITLE_PRMT);                                                         </TD></TR></TABLE>
[12/11/2007 14:39:43:301] (ERROR) an error returned handling request via protocol "ajpv12"


OACoreGroup.0.stderr

Exception in static block of jtf.cache.CacheManager. Stack trace is: oracle.apps.jtf.base.resources.FrameworkException: IAS Cache initialization failed. The Distributed Caching System failed to initialize on port: 12362. The list of hosts in the distributed caching system is: sandbox1 sandbox2 . The port 12362 should be free on each host running the JVMs. The default port 12362 can be overridden using -Doracle.apps.jtf.cache.IASCacheProvidercacheProvider.port=<port number>
        at oracle.apps.jtf.cache.IASCacheProvider.init(IASCacheProvider.java:220)
        at oracle.apps.jtf.cache.CacheManager.activateCache(CacheManager.java:1444)
        at oracle.apps.jtf.cache.CacheManager.initCache(CacheManager.java:752)
        at oracle.apps.jtf.cache.CacheManager.<clinit>(CacheManager.java:378)
        at oracle.apps.fnd.cache.Cache.setCacheFullName(Cache.java:228)
        at oracle.apps.fnd.cache.Cache.initCache(Cache.java:114)
        at oracle.apps.fnd.cache.Cache.<init>(Cache.java:89)
        at oracle.apps.fnd.cache.AppsCache.<init>(AppsCache.java:86)
        at oracle.apps.fnd.cache.AolCaches.getCache(AolCaches.java:155)
        at oracle.apps.fnd.profiles.Profiles.<clinit>(Profiles.java:241)
        at oracle.apps.fnd.profiles.ExtendedProfileStore.init(ExtendedProfileStore.java:498)
        at oracle.apps.fnd.profiles.ExtendedProfileStore.<init>(ExtendedProfileStore.java:119)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
        at oracle.apps.fnd.common.AppsContext.instantiateProfileStore(AppsContext.java:4312)
        at oracle.apps.fnd.common.AppsContext.makeProfileStore(AppsContext.java:1171)
        at oracle.apps.fnd.common.Context.setProfileStore(Context.java:970)
        at oracle.apps.fnd.common.Context.setProfileStore(Context.java:952)
        at oracle.apps.fnd.common.AppsContext.initializeContext(AppsContext.java:958)
        at oracle.apps.fnd.common.AppsContext.initializeContext(AppsContext.java:918)
        at oracle.apps.fnd.common.AppsContext.<init>(AppsContext.java:752)
        at oracle.apps.mwa.container.ApplicationsObjectLibrary.AOLInit(ApplicationsObjectLibrary.java:172)
        at oracle.apps.mwa.container.ApplicationsObjectLibrary.<init>(ApplicationsObjectLibrary.java:76)
        at oracle.apps.mwa.container.MWALib.setObjectLibrary(MWALib.java:333)
        at oracle.apps.mwa.wap.engine.WapServlet.init(WapServlet.java:158)
        at org.apache.jserv.JServServletManager.load_init(JServServletManager.java:755)
        at org.apache.jserv.JServServletManager.loadServlet(JServServletManager.java:659)
        at org.apache.jserv.JServServletManager.loadStartupServlets(JServServletManager.java:789)
        at org.apache.jserv.JServServletManager.init(JServServletManager.java:447)
        at org.apache.jserv.JServ.start(JServ.java:625)
        at org.apache.jserv.JServ.main(JServ.java:234)
.....
.....
java.lang.NoClassDefFoundError
        at org.apache.jserv.JServServletManager.load_init(JServServletManager.java:765)
        at org.apache.jserv.JServServletManager.loadServlet(JServServletManager.java:659)
        at org.apache.jserv.JServConnection.processRequest(JServConnection.java:394)
        at org.apache.jserv.JServConnection.run(JServConnection.java:294)
        at java.lang.Thread.run(Thread.java:534)
.....

.....

java.lang.NoClassDefFoundError
        at oracle.apps.fnd.framework.webui.OAPageContextImpl.getMACValidationLevel(OAPageContextImpl.java)
        at oracle.apps.fnd.framework.webui.OAPageBean.isMacEnabled(OAPageBean.java:7254)
        at oracle.apps.fnd.framework.webui.OAPageBean.preparePage(OAPageBean.java:667)
        at oracle.apps.fnd.framework.webui.OAPageBean.preparePage(OAPageBean.java:508)
        at oracle.apps.fnd.framework.webui.OAPageBean.preparePage(OAPageBean.java:429)
        at _oa__html._OA._jspService(_OA.java:85)
        at oracle.jsp.runtime.HttpJsp.service(HttpJsp.java:119)
        at oracle.jsp.app.JspApplication.dispatchRequest(JspApplication.java:417)
        at oracle.jsp.JspServlet.doDispatch(JspServlet.java:267)
        at oracle.jsp.JspServlet.internalService(JspServlet.java:186)
        at oracle.jsp.JspServlet.service(JspServlet.java:156)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:588)
        at org.apache.jserv.JServConnection.processRequest(JServConnection.java:456)
        at org.apache.jserv.JServConnection.run(JServConnection.java:294)
        at java.lang.Thread.run(Thread.java:534)
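The first stack trace gives us one concrete thing to verify: the distributed cache port must be free on every host participating in the cache. A quick sanity check, sketched from memory (exact netstat/lsof options vary by platform):

# Run on each host listed in the error (sandbox1, sandbox2):
# nothing should already be listening on the distributed cache port
netstat -an | grep 12362

# On Linux, identify any process squatting on the port
lsof -i :12362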



javacache.log

sandbox:web_dev> pwd

/ORACLE/apps/dev/common/rgf/dev_sandbox1

....
....
[2007-11-14 Wed 11:59:31.099 AM EST] -001 [Grp] Transport initialization failed.
java.lang.IllegalArgumentException: Port value out of range: 144957
        at java.net.ServerSocket.<init>(ServerSocket.java:180)
        at oracle.ias.cache.group.Transport.createServerSocket(Unknown Source)
        at oracle.ias.cache.group.Transport.ssinit(Unknown Source)
        at oracle.ias.cache.group.Transport.init(Unknown Source)
        at oracle.ias.cache.group.GroupCommunication.init(Unknown Source)
        at oracle.ias.cache.Net.init(Unknown Source)
        at oracle.ias.cache.CacheInternal.init(Unknown Source)
        at oracle.ias.cache.Cache.init(Unknown Source)
        at oracle.apps.jtf.cache.IASCacheProvider.init(IASCacheProvider.java:199)
        at oracle.apps.jtf.cache.CacheManager.activateCache(CacheManager.java:1444)
        at oracle.apps.jtf.cache.CacheManager.initCache(CacheManager.java:752)
        at oracle.apps.jtf.cache.CacheManager.<clinit>(CacheManager.java:378)
        at oracle.apps.fnd.cache.Cache.setCacheFullName(Cache.java:228)
        at oracle.apps.fnd.cache.Cache.initCache(Cache.java:114)
        at oracle.apps.fnd.cache.Cache.<init>(Cache.java:89)
        at oracle.apps.fnd.cache.AppsCache.<init>(AppsCache.java:86)
        at oracle.apps.fnd.cache.AolCaches.getCache(AolCaches.java:155)
        at oracle.apps.fnd.profiles.Profiles.<clinit>(Profiles.java:241)
        at oracle.apps.fnd.profiles.ExtendedProfileStore.init(ExtendedProfileStore.java:498)
        at oracle.apps.fnd.profiles.ExtendedProfileStore.<init>(ExtendedProfileStore.java:119)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
        at oracle.apps.fnd.common.AppsContext.instantiateProfileStore(AppsContext.java:4312)
        at oracle.apps.fnd.common.AppsContext.makeProfileStore(AppsContext.java:1171)
        at oracle.apps.fnd.common.Context.setProfileStore(Context.java:970)
        at oracle.apps.fnd.common.Context.setProfileStore(Context.java:952)
        at oracle.apps.fnd.common.AppsContext.initializeContext(AppsContext.java:958)


....

....
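A port value of 144957 is far outside the valid TCP range of 1 to 65535, which points to a mangled JVM parameter rather than a busy port. A quick way to see what the Jserv JVMs are actually being handed (the property name is assumed from the OACoreGroup error above):

# Look for any cache/port related -D parameters passed to the Jserv JVMs
grep 'wrapper.bin.parameters' $IAS_ORACLE_HOME/Apache/Jserv/etc/jserv.properties | grep -Ei 'port|cache'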

So what is the distributed JTF cache? This is where the distributed JTF cache feature comes in; it appears to be enabled by default in 11.5.10 (though I am not sure as of which ATG patchset). Please see my previous article What is the Java cache feature: Framework, Features, Administration and How does it tie up with Workflow Business Event System (BES).


Essentially, it is just one kind of Java cache component for the JTF product, and it has a direct bearing on the compilation of the AppsLocalLogin.jsp page. If some FND profile options were changed from a non-Java interface like Forms or sqlplus, AND the Workflow Java Deferred Agent Listener GSM component is NOT running, then there is a high possibility that the JTF cache is stale, because the invalidation events raised for those changes were never processed.
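One way to see whether such unprocessed invalidations are piling up is to look at the Java deferred queue directly. A minimal sketch, run as the apps user (aq$wf_java_deferred is the same queue view queried later in this article):

# A large READY backlog with the listener down means the mid-tier caches are going stale
sqlplus -s apps/<apps_pwd> <<'EOF'
select msg_state, count(*)
from   applsys.aq$wf_java_deferred
group  by msg_state;
EOF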

Working around it...

Thankfully, this is when we came across other bugs that talked about disabling the distributed JTF Java cache by modifying the following parameters in the $IAS_ORACLE_HOME/Apache/Jserv/etc/jserv.properties file and bouncing Apache:

###############################################
# By default the value of DLONG_RUNNING_JVM is true
###############################################
wrapper.bin.parameters=-DLONG_RUNNING_JVM=false

###############################################
# You may have to add this parameter
###############################################
wrapper.bin.parameters=-DCACHEMODE=LOCAL
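After editing jserv.properties, Apache has to be bounced and the change verified. A minimal sketch, assuming a standard 11i AutoConfig layout for the control scripts:

# Bounce Apache so the new JVM parameters take effect
$COMMON_TOP/admin/scripts/$CONTEXT_NAME/adapcctl.sh stop
$COMMON_TOP/admin/scripts/$CONTEXT_NAME/adapcctl.sh start

# Confirm the running JVMs actually picked up the flags
ps -ef | grep java | grep -E 'LONG_RUNNING_JVM|CACHEMODE'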


And this is when all the features of Java caching started to make sense. Again, for more information and background, I would highly advise you to read the following two articles:

What is the Java cache feature: Framework, Features, Administration and How does it tie up with Workflow Business Event System (BES)

Java Caching for Oracle Applications 11i: Part 1

Conclusion

Java caching can be a pain in the neck if we do not understand how it works. It can really cause grief in post-cloning situations, since FND profile options have to be changed to suit the new environment. In the above case, the issue could also have been resolved by making sure the pending events of the Workflow Java Deferred Agent Listener GSM component were processed.

The catch was to understand the Java caching framework and how it was connected with the Workflow Business Event System (BES) events.




What is the Java cache feature: Framework, Features, Administration and How does it tie up with Workflow Business Event System (BES)

Tue, 2008-01-15 03:03

Preface
In a fast-paced world, we all want to cache things for faster access: let's keep this here, so that I won't have to go around the world to get that. Well, this concept has been applied to the JVM components in the application server realm as well.

While we already have an excellent article by Mike Shaw on Steven Chan's million dollar blog (http://blogs.oracle.com/schan/2007/05/01/), I thought I would expound upon the Java caching feature in general and talk about how it works and how it is configured and administered. Also, the relationship between Java caching and the Workflow Business Event System is not made very clear in the existing articles.

Disclaimer:

Most of the content in this article has been taken from the Oracle Applications Java Caching Framework Developer's Guide, so I am not going to lay any great claim to its originality. At the same time, I will say that I have tried to present the information there in a more digestible format for Application DBAs like me, who want to get up to speed with the various features being spewed out by the FND team.
Distributed Java Cache - what does the framework provide...

The Caching Framework provides the following features:

  • Synchronization: Caching Framework takes care of all
    the synchronization issues in a multi-threaded environment.

  • Distributed Caching: Distributed caching allows one JVM
    to inform all the other JVMs about updates to any cached data. This ensures
    consistency of the cached data across all the JVMs.

  • Data Partition: This feature allows the cached data to
    be partitioned based on any partition key. This is useful for JVMs in hosting
    environments running against a Virtual Private Database (VPD). The Caching
    Framework ensures that the cache data is partitioned based on security group
    id.

  • Database Invalidation: This feature ensures that when
    the data in the database is updated, the corresponding cached data in the
    mid-tier is also updated. This is useful when the updates to the database
    bypass the JVMs, such as updates through Forms or sqlplus.

  • Event Support: This feature allows custom event handlers
    to be invoked when certain events, such as invalidation or updates to the
    cached data, occur. This is useful for maintaining consistency across inter-dependent
    cached data.

  • Cache Administration: Caching Framework comes with an administration user
    interface, which is available under the Functional Administrator responsibility. This interface
    can be used to perform administrative operations including changing the time-out values for
    cache components, looking at cache usage statistics, and clearing caches.

  • Cache Diagnostics: Cache diagnostics is a set of diagnostic tests that can
    identify some of the common problems that may occur when using the Caching Framework.

How does it work?

By default, distributed caching works by sending an invalidation message to the
other JVMs. The list of machines is maintained in the database. As each JVM
starts, it makes an entry in this list (if it is not already present). When the
other JVMs receive an invalidation message, they mark the object invalid if it
is present. When the object is next requested in those JVMs, it is loaded from
the database at that time.

DB Invalidation provides a general mechanism to originate
events from the database. For Caching Framework, these events are used to
invalidate objects in a component cache.

Distributed caching ensures that these updates are seen in all the
middle-tiers. However, in some cases, these updates can happen outside of the middle-tier, such as
updates through the Forms server or concurrent programs. In such cases, the DB invalidation
mechanism can be used to ensure that the cached data is invalidated when the corresponding
database data changes.


The following is a typical process flow for database invalidation:


  1. The user JVM (Apache/ Jserv) caches data such as profiles. The cached data is a
    representation of some profiles in the database.

  2. The user performs an action such as updating a profile value through Jinitiator, which
    causes the data in the database to be updated while bypassing the user JVM.

  3. The cached profile value must now be marked as invalid. Subsequent access to the profile in
    the user JVM causes a cache miss, which in turn results in the updated value being loaded from the
    database.

How the Oracle Workflow Business Event System (BES) is related to this.....

The Workflow Business Event System provides the necessary infrastructure support for notification
of the database data change to the middle-tier. The notification is originated by raising a
business event with the correct key. Caching Framework provides the support for processing the
notification and invalidating the corresponding cached data in all the user JVMs.

It is important to know that:
  • Every time an event is raised, it is processed in a separate thread.
  • The events are processed by the Java Deferred Agent Listener running in a standalone JVM (GSM)
    started by a concurrent process. This JVM must be running all the time. It sends a
    distributed invalidation message causing the corresponding key in all of the JVMs to be marked
    invalid (a hands-on sketch of raising such an event by hand follows below).
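Since every invalidation rides on a business event, the whole path can be exercised by raising one by hand. A minimal SQL*Plus sketch, reusing the profile-update event name and the event key quoted later in this article (illustrative values; substitute the key for the data you actually changed):

sqlplus -s apps/<apps_pwd> <<'EOF'
begin
  -- Raise the seeded profile-update event; the Java Deferred Agent Listener
  -- should pick it up and broadcast the invalidation to all the JVMs
  wf_event.raise(
      p_event_name => 'oracle.apps.fnd.profile.value.update',
      p_event_key  => '10001:0:0:FND_SSO_LOCAL_LOGIN_MASK');
  commit;
end;
/
EOF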
Configuring DB Invalidation Feature

This feature is built using the business event support provided by Oracle Workflow,
the apps caching infrastructure and the underlying distributed caching functionality
provided by IAS cache.


When an update to the database data happens, a workflow event is raised. The
Workflow JVM running the Java Deferred Agent Listener processes this event. This
JVM then issues a distributed invalidation message to all the other JVMs running
against the same database, and the cached data in those JVMs is invalidated. The following
configuration is necessary for this feature:



  1. Caching Framework must be running in the distributed mode. This is
    the default configuration of Caching Framework: the
    -DLONG_RUNNING_JVM=true parameter is set, which ensures that Caching Framework
    runs in distributed mode. For backward compatibility, the same can be ensured by setting
    -DCACHEMODE=DISTRIBUTED.


  2. The Workflow Java Deferred Agent Listener must be running. This runs as
    a thread in a special workflow service container JVM and processes the business
    events being raised in the database. It should be running by
    default. To verify, make sure that Oracle Applications Manager (OAM) is running,
    log in to SSA, and perform the following steps (a quick SQL cross-check is also
    sketched after this list):
    1. Select the System Administrator responsibility.
    2. Select Workflow (OAM).
    3. Select the icon next to Agent Listeners.
    4. Query for Workflow Java Deferred
      Agent Listener by selecting Name
      from the Filter dropdown list.

      Note: If the Status column
      shows Running then skip the following steps.


    5. Select the Workflow Agent Listener Service link (under
      Container).
    6. Select Workflow Agent Listener Service and if the status is
      not running, select Start from the dropdown list. Then select
      Go.

      Note: Make sure the status changes to the green icon
      representing a running state.


    7. Return to the Workflow Java Deferred Agent Listener page. Select Start from the dropdown list (under Actions) and then select Go.

      The Status column should show Running.



How to check if a component is enabled for caching

Follow the steps below to check whether the distributed flag is set for a particular
component cache:
  1. Log in to the HTML Application Administrator Console (sysadmin/sysadmin).
  2. Select the Performance tab and then select Components, which is
    located on left side navigation bar.
  3. Choose the correct application from the View dropdown list and then select the appropriate link under Component Identifier.
  4. Verify the Distributed Mode checkbox is
    checked. To enable distributed caching for the component cache, this box needs
    to be checked.
  5. If you change the setting, the Apache/ Jserv must be restarted for the setting
    to take effect.
Cache Administration

These Java -D parameters must be added to make Java caching work. For an
Apache/Jserv deployment, these are specified in the jserv.properties file as:


wrapper.bin.parameters=-D<name>=<value>

The list of parameters is as below:


  • -DAPPLRGF=<a writeable log directory>

    The writeable log directory can be the Jserv log directory. A file with
    the name javacache.log is created under this directory. If it is not specified, the
    Java current working directory is the default. If the default directory
    is not writeable, an error message is written to the jserv log file without
    any adverse effect on the functionality.



  • -DLONG_RUNNING_JVM=true

Note: The above cache deployment steps are required
only if you are using either the distributed caching feature or the cache event handling
feature.
If your environment is AutoConfig-enabled, apply the tech stack rollup patch H (patch 3416234).
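To confirm the deployment on a given middle tier, both parameters and the resulting log file can be checked in one go. A minimal sketch, assuming APPLRGF is set in the applications environment (as in the javacache.log location shown earlier):

# Verify the -D parameters are in place for the Jserv JVMs
grep -E 'APPLRGF|LONG_RUNNING_JVM' $IAS_ORACLE_HOME/Apache/Jserv/etc/jserv.properties

# Verify javacache.log is being written where expected
ls -l $APPLRGF/javacache.log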
Testing and Troubleshooting

Testing the Component Cache

To troubleshoot Caching Framework use the Diagnostic
Framework. The diagnostic test for Caching Framework can be accessed from
the Oracle Diagnostic UI:


http://<hostname:port>/OA_HTML/jtfqalgn.htm


To log into the system, use sysadmin/sysadmin. Alternatively, the same can
be accessed by logging into AppsLocalLogin as sysadmin, selecting either the
CRM ETF Administration or CRM HTML Administration responsibility, and then
selecting the Diagnostics link under Setup.


There are two types of diagnostic tests:

  1. Basic tests that verify the basic configuration related to Caching Framework.
    These tests do not require any user inputs and are more suitable for the system
    administrator to run; this test is more applicable to Apps DBAs like us. To access,
    select the Basic tab, choose CRM Foundation from the Application dropdown list,
    and then select Caching Framework on the left side. Select Run Without Prerequisite
    to run the test, then select the icon under the Report column to see the results.
    This test prints out the basic information about Caching Framework's configuration
    and component caches.

  2. Advanced tests that can be used to troubleshoot a specific component cache.
    To access, select the Advanced tab, choose CRM Foundation from the Application
    dropdown list, and then select Cache Component Diagnostic Test on the left side.
    You can run tests on individual component caches.
Testing Database Invalidation

This section discusses how to:


    * Perform database invalidation tests.
    * Bounce the workflow JVM.
    * Verify the workflow event is getting raised.
    * Verify workflow event processing.
    * Verify Java object cache log file location.
Performing Database Invalidation Tests

If you have implemented database invalidation, test it as follows:


  1. Perform an action that causes updates to the database data while bypassing the user JVM
    (typically Apache/ Jserv), such as updating a profile through Jinitiator.


    To test, the user must update the data in the database through a UI or directly. Also,
    the user must ensure that an event is raised when the update occurs.


    Note: The event can be raised using the PL/SQL API or through the
    Workflow Administrator Web Application UI.


  2. Access the corresponding cached object. If database invalidation works correctly, you will see
    the updated value.

    To test, the value of the data in the user JVM must be accessed. It should be the new updated
    value. Alternatively, you can examine the keys in the cache through the Advanced Diagnostics test.
    The key that corresponds to the data that is getting updated in step 1 above, must not be
    present in the cache after the update occurs.


    Note: Because the processing of events is asynchronous, you must
    wait at least one or two minutes after updating the data to see the effect in the middle-tier data.



For database invalidation to work correctly, all the different underlying pieces must work as
expected. Some of these pieces belong to the infrastructure and some are supplied by the user. All
the seed data must be in place, the background services must be running, and the runtime
must behave as expected. Diagnostic tests are provided for checking the configuration. The
Cache DB Invalidation test, under the Basic diagnostics tab, can be run
to verify that the general infrastructure configuration is in place.


The CacheComponent DB Event Invalidation test, under the Advanced diagnostics tab, can be run for
a specific component cache for which the database invalidation functionality is being tested. The
poplist lists only the component caches that have a corresponding workflow event defined. If your
component cache does not appear, check the component cache definition to make sure that the business
event is defined.
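Whether the business event behind a component cache exists (and is enabled) can also be confirmed from SQL. A minimal sketch against the standard Workflow event table, using the profile events as an example:

sqlplus -s apps/<apps_pwd> <<'EOF'
select name, status
from   wf_events
where  name like 'oracle.apps.fnd.profile%';
EOF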


If all the diagnostic tests pass but the functionality still does not work, check the runtime
behavior by:

  • Bouncing the workflow JVM.
  • Verifying the workflow event is getting raised.
  • Verifying the workflow event is processed correctly.
  • Verifying the Java object cache log file location.

Bouncing the Workflow JVM

After applying the appropriate patch, the workflow JVM that is running the Java Deferred Agent
Listener needs to be bounced.


Note: This is not the Apache Jserv process.


To bounce the workflow JVM:



  1. Log in to SSA and select the System Administrator responsibility.



  2. Select Workflow (under Oracle Applications Manager (OAM)).



  3. Select the Agent Listeners icon and locate the row where the
    Name column is Workflow Java Deferred Agent Listener.


  4. Select the Workflow Agent Listener Service under the
    Container column.



  5. Verify that the radio button for the Workflow Agent Listener Service row under the
    Select column is selected.


  6. Select Restart from the second dropdown list and then select
    Go.


  7. Select OK. The State column changes
    to Restarting.


  8. After a short delay, select Reload. The
    State column changes to Activated.



Verifying the Workflow Event is Getting Raised

To verify that the workflow event is getting raised:


  1. Issue the following query:

    select to_char(enq_time, 'yyyy-mm-dd hh24:mi:ss'),
           to_char(deq_time, 'yyyy-mm-dd hh24:mi:ss'),
           msg_state, user_data
    from   applsys.aq$wf_java_deferred
    where  enq_time > to_date('2004-11-1 11:19:00', 'yyyy-mm-dd hh24:mi:ss')
    order  by enq_time desc;


    Notes:

    • The to_date(...) value should be no more than a few seconds later than the
      time 't' at which the update was performed; otherwise the event row will be
      filtered out by the enq_time predicate.



    • This query should return at least one row where the string representation of
      user_data value contains text of the form
      ('BES_EVENT_NAME', 100, 'oracle.apps.fnd.menu.entry.insert') or
      ('BES_EVENT_NAME', 100, 'oracle.apps.fnd.menu.entry.update').



    • For profile update it would be
      ('BES_EVENT_NAME', 100, 'oracle.apps.fnd.profile.value.update').



    • The third value, oracle.apps.fnd.profile.value.update,
      is the 'event name'. It should also have text of the form
      AQ$_JMS_USERPROPERTY('BES_EVENT_KEY', 100,
      '10001:0:0:FND_SSO_LOCAL_LOGIN_MASK',...)
      where the third value should correspond to the data
      that got updated/ created.



    • The value '10001:0:0:FND_SSO_LOCAL_LOGIN_MASK' is the
      'event key'.




  2. The value of the msg_state column begins as Ready and then
    it changes to Processed.

    Note: Attempt to raise an
    event through the Workflow Administrator Web Application UI by selecting
    Events, searching for the desired event, and then using the icon under
    the Test column in the Search result.



Verifying Workflow Event Processing

Event processing can be confirmed by examining the log files. This requires changing the log
level of the workflow JVM that is running the Java Deferred Agent Listener, which processes the
event.


To change the log level:


  1. Log in to the Oracle Applications Manager Dashboard and navigate to Site Map -> Workflow -> Service Components.

  2. Locate the row where the Name column is
    Workflow Java Deferred Agent Listener.

  3. Select the checkbox under the Select column, which is located
    next to the row mentioned in the previous step.

  4. Select Edit.

  5. Select Next to go to a second page. Select Procedure from the
    Log Level dropdown list.


    Note: No changes are required if the
    Log Level value is Procedure or Statement.

  6. Select Finish.

  7. Restart the Workflow Agent Listener service.


To examine the runtime logs:



  1. Cause a database invalidation event by performing an action that updates the data. See Performing Database Invalidation Tests above.

  2. Log in to the Oracle Applications Manager Dashboard and navigate to Site Map -> Workflow -> Service Components.

  3. Locate the row where the Name column is
    Workflow Java Deferred Agent Listener.

  4. Select View Log.

    The log file contains the following series of messages
    (see the grep sketch after this list):

    • Business Event received in the cache handler: <event name> with the workflow
      context: <context value>.
    • Business Event=<event name> key=<event key> corresponds to the
      app=<Application Short Name> and component key=<Cache Component Key> and loader
      class=<loader class name>.
    • Just about to call stringToKey with key=<event key>.
    • Obtained the keys <keys>.
    • Invoking CacheManager.invalidate for component=<Cache Component Key>,
      app=<Application Short Name> and key=<list of keys>.

      Note: This message should appear at least once.
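Rather than going through the OAM View Log screen every time, the same messages can be pulled straight out of the service container log files. A minimal sketch (the FNDCPGSC*.txt naming for GSM container logs is an assumption; adjust to your $APPLCSF/$APPLLOG layout):

# Find which container log handled the cache invalidation event
grep -l 'Business Event received in the cache handler' $APPLCSF/$APPLLOG/FNDCPGSC*.txt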


Interesting related information on Metalink

It looks like more Metalink notes for diagnosing such issues have been released in the recent past: e.g., Investigating NoClassDefFoundError in eBusiness 11i when users login (Metalink Note 455366.1) and Diagnosing database invalidation issues with Java Cache for eBusiness Suite (Metalink Note 455194.1).

