Home » RDBMS Server » Performance Tuning » Please help find the problem
Please help find the problem [message #231768] Wed, 18 April 2007 06:33 Go to next message
vikramjogi
Messages: 71
Registered: June 2005
Location: Scranton
Member
Hello All,

I have a job which used to run 20 mins earlier and all of a sudden started running more than 4 hrs and some times it dosn't complete these days. I am trying to find out the reason behind that. I have enable the trace with waits and binds. I am not able to find what SQL is the culprit or what exactly is the problem. I would greatly appreciate if someone can go thru my TKPROF report and shread some light on where i should be focusing.

Thanks,
Vik
Re: Please help find the problem [message #231781 is a reply to message #231768] Wed, 18 April 2007 07:11 Go to previous messageGo to next message
martijn
Messages: 283
Registered: December 2006
Location: Netherlands
Senior Member
Just a shot in the dark:
How often are your statistics updated? And wich method is used?

Maybe your data changed in such a way that the statistics don't reflect the current situation. Oracle may then choose a non-optimal path to execute it's query's

Re: Please help find the problem [message #231787 is a reply to message #231768] Wed, 18 April 2007 07:29 Go to previous messageGo to next message
vikramjogi
Messages: 71
Registered: June 2005
Location: Scranton
Member
martijn,

Thanks for your quick response.

The statistics are up to date. We gather the stats on a weekly basis. I am using GATHER STALE option.

execute DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'xxx',estimate_percent => NULL, method_opt => 'for all columns', cascade => true, options => 'GATHER STALE', degree => 4);

Thanks,
Vik
Re: Please help find the problem [message #231801 is a reply to message #231787] Wed, 18 April 2007 07:59 Go to previous messageGo to next message
martijn
Messages: 283
Registered: December 2006
Location: Netherlands
Senior Member
From the documentation:
Quote:

GATHER STALE: Gathers statistics on stale objects as determined by looking at the *_tab_modifications views. Also, return a list of objects found to be stale.



Unfortunately I do not know what exactly stale objects are Sad

Maybe you can re-run the statistics with option "gather"
From the documentation:
Quote:

GATHER: Gathers statistics on all objects in the schema.


Re: Please help find the problem [message #231806 is a reply to message #231801] Wed, 18 April 2007 08:04 Go to previous messageGo to next message
vikramjogi
Messages: 71
Registered: June 2005
Location: Scranton
Member
I will re-run the stats again. Thanks for your input..
Re: Please help find the problem [message #231834 is a reply to message #231806] Wed, 18 April 2007 09:45 Go to previous messageGo to next message
Mahesh Rajendran
Messages: 10672
Registered: March 2002
Location: oracleDocoVille
Senior Member
Account Moderator
Just like add something.
Bucket size plays a very important role with statistics.
Untill you can nail the issues and decide the appropriate size for further maintenance, to be in safer side use these parameters.

method_opt => 'for all indexed columns size 250', cascade => true
Re: Please help find the problem [message #231839 is a reply to message #231768] Wed, 18 April 2007 09:57 Go to previous messageGo to next message
vikramjogi
Messages: 71
Registered: June 2005
Location: Scranton
Member
Mahesh,

Thanks for your suggesion. I would certainly consider adding them. Could you please go thru my attached TKPROF output and let me know where i need to be focusing if you find any culprits. I have taken a look at my output but couln't not figure out any.

Thanks again,
Vik
Re: Please help find the problem [message #231903 is a reply to message #231768] Wed, 18 April 2007 15:41 Go to previous messageGo to next message
dbaxchangedba
Messages: 26
Registered: November 2005
Junior Member
I'll touch on some of the main things.A few things I see based on the tkprof report:

- At a high level, you seem to have these 5 problems :
(1) You seem to be doing a whole lot of hard and soft parsing (2) too many block visits (3) possibility of complex view merging(4) too many roundtrips between the server and the client and (5) excessive dictionary calls.

I'll go individual statement at a time from within your tkprof report:

(1) You insert statement on DOCUMENT_HISTORY is showing 14044 soft parses and 1 hard parse along with the fact that single row processing's being done. Is this insert run as part of a trigger? If it is not part of the trigger then try putting this as part of plsql code and doing bulk binding. Also, if your code is part of plsql then you can try and get rid of soft parsing with the use of session_cached_cursors parameter.
(2) Your insert statement's also showing 184901 current block visits to insert 14045 rows which is almost 13 blocks per insert. Does the DOCUMENT_HISTORY table have a lot of indexes with each index indexed on multiple columns?
(3) Your update on document table is almost the same as your insert in terms of block visits but seem to have 11499 hard parses. Use bind variables. Also, since this table's being accesses based on a PK, I'm assuming that the table data's ordered on the PK values (clustering factor) and so you could potentially order your data based on the docm_id value (asc) and update in the order of docm_id value to avoid block revisits.
(4) Similar to my suggestion for insert, if possible do your update through PLSQL code and with the use of bulk binding.
(5) The select statement that follows your update statement seems to be doing "merge join cartesian" and I think it is due to sub-query un-nesting. Did you recently upgrade to 9.2.0.5 or was the optimizer_features_enable changed to 9.2 for your database? The behaviour with CBO opting for merge join cartesian is due to complex view merging. You can search oracle's metalink for Bug 3444115 to find details and fix. This behavious can potentially be changed through the collection of proper histograms which will help oracle to come up with better cardinality.
(6) IO need to be avoided as much as you can and so looking at the query that follows you update statement, DOCUMENT and DOCUMENT_ELEMENT_VALUE tables are accessed twice and that can be potentially avoided through the use of WITH clause or ranking / row_number functions
(7)Looking at this :

Parse 14045
Execute 14045
Fetch 28090

It looks like (a) statement executions are not done in batch mode and (b) There seem to be 2 roundtrips for every fetch as evidenced by this wait event as well (if you notice, it matches with fetch): SQL*Net message to client 28090
Some of the things you could do : collect all the DOCM_ID values that you want to process and pass it once to the database and return the values once instead of processing 14045 DOCM_IDs individually.
Also, is your process running on the server where database resides or on some client desktop? I see "SQL*Net more data to client" wait which usually suggests that the client is getting too much data from the database than it can digest.

Hopefully you are already using locally managed tablespaces for the database objects and temp areas which can potentially reduce recursive calls.

Good luck!

http://www.dbaxchange.com
Re: Please help find the problem [message #231966 is a reply to message #231903] Wed, 18 April 2007 23:40 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Agree with pretty much all of the previous post. I don't know whether that explains why it was once fast and is now slow. I suspect that when it was running fast, it was processing less data.

There are SQLs that are accruing 100s of seconds, but they are being executed thousands of times so the average is quite fast. This is not a clear cut case of a single SQL changing plan. I suspect that none of the SQLs have changed plan, just that you are running them more often.

ie. You have just discovered one of the inexcapable facts of PL/SQL batch processing: SQL within cursor loops is non-scalable.

There is an article here that addresses this exact situation. Sadly, it means you would have to rewrite your routine.

You will get some improvements from following @dbaxchangedba's advice, but so long as you retain the "SQL within SQL" approach, it will never scale well.

Ross Leishman
Re: Please help find the problem [message #232145 is a reply to message #231768] Thu, 19 April 2007 11:13 Go to previous messageGo to next message
vikramjogi
Messages: 71
Registered: June 2005
Location: Scranton
Member
dbaxchangedba,

Thanks for your valuable suggestions made. I would definitely work on them. And to answer your questions

1) You insert statement on DOCUMENT_HISTORY is showing 14044 soft parses and 1 hard parse along with the fact that single row processing's being done. Is this insert run as part of a trigger?

Yes the insert is run as part of the trigger. If an update is done on the DOCUMENT , then the insert is done on the DOCUMENT_HISTORY table.

2)Your insert statement's also showing 184901 current block visits to insert 14045 rows which is almost 13 blocks per insert. Does the DOCUMENT_HISTORY table have a lot of indexes with each index indexed on multiple columns?

I just have one composite index on two columns


3)Your update on document table is almost the same as your insert in terms of block visits but seem to have 11499 hard parses. Use bind variables. Also, since this table's being accesses based on a PK, I'm assuming that the table data's ordered on the PK values (clustering factor) and so you could potentially order your data based on the docm_id value (asc) and update in the order of docm_id value to avoid block revisits.

Do you suggest me to re-build the tale with Index Organized and try?


5) The select statement that follows your update statement seems to be doing "merge join cartesian" and I think it is due to sub-query un-nesting. Did you recently upgrade to 9.2.0.5 or was the optimizer_features_enable changed to 9.2 for your database? The behaviour with CBO opting for merge join cartesian is due to complex view merging. You can search oracle's metalink for Bug 3444115 to find details and fix. This behavious can potentially be changed through the collection of proper histograms which will help oracle to come up with better cardinality.

Yes the database is on 9.2.0.5. Unfortunately i was not able to see the bug 3444115 in metalink. Would be very much interested to know information on this bug.

6)is your process running on the server where database resides or on some client desktop?

Unfotunately, the process is being run from the client desktop


Thanks,
Vik
Re: Please help find the problem [message #232147 is a reply to message #231768] Thu, 19 April 2007 11:17 Go to previous messageGo to next message
vikramjogi
Messages: 71
Registered: June 2005
Location: Scranton
Member
Ross Leishman,

"SQL within SQL", unfortunately that is how my routine works. It is a client side application that runs the routine. I would certainly go thru the link which you provided. Thanks for your suggestion and the link.

Thanks,
Vik



Re: Please help find the problem [message #232187 is a reply to message #232145] Thu, 19 April 2007 14:24 Go to previous messageGo to next message
dbaxchangedba
Messages: 26
Registered: November 2005
Junior Member
"Do you suggest me to re-build the tale with Index Organized and try?"

No, most certainly not. Batch processing should not be done on a row by row basis but instead should be done in batch mode. An ideal design for you would be to have the entire batch processing built using PLSQL code and use the client application on the desktop only to (a) setup data for the batch run and (b) initiate the batch process by calling a package or procedure or have the batch run Asynchronously by making calls to dbms_job. For example, have a batch control table that gets populated by the client application on the desktop with the work unit values that control the batch. Once this information has been populated, run a package or make a call to dbms_job.submit to run a package or procedure that uses the work unit data within the batch control table and this package then does all of the processing in batch mode on the server instead of making repeated back and forth calls between the desktop and the server.
As part of adopting this type of processing, chances are that it will potentially benefit your update statement if you have the DOCM_ID values that will be used for the update stored in a collection that is pre-sorted.

"Yes the database is on 9.2.0.5. Unfortunately i was not able to see the bug 3444115 in metalink. Would be very much interested to know information on this bug"

In your case, it might be a case of sub-query unnesting since I don't see any views involved with the query and cartesian join seem to happen between the indexes of POLICY_PERIOD_DOCUMENT table (outside and inside). If you are in a situation where the query cannot be rewritten and if you've exhausted your options with statistics to have oracle automaically rewrite the query then you could possibly embed /*+ no_unnest */ hint within the subquery. The behaviour can also be changed through a hidden parameter but that's not the way to go.

Good luck.....

http://www.dbaxchange.com
Re: Please help find the problem [message #232469 is a reply to message #231768] Fri, 20 April 2007 12:02 Go to previous messageGo to next message
michael_bialik
Messages: 611
Registered: July 2006
Senior Member
A couple of questions:

Do you , by any chance, have a trigger on DOCUMENT_HISTORY (or any other PLSQL object) that executes the following statement:

SELECT DEV1.DOCM_ELEM_VAL_VAL 
FROM
 DOCUMENT D1, DOCUMENT_ELEMENT_VALUE DEV1, POLICY_PERIOD_DOCUMENT PPD1 
WHERE 
   D1.DOCM_ID = DEV1.DOCM_ID AND D1.DOCM_TYP_ID IN (5000, 4131) AND
   D1.DOCM_ID = DECODE(D1.DOCM_TYP_ID, 5000, D1.DOCM_ID, 4131, :B4) AND 
   DEV1.DOCM_ELEM_TYP_CD =
      DECODE(D1.DOCM_TYP_ID, 5000, 'info_page_src', 4131, 'audt_ctg_frmt_cptl') AND
   D1.DOCM_PRCS_DT IS NULL AND
   D1.DOCM_CREA_DTM = 
      (SELECT MAX(D2.DOCM_CREA_DTM)
       FROM DOCUMENT D2, POLICY_PERIOD_DOCUMENT PPD2 
       WHERE D2.DOCM_TYP_ID = D1.DOCM_TYP_ID AND D2.DOCM_PRCS_DT IS NULL AND 
          PPD2.PLCY_ID = PPD1.PLCY_ID AND
          PPD2.PLCY_PRD_EFF_DT = PPD1.PLCY_PRD_EFF_DT AND
          PPD2.DOCM_ID = D2.DOCM_ID) AND
   PPD1.PLCY_ID = :B3 AND 
   PPD1.PLCY_PRD_EFF_DT = :B2 AND PPD1.DOCM_ID = D1.DOCM_ID AND 
   DECODE(D1.DOCM_TYP_ID, 5000, 'pol_grp_infopg', 4131, 'qa_grp_audit') = :B1
?

IMHO - it's a reason for bad performace.

Are you able to change that statement?

I recommend to rewrite it as 2 separate statements:

IF :B1 = 'pol_grp_infopg' THEN
   SELECT DEV1.DOCM_ELEM_VAL_VAL 
   FROM
    DOCUMENT D1, DOCUMENT_ELEMENT_VALUE DEV1, POLICY_PERIOD_DOCUMENT PPD1 
   WHERE 
     D1.DOCM_ID = DEV1.DOCM_ID AND D1.DOCM_TYP_ID = 5000 AND
     DEV1.DOCM_ELEM_TYP_CD = 'info_page_src' AND
     D1.DOCM_PRCS_DT IS NULL AND
     D1.DOCM_CREA_DTM = 
        (SELECT MAX(D2.DOCM_CREA_DTM)
         FROM DOCUMENT D2, POLICY_PERIOD_DOCUMENT PPD2 
         WHERE D2.DOCM_TYP_ID = D1.DOCM_TYP_ID AND D2.DOCM_PRCS_DT IS NULL AND 
            PPD2.PLCY_ID = PPD1.PLCY_ID AND
            PPD2.PLCY_PRD_EFF_DT = PPD1.PLCY_PRD_EFF_DT AND
            PPD2.DOCM_ID = D2.DOCM_ID) AND
     PPD1.PLCY_ID = :B3 AND 
     PPD1.PLCY_PRD_EFF_DT = :B2 AND PPD1.DOCM_ID = D1.DOCM_I;
ELSIF :B1 = 'qa_grp_audit' THEN
  SELECT DEV1.DOCM_ELEM_VAL_VAL 
  FROM
    DOCUMENT D1, DOCUMENT_ELEMENT_VALUE DEV1, POLICY_PERIOD_DOCUMENT PPD1 
  WHERE 
     D1.DOCM_ID = DEV1.DOCM_ID AND D1.DOCM_TYP_ID = 4131 AND
     D1.DOCM_ID = :B4 AND 
     DEV1.DOCM_ELEM_TYP_CD = 'audt_ctg_frmt_cptl' AND
     D1.DOCM_PRCS_DT IS NULL AND
     D1.DOCM_CREA_DTM = 
      (SELECT MAX(D2.DOCM_CREA_DTM)
       FROM DOCUMENT D2, POLICY_PERIOD_DOCUMENT PPD2 
       WHERE D2.DOCM_TYP_ID = D1.DOCM_TYP_ID AND D2.DOCM_PRCS_DT IS NULL AND 
          PPD2.PLCY_ID = PPD1.PLCY_ID AND
          PPD2.PLCY_PRD_EFF_DT = PPD1.PLCY_PRD_EFF_DT AND
          PPD2.DOCM_ID = D2.DOCM_ID) AND
     PPD1.PLCY_ID = :B3 AND 
     PPD1.PLCY_PRD_EFF_DT = :B2 AND PPD1.DOCM_ID = D1.DOCM_ID
END IF;


The reason is that the following line:
D1.DOCM_ID = DECODE(D1.DOCM_TYP_ID, 5000, D1.DOCM_ID, 4131, :B4)

prevents using index on DOCM_ID columns even when it's possible, because the column appears on BOTH sides of operand.


HTH.
Michael

P.S. I entirely agree that you have an additional problem of parsing (application is NOT using bind variables and trusts cursor_sharing = force to do it's work). You have deal with that too.





[Updated on: Fri, 20 April 2007 12:06]

Report message to a moderator

Re: Please help find the problem [message #232623 is a reply to message #232469] Sun, 22 April 2007 11:59 Go to previous messageGo to next message
dbaxchangedba
Messages: 26
Registered: November 2005
Junior Member
I really don't like the fact that developers tend to use sub-queries that basically go against the same tables that the outer queries access and the only reason they tend to do it is to use grouping functions while the same thing can be done through the use of analytical functions. In the case of this query there's a total of about 8 IOs (tables and index access) and probably 2 or more IOs for temp segments which can be reduced to 5 IOs (tables and index access) and 2 or more IOs for temp areas through the use of analytical function as follows:

Original Query:

SELECT DEV1.DOCM_ELEM_VAL_VAL 
FROM
 DOCUMENT D1, DOCUMENT_ELEMENT_VALUE DEV1, POLICY_PERIOD_DOCUMENT PPD1 
WHERE 
   D1.DOCM_ID = DEV1.DOCM_ID AND D1.DOCM_TYP_ID IN (5000, 4131) AND
   D1.DOCM_ID = DECODE(D1.DOCM_TYP_ID, 5000, D1.DOCM_ID, 4131, :B4) AND 
   DEV1.DOCM_ELEM_TYP_CD =
      DECODE(D1.DOCM_TYP_ID, 5000, 'info_page_src', 4131, 'audt_ctg_frmt_cptl') AND
   D1.DOCM_PRCS_DT IS NULL AND
   D1.DOCM_CREA_DTM = 
      (SELECT MAX(D2.DOCM_CREA_DTM)
       FROM DOCUMENT D2, POLICY_PERIOD_DOCUMENT PPD2 
       WHERE D2.DOCM_TYP_ID = D1.DOCM_TYP_ID AND D2.DOCM_PRCS_DT IS NULL AND 
          PPD2.PLCY_ID = PPD1.PLCY_ID AND
          PPD2.PLCY_PRD_EFF_DT = PPD1.PLCY_PRD_EFF_DT AND
          PPD2.DOCM_ID = D2.DOCM_ID) AND
   PPD1.PLCY_ID = :B3 AND 
   PPD1.PLCY_PRD_EFF_DT = :B2 AND PPD1.DOCM_ID = D1.DOCM_ID AND 
   DECODE(D1.DOCM_TYP_ID, 5000, 'pol_grp_infopg', 4131, 'qa_grp_audit') = :B1


Rewritten as :

SELECT DOCM_ELEM_VAL_VAL
FROM
(
SELECT DEV1.DOCM_ELEM_VAL_VAL,
[B]DENSE_RANK() OVER (PARTITION BY D1.DOCM_TYP_ID,PPD1.PLCY_ID,PPD1.PLCY_PRD_EFF_DT ORDER BY D1.DOCM_CREA_DTM DESC) MAX_COL[/B]
FROM DOCUMENT D1, DOCUMENT_ELEMENT_VALUE DEV1,
POLICY_PERIOD_DOCUMENT PPD1 
WHERE 
   D1.DOCM_ID = DEV1.DOCM_ID AND D1.DOCM_TYP_ID IN (5000, 4131) AND
   D1.DOCM_ID = DECODE(D1.DOCM_TYP_ID, 5000, D1.DOCM_ID, 4131, :B4) AND 
   DEV1.DOCM_ELEM_TYP_CD =
      DECODE(D1.DOCM_TYP_ID, 5000, 'info_page_src', 4131, 'audt_ctg_frmt_cptl') AND
   D1.DOCM_PRCS_DT IS NULL AND
   PPD1.PLCY_ID = :B3 AND 
   PPD1.PLCY_PRD_EFF_DT = :B2 AND PPD1.DOCM_ID = D1.DOCM_ID AND 
   DECODE(D1.DOCM_TYP_ID, 5000, 'pol_grp_infopg', 4131, 'qa_grp_audit') = :B1)
[B]where MAX_COL = 1[/B]


The plan always should be to write optimal queries and one way of writing optimal queries is to minimize IOs through the use available oracle's functionalities.

Good Luck.....

http://www.dbaxchange.com
Re: Please help find the problem [message #232645 is a reply to message #231768] Sun, 22 April 2007 18:48 Go to previous messageGo to next message
dbaxchangedba
Messages: 26
Registered: November 2005
Junior Member
My previous query will not work. It needs a minor modification to support functionality:

Query with changes:

SELECT DOCM_ELEM_VAL_VAL
FROM
(
SELECT DEV1.DOCM_ELEM_VAL_VAL,D1.DOCM_ID,D1.DOCM_TYP_ID,
DENSE_RANK() OVER (PARTITION BY D1.DOCM_TYP_ID,PPD1.PLCY_ID,PPD1.PLCY_PRD_EFF_DT ORDER BY D1.DOCM_CREA_DTM DESC) MAX_COL
FROM DOCUMENT D1, DOCUMENT_ELEMENT_VALUE DEV1,
POLICY_PERIOD_DOCUMENT PPD1 
WHERE 
   D1.DOCM_ID = DEV1.DOCM_ID AND D1.DOCM_TYP_ID IN (5000, 4131)  AND DEV1.DOCM_ELEM_TYP_CD = DECODE(D1.DOCM_TYP_ID, 5000, 'info_page_src',4131, 'audt_ctg_frmt_cptl') AND
D1.DOCM_PRCS_DT IS NULL AND PPD1.PLCY_ID = :B3 AND 
PPD1.PLCY_PRD_EFF_DT = :B2 AND PPD1.DOCM_ID = D1.DOCM_ID AND 
DECODE(D1.DOCM_TYP_ID, 5000, 'pol_grp_infopg',4131, 'qa_grp_audit') = :B1)
where MAX_COL = 1
AND DOCM_ID = DECODE(DOCM_TYP_ID, 5000, DOCM_ID, 4131, :B4)


Good Luck.....

http://www.dbaxchange.com
Re: Please help find the problem [message #234031 is a reply to message #232623] Sun, 29 April 2007 02:27 Go to previous message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
dbaxchangedba wrote on Mon, 23 April 2007 02:59
I really don't like the fact that developers tend to use sub-queries that basically go against the same tables that the outer queries access and the only reason they tend to do it is to use grouping functions while the same thing can be done through the use of analytical functions.


Back when analytic functions were first released in 8i, I did some benchmarking and discovered that not only were they always slower than an equivalent self-join / sub-query (where possible), but they were also terribly non-scalable.

eg. If a query without an analytic function took 20 seconds, and the same query with an analytic function took 60 seconds, then adding another function with the same OVER clause took more than 100 seconds. ie. The cost of the first function was 40 seconds, the cost of a subsequent function was more, and further functions cost more and more. Using different OVER clauses made it worse, too.

Now I've been doing this too long to suggest that this effect is across-the-board, because I cannot adequately explain the behaviour. It may have been something to do with the database I was using. Also, I have not bothered doing the same benchmarking exercise on later versions of Oracle.

My point is, the number of I/Os can only ever be a guide to the relative merits of two similar queries, but like the optimizer "cost", it is only that: a guide.

The only way to determine the better of two similar queries is to benchmark them under as many different conditions as you can (eg. low load, high load, not much cached, lots cached).

Ross Leishman
Previous Topic: hash partitioning.. help.
Next Topic: help please...
Goto Forum:
  


Current Time: Fri Dec 09 15:40:46 CST 2016

Total time taken to generate the page: 0.29123 seconds