Oracle function myth on Solaris machinie

From: Limin Guo <guo_at_andrews.edu>
Date: 21 Nov 2003 07:44:49 -0800
Message-ID: <f866cfef.0311210744.57cb73b0@posting.google.com>

I apologize for posting to both news groups and hope someone from either one has encountered the similar problem before …

My reporting production database (8.1.7.2) running on a Solaris7 machine. Couple of days ago, one of developers found out his insert statement (with 235,000 records) took 2 hour and 15 minutes on the production database while the identical insert on a reporting test database (9.2.0.4) just took less than 10 minutes. What he did was just a simple insert into a table by select * from a view. Inside the view, a function (getting top stack from a table) was called twice in the select clause.

To find out what is going on, I ran the identical insert on production. Based on the growth rate of used_rec on v$transaction, I could estimate my total insert time. Sadly enough, the estimated insert time was always over two hours. I then decided to trace the sql statement and see where the time was spent. Surprisingly, in the connection where event 10046 level 12 enabled in session level, the insert finished within 10 minutes (fast mode). After this, I started a new sqlplus connection to the production and issued "alter session set sql_trace=true", the insert took 10 minutes to finish as well. Then the third sqlplus connection estiblished without sql_trace/10046 enabled and ran the insert again, the estimated time back to over 2 hours (slow mode). WHY sql_trace or 10046 has impact on the insert, I don't understand.

I know you got to ask me about execution plan. The execution plans on the main select in the view and the function did not get changed at all regardless of FAST or SLOW mode.

The myth was not end up here. Since I could not figure out what caused the problem, I bounced the production database during the day (it should only shut down once a week at normal backup schedule on Friday). You know what, immediately after the database bounce, the insert went to SLOW mode regardless if I used sql_trace/10046 in the session level or not. On the other word, the impact of sql_trace or 10046 on the insert went away. I gave it up... Oddly enough, the day after, the insert ran in FAST mode and has been like that ever since.

I looked over the raw trace files generated for both "FAST" and "SLOW" modes, I found, in SLOW mode, each function fetch took 2ms and the total number of execution and fetch is over 470,000 times (twice as much as records inserted into the table since two columns in the view uses same function). And in FAST mode, the fetch for each function call almost doesn't take any time at all. I KNOW IT IS THIS FACTION, BUT I WANT TO UNDERSTAND WHY. here is a piece of trace file for SLOW mode.

BINDS #2:
bind 0: dty=1 mxl=32(07) mal=00 scl=00 pre=00 oacflg=03 oacfl2=1 size=32 offset
=0

bfp=01f952a8 bln=32 avl=07 flg=05
value="8700300"
EXEC #2:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=3847519354 FETCH #2:c=2,e=2,p=0,cr=28,cu=4,mis=0,r=1,dep=1,og=4,tim=3847519356

The machine has 10 processors and 14G memory and hosts multiple production databases. Last machine reboots was Nov. 5. For last couple of weeks, DBAs has been asked to terminate a lot of running away reports by killing their Unix processes due to very high system load (sometime the load could as high as 20 or 30). Could this mess up with anything inside Oracle? Unfortunately, there is no Oracle trace and dump files generated whatsoever.

Any hint or helps would be highly appreciated.

Limin Guo. Received on Fri Nov 21 2003 - 09:44:49 CST