Re: Resend: performance problem with fetch using 8.1.7 32-bit client to access a 64-bit server
"Sybrand Bakker" <gooiditweg_at_sybrandb.demon.nl> wrote in message
news:33pf20t0ks9si86ln3ee646s6cnkc5r95c_at_4ax.com...
> On Mon, 9 Feb 2004 15:36:25 +0700, <harronc_at_attglobal.net> wrote:
>
> >Please ignore previous message.
> >
> >We are running 8.1.7.4 64-bit Oracle on Solaris 8/sparc3. We have a client
> >which was compiled on a 32-bit sparcII platform. We find that when we run
> >multiple instances of the client it doesn't scale very well, even though
> >the product vendor believes that it should, and we have evidence from
> >different sites to prove this. We have already spent a lot of effort
> >tuning the server and we now believe that it is well tuned. The server
> >spends 90% of its time idle. We have found that the client also spends
> >about 90% of its time sleeping even though it has work to do, and there
> >are plenty of free resources (only 25% of the 24 CPUs are utilised, 40%
> >of memory is free, and IO is healthy).
> >
> >When we dig deeper into what the client is doing using pstack, we find
> >that it spends most of its time fetching from various cursors, and
> >specifically in the read function. An example stack trace is below:
> >
> >fea9e570 read (a, 8077ee, 810)
> >fefc1a88 sntpread (810, 80560c, 804aec, 8077ee, 804a9c, 7f2544) + 10
> >fefbf114 ntprd (804aa8, 805108, 7fb3d8, ff257314, 0, 7f9088) + 134
> >fee1f490 nsprecv (8048c8, ffbe875f, ff285ce8, 0, 7f9088, 7fb3d8) + 58c
> >fee24184 nsrdr (8048c8, 8080f0, ff285ce0, 0, 27, a) + 1dc
> >fee09bc8 nsdo (7fcd08, 1, 0, 0, 7f9088, 8048c8) + f14
> >fee91044 nioqrc (7f250c, ff257314, c27, 7fcd08, 0, 7f9088) + 6c0
> >fefca86c ttcdrv (7f5c78, 7f5fbc, ff257314, 7f5fac, 7f5fae, 7f5fac) + b7c
> >fee996ec nioqwa (7f250c, 0, fefc9cf0, 7f5c78, 7f2490, 0) + 48
> >fed19330 upirtrc (7f2468, 7f250c, 0, 0, 7f1df4, 7f2544) + 7f8
> >fed6ab60 kpurcsc (7f25e8, ffbe9cae, 7f242c, 7f5c78, 7f6804, 0) + 70
> >fedccd7c kpufch0 (1f5b2c0, 2, 7f2468, 7f5b74, 1f5b308, 6340) + a68
> >fedcdc28 kpufch (1f5b2c0, 7f1f0c, 0, ff257314, 7f242c, 2) + 92c
> >fecebec4 sqlcucFetch (1f5a5d0, ff2848b4, 1, 1f5a5d0, 0, 7f1f0c) + 20
> >fecda5b4 sqlall (ff2848b4, 1, 22a0c28, 16, 58, 16) + 1bf4
> >fecd17a8 sqlnst (382a62, ffbedb50, ffbedb50, 1, ff2848b4, a7f0f0) + 7b44
> >fecb3d3c sqlcmex (0, a7f0f0, ffbedb50, 37d918, ff22d13c, ff257314) + 488
> >fecb4180 sqlcxt (0, 7615ec, ffbedb50, 37d918, 37d800, d0) + 58
> >
> >We run pstack repeatedly and the client is invariably in the above
> >function.
> >
> >The client uses a local bequeath connection, so I assume that the read()
> >is reading from a pipe. I've noticed that the third parameter of read(),
> >the number of bytes, is always 810. It is possible that this is the
> >bottleneck. It is the SDU that controls the number of bytes read. We have
> >not set the SDU, and the Oracle documentation says that the default value
> >is 2048. But then why are only 810 bytes read? (We have run pstack many
> >times and it is always 810 bytes.)
> >
> >Even if the read size needs tuning, this does not explain why the client
> >processes do not scale. The performance increases reasonably linearly up
> >to eight parallel processes, but beyond this the increase in throughput
> >slopes off, and beyond 14 processes there is no increase in throughput at
> >all, even though the oracle server is idle and there are plenty of free
> >resources.
> >
> >We were told that there was a bug when using a 32-bit client to access a
> >64-bit server, but we have installed the relevant patch 2298968 and this
> >did not improve the situation.
> >
> >We do not have source code for the client application, but as it calls
> >sqlcxt I assume it uses Pro*C. Any similar experience, or any idea of
> >what could be causing the problem?
> >
> >Ciaran
> >
>
> Most likely your analysis is completely inadequate, and you are just
> barking up the wrong tree. The influence of sqlnet on total
> performance is completely negligible. Also, your assumption that sqlnet
> should always transmit 2048 bytes demonstrates you don't know anything
> about the interaction between client and server. A sqlnet packet will
> never ever cross statement boundaries. If you see 810 byte packets all
> the time, the database only has 810 bytes to send for that statement.
> For your sqlnet analysis you should also rather use the trcasst tool
> than pstack, and if you insist on using pstack you should trace the
> dedicated server process serving your client instead of the client.
> You should also check what is going on in terms of sql statements,
> and you should check disk I/O.
> My crystal ball tells me that if you perform an adequate analysis, you
> will find you are dealing with an unscalable application.
>
I would say our analysis is inadequate; it's more a last-ditch, desperate attempt to solve a problem that we have been unable to solve for the last six months. The pstack output always shows that the client is doing a fetch, although each time for different SQL.
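To make the repeated pstack snapshots more than anecdotal, the samples can be aggregated into a crude profile. This is an editor's sketch, not part of the original post; it assumes the Solaris pstack frame layout shown above (hex address, then function name), with concatenated samples fed in on stdin:

```shell
# Poor-man's profiler: count how often each function appears across many
# pstack samples; the most frequent frames are where the process spends
# its time. Field layout assumed from the trace above.
awk '/^[0-9a-f]+ /{counts[$2]++}
     END {for (f in counts) print counts[f], f}' |
sort -rn | head
```

Running a minute's worth of samples (e.g. a loop of `pstack $pid; sleep 1`) through this gives a rough ranking; if read dominates across many samples, the client really is blocked on the wire rather than burning CPU.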
After 6 months of tuning the oracle server and various SQL, we have the following typical statspack report for 1 hour:

                                                               Avg
                                         Total Wait    wait   Waits
Event                            Waits  Timeouts   Time (cs)   (ms)   /txn
---------------------------- ---------- --------- ----------- ----- ------
db file sequential read         709,534         0     485,112     7   33.2
latch free                       34,194    29,875      47,577    14    1.6
db file parallel write           13,493         0      24,937    18    0.6
log file sync                    22,394         0      21,367    10    1.0
SQL*Net more data to client     848,722         0      16,448     0   39.7
log file switch completion          181        37      13,305   735    0.0
enqueue                              75        23      13,091  1745    0.0
log file parallel write          21,532         0       9,684     4    1.0
log file sequential read         65,552         0       3,883     1    3.1
buffer busy waits                 2,499         0       1,422     6    0.1
direct path read                 15,594         0         704     0    0.7
file identify                        36         0         564   157    0.0
file open                        25,192         0         555     0    1.2
control file parallel write       1,270         0         218     2    0.1
db file scattered read              429         0         136     3    0.0
LGWR wait for redo copy             367         7          26     1    0.0
control file sequential read        697         0          18     0    0.0
row cache lock                       85         0           7     1    0.0
direct path write                 3,147         0           5     0    0.1
log file single write                16         0           4     3    0.0
refresh controlfile command          10         0           2     2    0.0
SQL*Net break/reset to clien          2         0           0     0    0.0
db file single write                  2         0           0     0    0.0
SQL*Net message from client  14,090,519         0  26,082,903    19  658.8
PX Idle Wait                     21,012    21,012   4,307,460  2050    1.0
SQL*Net message to client    14,090,518         0       7,631     0  658.8
SQL*Net more data from clien    442,221         0       2,047     0   20.7
Note that 75% of wait time is for db file sequential read, and with an average wait time of 7ms this looks reasonable. Most of the reads come from a single index cluster table with 300 million rows, so there is not much we can do to reduce the number of physical reads. Anyway, the volumes where the index cluster is located are never more than 15% busy (we're using multiple fibrechannel connections to an enterprise storage array with 24G of internal cache; it's fast).
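As a sanity check on that 7 ms figure (editor's arithmetic, not in the original report): statspack reports wait time in centiseconds, so the average in milliseconds is time_cs * 10 / waits:

```shell
# db file sequential read: 485,112 cs of wait over 709,534 waits.
# Centiseconds * 10 = milliseconds.
awk 'BEGIN { printf "%.1f ms avg\n", 485112 * 10 / 709534 }'
```

This comes out at about 6.8 ms, consistent with the 7 ms shown, and a plausible single-block read latency for spinning disk behind a cached array.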
We were worried about the latch free waits, which are consistently 7.5% of the total wait time. The latch stats are as follows:

                                    Get                      Spin &
Latch Name                     Requests    Misses   Sleeps   Sleeps 1->4
-------------------------- ------------ --------- -------- ---------------------
cache buffers chains         94,983,959    44,728    1,670  43094/1608/19/7/0
library cache                 8,691,838    34,134   10,298  27914/2268/3829/123/0
checkpoint queue latch        6,135,872    25,742   21,883  3868/21865/9/0/0
redo allocation               5,090,786    16,847      111  16736/111/0/0/0
session idle bit             28,284,866     6,229       79  6150/79/0/0/0
dml lock allocation           1,284,074     1,529       15  1514/15/0/0/0
active checkpoint queue la       25,554       336       14  322/14/0/0/0
enqueue hash chains           1,421,549       334       17  317/17/0/0/0
undo global data                271,350       164        9  155/9/0/0/0
redo writing                    135,560       130        8  122/8/0/0/0
messages                        201,923       115        2  113/2/0/0/0
cache buffers lru chain         676,086        69       45  24/45/0/0/0
shared pool                     105,216        50       40  29/5/13/3/0
session allocation              109,383        41        3  38/3/0/0/0
There is some library cache latch contention. Our application vendor has done a lot of work to reduce the amount of parsing done by our application to an absolute minimum. It always uses bind variables, and some SQL that unexpectedly caused high parsing has been rewritten. There's nothing more we can do in this area. The remaining contention on the library cache is now caused by different applications accessing the same database.
A big chunk of the latch free waits are on the checkpoint queue latch. I don't believe this directly impacts the user processes. We do, however, plan to reduce the number of database writers to see if we can reduce this.
So, given the above analysis, we are not so worried about the latch free waits.
Bear in mind that oracle spent a lot of its time idle during the statspack interval. See the number of SQL*Net message from client wait events.
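Those SQL*Net message from client numbers also put a figure on how chatty the workload is over the one-hour interval (editor's back-of-envelope, not from the report):

```shell
# 14,090,519 "SQL*Net message from client" waits in a 3,600 s statspack
# interval: the clients between them are making roughly 3,900
# client/server round trips per second.
awk 'BEGIN { printf "%.0f round trips/sec\n", 14090519 / 3600 }'
```

A fetch loop pulling a small number of rows per round trip would be consistent with exactly this pattern: the server mostly idle, the clients mostly asleep in read() waiting for the next reply.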
So it is following the above that I turned to using pstack to trace what the application is doing. As I said, oracle spends a lot of its time idle, and the client application spends a lot of its time sleeping, so I think it makes sense to try to find out what it is sleeping on, if it is not waiting for oracle.
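One detail worth re-checking in those pstack snapshots (an editor's observation, not raised in the thread): Solaris pstack prints frame arguments in hexadecimal, so the third argument to read() would be 0x810, not 810 decimal:

```shell
# 0x810 bytes = 2064 decimal -- almost exactly the documented default
# SDU of 2048 plus a small Net8 packet-header overhead, which would
# mean the reads are in fact full SDU-sized buffers.
printf '%d\n' 0x810
```

If that reading is right, the recurring "810" is not evidence of a mis-sized SDU at all.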
Ciaran
>
> --
> Sybrand Bakker, Senior Oracle DBA
Received on Mon Feb 09 2004 - 21:53:02 CST