RE: 10G and UFS - long write times

From: Hallas, John, Tech Dev <John.Hallas_at_gb.vodafone.co.uk>
Date: Mon, 12 Jun 2006 17:09:57 +0100
Message-ID: <1C6E45ADB2EC324F9553E468ABFE0F63034BDDA4@UKWMXM04>

Thank you all for the very constructive and helpful posts, in particular this one and Tanel's post.
I have been really busy and had no chance to further investigate the suggestions. However the change on one server of mounting the UFS filesystem as forcedirectio caused the time to drop from ~6 mins to ~50 seconds.

This might have been an artificial example (well not might, it was) however it was created to replicate an issue we were seeing. The full sequence of tests are to be repeated tomorrow after all the filesystems have been re-mounted and we can then see what else has been impacted

Thank you for the input and support

John

-----Original Message-----
From: oracle-l-bounce_at_freelists.org
[mailto:oracle-l-bounce_at_freelists.org] On Behalf Of jonathan_at_jlcomp.demon.co.uk
Sent: 09 June 2006 12:51
To: oracle-l_at_freelists.org
Subject: RE: 10G and UFS - long write times

John,

A few random ideas.

Since the 9i and 10g have extremely different performance characteristics on your 'commit in loop' test, the underlying problem may be the rewrite of redo handling that appeared in 10g.

Point 1: 9i doesn't wait for sync (and doesn't even have to post a message to LGWR) when you commit inside a pl/sql block - so many of the writes from the log writer are likely to be just "side effects" that trigger log writer writes (commit from other session, 1/3 rull, 1MB trigger limit , 3 second trigger). This means you may be seeing a relatively small number of larger writes from the log writer in the 9i case - which is an effect likely to faviour UFS.

10g does the same sort of thing - unless you have modified the commit_write parameter - but 10g has parallel public and private redo threads (i.e. redo log buffers) which may change the cost of very small rapid commits in loops. Private redo is (I believe) always copied into a public buffer on commit. This could be sufficient to change the timing or frequency of writes by lgwr.

So check (if you have the database to yourself) statistics redo writes, redo blocks written for the LGWR process: is 10g doing more, smaller writes. (This would tie in with your observation of the effect of direct I/O).

Point 2:
10g ignores your stated log buffer size, and picks its own sizing for the log buffers (private and public). Given the 1/3 full trigger on lgwr, you need to check the actual log buffer sizes in 10g, and make sure that the 9i log_buffer size matches this in order to make things fair.
select TOTAL_BUFS_KCRFA from x$kcrfstrand; may be the appropriate query to find the size of the 10g log buffers.

Point 3: (idle speculation)
I get two public log buffers on my laptop because I have 2 CPUs. Maybe it's always one per CPU, and maybe lgwr tries to clear every buffer every time it writes. If you have a lot of log buffers then maybe a significant fraction of your lost time is due to the log writer checking all the other buffers - a time lag that could show up in your session as log file sync, and in the lgwr session as CPU used.

Point 4: (idle speculation 2)
Because of the persistent commits - maybe your session has to keep switching private redo buffers (in 10g), and perhaps this takes up an unusualy amount of CPU time. The 'parallel redo' code exists (I think) to reduce the serialisation in the redo buffer and redo latches for highly concurrent systems - but you DO pay for it in extra activity elsewhere. Possibly your test case highlights the potential overheads, but can't reflect the concurrency gain.

Regards

Jonathan Lewis

> Subject: 10G and UFS - long write times
> Date: Thu, 8 Jun 2006 09:16:22 +0100
> From: "Hallas, John, Tech Dev" <John.Hallas_at_gb.vodafone.co.uk>
>
> We are seeing big problems when issuing disk writes using 10G (both
10.1
> and 10.2). Reads are not an issue
> Very difficult to track down the exact combination but so far it looks
> as if it is a combination of 10G and local disk mounted using UFS .
>
>
> BEGIN
> FOR r IN 1..50000 LOOP
> INSERT INTO test (a,b) VALUES (r, 'test');
> COMMIT;
> END LOOP;
> END
>
> On SAN disk and 9i on local UFS mounted disk I am seeing response
times
> from 0:15 to 0:30 seconds
>
> On 10G on San I see the same
>
> On 10G using local UFS disk the time various from 7:00 to 9:00 minutes
> and is very repeatable.
>
> PS the waits are all IO related - log buffer , log file sync etc
>
> Thanks for any help
>

--
http://www.freelists.org/webpage/oracle-l




--
http://www.freelists.org/webpage/oracle-l

Received on Mon Jun 12 2006 - 11:09:57 CDT