Oracle FAQ Your Portal to the Oracle Knowledge Grid


Re: Synchronous writes & TEMP

From: VC <boston103_at_hotmail.com>
Date: Fri, 14 May 2004 13:32:14 GMT
Message-ID: <yT3pc.206$qA.84403@attbi_s51>


Hi Jonathan,

"Jonathan Lewis" <jonathan_at_jlcomp.demon.co.uk> wrote in message news:c82a5e$jgm$1_at_titan.btinternet.com...
>
> How are you testing that the writes are synchronous ?

The file is opened with the O_DSYNC flag, which means that all subsequent writes to it are synchronous:
....
open64("/oradata/lmtemp01.dbf", O_RDWR|O_DSYNC) = 10 .....
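One way to check the same thing from inside a process, rather than from a system-call trace, is to read the descriptor's status flags back. The following is only a sketch: the helper names `open_dsync` and `has_dsync` are made up for illustration, the scratch file stands in for the real datafile, and it assumes a Unix platform where `fcntl(F_GETFL)` reports O_DSYNC.

```python
import fcntl
import os
import tempfile


def open_dsync(path):
    """Open an existing file read/write with O_DSYNC, like the traced open64()."""
    return os.open(path, os.O_RDWR | os.O_DSYNC)


def has_dsync(fd):
    """Return True if the descriptor's status flags include O_DSYNC."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    return bool(flags & os.O_DSYNC)


if __name__ == "__main__":
    # Hypothetical scratch file; not the Oracle TEMP file from the trace.
    with tempfile.NamedTemporaryFile() as tmp:
        fd = open_dsync(tmp.name)
        try:
            print("O_DSYNC set:", has_dsync(fd))
            # With O_DSYNC, pwrite() returns only after the data is on stable storage.
            written = os.pwrite(fd, b"\0" * 8192, 0)
            print("bytes written:", written)
        finally:
            os.close(fd)
```

This says nothing about what Oracle does internally; it only shows what the flag in the trace means for any descriptor opened that way.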

>
> I've only checked this fairly recently so I can't make
> any comment about older versions of Oracle, but
> my observations suggest that direct writes take place
> without waiting for file-system sync - except for a
> small number, which are (I guess) the last writes in
> a direct write pass that your session expects to do.
>
> I base this comment on observations of waits for
> direct writes using event 10046 level 8. In a simple
> test case, an order by on some 32 MB of data, the
> trace file showed only two waits for direct writes
> for a total of 13 blocks, even though the total volume
> written was clearly far in excess of 13 blocks.

Well, the 10046/8 waits may not be a reliable indicator of what's going on.
For example, under Solaris 2.8, Oracle uses 'emulated' async_io on filesystems (raw devices support 'real' async_io), which means that several threads, usually four, execute write requests in parallel, thereby decreasing the perceived waits. The OS-level trace shows this:

.................
 0.0496 pwrite64(403, ..) = 393216
 0.0503 pwrite64(403, ..) = 393216
 0.0328 pwrite64(403, ..) = 262144
 0.0798 pwrite64(403, ..) = 393216

 0.0505 pwrite64(403, ..) = 393216
......................

(around 8 MB/s)

The first column shows the time spent in the call, in seconds, and the last column shows the number of bytes written to disk.
The numbers above are for a Sun disk array. When the disk array hardware cache malfunctions, one might get these numbers:

 0.3471 pwrite64(406, ..) = 393216
 0.2846 pwrite64(406, ..) = 393216
 0.2967 pwrite64(406, ..) = 393216

(around 1.1 MB/s)
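The rates quoted with the two traces can be reproduced roughly by dividing bytes written by time spent in the calls. Below is a small sketch that does this for trace lines in the format shown; the function name `throughput_mb_per_s` and the regular expression are illustrative assumptions, and it treats the calls as if they ran one after another (with several writer threads the aggregate rate could be higher). Since only a few calls are excerpted here, the result will not match the quoted figures exactly.

```python
import re

# Matches lines such as " 0.0496 pwrite64(403, ..) = 393216"
TRACE_LINE = re.compile(r"^\s*(\d+\.\d+)\s+pwrite64\(.*\)\s*=\s*(\d+)\s*$")


def throughput_mb_per_s(trace_text):
    """Return total bytes / total call time, in MB/s (1 MB = 10**6 bytes)."""
    total_seconds = 0.0
    total_bytes = 0
    for line in trace_text.splitlines():
        match = TRACE_LINE.match(line)
        if match is None:
            continue  # skip blank lines and "....." separators
        total_seconds += float(match.group(1))
        total_bytes += int(match.group(2))
    if total_seconds == 0.0:
        raise ValueError("no pwrite64 lines found")
    return total_bytes / total_seconds / 1e6


healthy = """
 0.0496 pwrite64(403, ..) = 393216
 0.0503 pwrite64(403, ..) = 393216
 0.0328 pwrite64(403, ..) = 262144
 0.0798 pwrite64(403, ..) = 393216
 0.0505 pwrite64(403, ..) = 393216
"""

degraded = """
 0.3471 pwrite64(406, ..) = 393216
 0.2846 pwrite64(406, ..) = 393216
 0.2967 pwrite64(406, ..) = 393216
"""

if __name__ == "__main__":
    print(f"healthy cache:  {throughput_mb_per_s(healthy):.1f} MB/s")
    print(f"degraded cache: {throughput_mb_per_s(degraded):.1f} MB/s")
```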

>
> There may be variations dependent on o/s, and optional
> extras installed at either layer.
>
>
> For writes that go through the buffer I guess it's either
> a question of minimising code changes - or simple oversight,
> and someone will get around to it eventually.

My thought is that had the TEMP file (not a raw device) been opened _without_ the O_DSYNC option, there would have been an obvious performance gain for direct writes thanks to the filesystem cache. So the question remains: why are the writes synchronous?
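The expected difference is easy to measure outside Oracle. The sketch below times the same sequence of writes to a scratch file with and without O_DSYNC; the function name `time_writes`, the chunk count and the scratch directory are assumptions for illustration, and the 393216-byte chunk simply copies the write size seen in the trace. The result depends entirely on the filesystem and storage underneath, so it indicates nothing about any particular Oracle installation.

```python
import os
import tempfile
import time


def time_writes(path, extra_flags, chunk=393216, count=8):
    """Write `count` chunks with pwrite() and return the elapsed seconds."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | extra_flags, 0o600)
    buf = b"\0" * chunk
    try:
        start = time.perf_counter()
        for i in range(count):
            os.pwrite(fd, buf, i * chunk)
        return time.perf_counter() - start
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        # Buffered: pwrite() can return once the data is in the filesystem cache.
        buffered = time_writes(os.path.join(scratch, "buffered.dat"), 0)
        # Synchronous: each pwrite() waits for the data to reach stable storage.
        synced = time_writes(os.path.join(scratch, "dsync.dat"), os.O_DSYNC)
        print(f"buffered: {buffered:.4f} s, O_DSYNC: {synced:.4f} s")
```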

Regards.

VC

> --
> Regards
>
> Jonathan Lewis
Received on Fri May 14 2004 - 08:32:14 CDT
