Re: direct reads and writes on Solaris

From: David Miller <David.J.Miller_at_Sun.COM>
Date: Fri, 25 Jan 2008 11:35:07 -0600
Message-id: <479A1DCB.9070405@Sun.COM>


Hi Dan,

A couple of explanations. First, the main reason to use directio is to get around the POSIX single-writer lock. This lock prevents two processes from writing to the same file at the same time, mainly so that they can't both write the same block and produce unpredictable results.

Since Oracle already coordinates its own writes, this lock is unnecessary, but the filesystem and OS enforce it regardless. On larger systems that causes contention: many processes may want to write to the same Oracle datafile at the same time and are forced to single-thread.
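
If you want to see the lock itself, here's a rough sketch (the /ufs_buffered and /ufs_direct mount points are hypothetical -- substitute a default UFS mount and a forcedirectio one). Four dd writers to the same file serialize behind the lock on the buffered mount but can run concurrently on the direct one:

  # sketch only -- mount points are placeholders
  for fs in /ufs_buffered /ufs_direct
  do
      echo "$fs:"
      FS=$fs timex sh -c '
          i=1
          while [ $i -le 4 ]
          do
              # each writer gets its own 100 MB region of one file,
              # so only the lock, not the data, is shared
              dd if=/dev/zero of=$FS/testfile bs=1024k count=100 \
                  oseek=`expr $i \* 100` 2>/dev/null &
              i=`expr $i + 1`
          done
          wait
      '
  done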

So methods were introduced to get around those semantics. The four that work on Solaris are:

  1. raw devices, used directly
  2. UFS with directio (either filesystemio_options = setall or the
     forcedirectio mount option)
  3. VxFS with QIO or ODM
  4. QFS with samaio

Note that VxFS with mincache=direct is NOT in this list because it does NOT eliminate the single-writer lock. You have to have QIO or ODM with VxFS to avoid the lock. Two of the four are sketched below.
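
For reference, here's roughly what the UFS variants look like (the device name, mount point and use of an spfile below are placeholders):

  # mount-option route: forces direct I/O for everything on the filesystem
  mount -F ufs -o forcedirectio /dev/dsk/c1t0d0s6 /u02

  -- init-parameter route: Oracle requests direct I/O per file; the
  -- parameter is not dynamic, so it takes an instance restart
  SQL> alter system set filesystemio_options = setall scope=spfile;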

A second benefit of directio is bypassing the buffer cache, which can help writes by shortening the code path, although this is not always as large a benefit.

In particular, your test with a single dd does NOT hit the single-writer lock, so it isn't representative of Oracle write performance. It's also sequential, whereas most of Oracle's I/O in an OLTP context is random. Still, what you're seeing is interaction with the buffer cache costing you efficiency: those 24.94 seconds of sys time are largely the kernel copying your 2 GB through the page cache.
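
As an aside, the two timings don't measure quite the same work: against the block device, dd can return while dirty pages are still sitting in the page cache waiting to be flushed. If you rerun the comparison, something like this (using your device names) is a little fairer:

  # buffered path: include the flush, or you're under-counting
  # (sync only schedules the flush, so this still flatters the cache)
  timex sh -c 'dd if=/dev/zero of=/dev/vx/dsk/testdg/test bs=1024k count=2048; sync'

  # raw path: nothing is cached, so no flush is needed
  timex dd if=/dev/zero of=/dev/vx/rdsk/testdg/test bs=1024k count=2048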

If you want to see the benefits of directio, you'll need to convert to one of the four filesystem choices mentioned above.
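
Once converted, it's worth a sanity check that direct I/O is actually in effect. Two quick ones (a sketch; names will differ on your system):

  # does the mount actually carry the option?
  mount -v | grep directio

  # is the instance parameter what you think it is?
  SQL> show parameter filesystemio_options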

Regards,

Dave Miller

Dan Norris wrote, On 01/24/08 18:10:
> Thanks, looks like that confirms my theory below. (Not sure how I didn't
> find those references myself--sorry.) I then have one related question.
>
> We did some specific testing where we used a crude method to test I/O
> (specifically, write) performance. The test was this:
>
> timex dd if=/dev/zero of=<device> bs=1024k count=2048
>
> For the <device> we tried many different things. The interesting part
> (and here's where I'd like some input) is that the result for testing
> the same device via the buffered (block) node was much, much slower
> than the result for the unbuffered (char) node. All things equal, here
> are some sample tests:
>
> /dev/vx/dsk/testdg/test
>
> real 25.12
> usr 0.02
> sys 24.94
>
> /dev/vx/rdsk/testdg/test
>
> real 10.35
> usr 0.01
> sys 1.55
>
> So, basically, it took more than 2x as long to do the dd to the buffered
> device as compared to the unbuffered device. I was sort of expecting
> that writes to the buffered device would be possibly a little faster or
> maybe about equal. I never expected to have such a big delta and I also
> didn't expect that so much system time would be spent just writing to a
> buffered device.
>
> Any of you I/O gurus see anything interesting in these results? Are the
> testing methods even valid? My conclusion: since we're likely doing
> buffered I/O now (we're not doing directIO), switching to directIO
> (which is unbuffered by definition) should give us a considerable
> performance gain--at least for writes, since my test was writes only.
> I'd presume reads might show a similar ratio, though.
>
> Dan
>
> ----- Original Message ----
> From: Ukja.dion <ukja.dion_at_gmail.com>
> To: dannorris_at_dannorris.com; Oracle L <oracle-l_at_freelists.org>
> Sent: Thursday, January 24, 2008 5:55:18 PM
> Subject: RE: direct reads and writes on Solaris
>
> Visit the following URLs:
>
> http://www.solarisinternals.com/wiki/index.php/Direct_I/O
>
> http://www.ixora.com.au/notes/filesystemio_options.htm
>
> From: oracle-l-bounce_at_freelists.org
> [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Dan Norris
> Sent: Friday, January 25, 2008 7:14 AM
> To: Oracle L
> Subject: direct reads and writes on Solaris
>
> Can someone help me interpret this set of data correctly?
>
> The (vxfs) filesystem is mounted with these options:
> /db51 on /dev/vx/dsk/oracledg/db18
> read/write/setuid/mincache=direct/delaylog/largefiles/ioerror=mwdisable/dev=3ac36c1
>
> This is 9.2.0.8 on Solaris 9 (V490, Generic_122300-07) with VxFS 4.1.
>
> I have the following line in a truss of a dedicated server process:
>
> open("/db51/oradata/tccrt1/member_questions_d01.dbf", O_RDWR|O_DSYNC) = 9
>
> I also have the following settings in the DB:
>
> NAME                                 TYPE        VALUE
> ------------------------------------ ----------- ------------------------------
> disk_asynch_io                       boolean     TRUE
> filesystemio_options                 string      ASYNCH
>
> The question(s):
> I was expecting to see O_DIRECT in there somehow, but I'm thinking that
> maybe that's just on Linux, not Solaris. I don't see O_DIRECT listed in
> the open(2) manual page. I am also wondering if filesystemio_options
> needs to be "setall" instead of the current setting of "ASYNCH" in order
> to achieve directIO. Or, am I looking at the wrong thing to determine if
> directIO is enabled?
>
> Thanks in advance!
>
> Dan

--
http://www.freelists.org/webpage/oracle-l