Re: The ill effects of NOT using direct I/O

From: LS Cheng <exriscer_at_gmail.com>
Date: Mon, 14 Feb 2011 08:45:04 +0100
Message-ID: <AANLkTinFsOp96ezB6gA1xm_FvVmFDoE4TVGRL+9EWGWU_at_mail.gmail.com>



Hi

In 9i you had to enable direct I/O at the filesystem level; this is no longer true from 10g onwards. This new behaviour is mentioned in

http://download.oracle.com/docs/cd/B19306_01/server.102/b15658/toc.htm

I believe this is true for AIX, Linux and Tru64.

Regarding the effects of direct I/O, the benefit is tremendous, especially on a busy, I/O-bound system. I have successfully rescued a few Linux systems just by enabling direct I/O; before doing so, those systems would hang from time to time when kswapd kicked in and started freeing memory, and while that lasted the system did not respond.
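
For what it is worth, a quick way to watch that memory squeeze is to sample
/proc/meminfo. The small C program below is only a sketch I am adding as an
illustration (the field names are the standard /proc/meminfo ones and the
one-second interval is arbitrary); on a system doing buffered I/O you can see
Cached grow while MemFree shrinks, right up to the point where kswapd has to
step in.

/* meminfo_watch.c - sample MemFree/Buffers/Cached from /proc/meminfo.
 * Illustrative sketch only: shows the page cache growing while free
 * memory shrinks on a system doing buffered (non-direct) I/O.
 * Build: cc -o meminfo_watch meminfo_watch.c   (stop it with Ctrl-C)
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static long read_field(const char *name)
{
    FILE *f = fopen("/proc/meminfo", "r");
    char line[256];
    long kb = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, name, strlen(name)) == 0) {
            sscanf(line + strlen(name), " %ld", &kb);  /* value is in kB */
            break;
        }
    }
    fclose(f);
    return kb;
}

int main(void)
{
    for (;;) {
        printf("MemFree %8ld kB  Buffers %8ld kB  Cached %8ld kB\n",
               read_field("MemFree:"),
               read_field("Buffers:"),
               read_field("Cached:"));
        sleep(1);
    }
    return 0;
}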

Thanks

--
LSC


On Mon, Feb 14, 2011 at 6:56 AM, Gaja Krishna Vaidyanatha
<gajav_at_yahoo.com> wrote:


> Hi Yong,
>
> Nice writeup. I have started a new thread in an effort to focus on the
> topic of discussion. Yes, indeed, on Linux with ext3 filesystems, setting
> FILESYSTEMIO_OPTIONS to SETALL and not doing anything special at the
> filesystem mount is enough to get direct I/O working. But as I mentioned
> in a prior note (in the previous thread), the same was not true for jfs or
> vxfs the last time I checked (a few months ago). You also bring a very
> relevant point to the discussion: the distinction between the page cache
> and the OS buffer cache. Let us, for the purposes of our discussion, just
> say that the *filesystem buffer cache* includes both the page cache and
> the OS buffer cache.
>
> Also, the scenario you have depicted in your last paragraph is
> interesting. My question to you - have you observed what you are
> suggesting in your write-up on a production system? The reason I bring
> this up is that, in my experience, NOT having direct I/O (with or without
> HugePages) has had crippling effects on the entire system, in this case on
> RH Linux. Similar behavior has also been observed and documented on other
> operating systems.
>
> The following is pretty much a standard sequence of what has been observed
> on RH Linux *without* direct I/O on very busy Oracle systems (100+ active
> sessions at any given time):
>
> 1) As the load on the system increases, more sessions run more SQL and
> eventually perform more I/O, thus increasing total I/O consumption on the
> system.
>
> 2) This causes the OS to allocate more memory to the filesystem buffer
> cache, in an effort to buffer future I/O requests (which may or may not
> occur). The classic "*double buffering*" problem of data in the Oracle
> database buffer cache and the filesystem buffer cache has now begun.
>
> 3) An increase in the memory footprint of the filesystem buffer cache
> causes an increase in the activity of the paging daemon - the OS tries to
> take preemptive action against a potential memory bottleneck.
>
> 4) As the I/O pressure increases, the paging daemon starts to work even
> more aggressively (significantly increasing the number of pageins and
> pageouts). Eventually it not only increases its paging rates, but also
> starts swapping out portions of the SGA (even if it is locked) when the
> demand for memory increases. *A system-wide Memory Bottleneck has now
> become a reality*.
>
> 5) This further increases the CPU consumption on the system (listed below
> in #6) and also starts to create an artificial *I/O Bottleneck* (db file
> sequential read and db file scattered read times increase exponentially,
> as Oracle database blocks need to be brought in from the OS's swap
> partitions). OUCH!!!
>
> 6) This increase in paging daemon activity, combined with the increase in
> the filesystem buffer cache footprint, increases the overall CPU
> utilization of the OS (verified with a continuous "sar -u" or vmstat).
> "sar -u" breaks down CPU consumption into 4 buckets - %usr, %sys, %wio and
> %idle (a small sketch after this list shows one way to sample those same
> buckets); *%sys* will consistently and continuously increase.
>
> 7) This starts to create an artificial *CPU Bottleneck*, as %sys in many
> cases consumes a very significant percentage of total CPU on the system
> (I have observed *%sys consume 40% - 50% of total CPU*).
>
> 8) Imagine what havoc starts to unfold when the *OS consumes 50% of the
> system's CPU resources* - *CPU Bottleneck* *becomes a reality*. Nobody
> plans for the OS to consume 50% of configured CPU resources while engaging
> in system sizing and capacity planning efforts.
>
> 9) Pretty soon, the CPU Bottleneck causes *Oracle sessions to start
> timing out*.
>
> 10) Depending on the application vendor and their connection pool
> mechanics, the application then starts to retry connections to the
> database, which starts a *connection storm to the database*. In some
> cases, the database is completely hung up with various
> session/login-related latch issues.
>
> 11) Sooner rather than later, the database comes to a grinding halt (with
> or without a crash), and the OS may eventually come to a grinding halt by
> imploding with a *kernel panic*. Even if this does not occur, you pretty
> much have an unusable Oracle database at this stage of the game.
>
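> To make #6 above concrete without needing sar, here is a small sketch - my
> own illustration, not output or code from any of the systems described
> above - that reads the aggregate "cpu" line from /proc/stat twice and
> prints roughly the same %usr/%sys/%wio/%idle split that "sar -u" reports
> (it ignores the irq/softirq buckets, and the 5-second interval is
> arbitrary):
>
> /* cpu_split.c - compute %usr/%sys/%wio/%idle from /proc/stat, the same
>  * buckets "sar -u" reports.  Illustrative sketch only.
>  * Build: cc -o cpu_split cpu_split.c
>  */
> #include <stdio.h>
> #include <unistd.h>
>
> struct cpu { unsigned long long user, nice, sys, idle, iowait; };
>
> static int read_cpu(struct cpu *c)
> {
>     FILE *f = fopen("/proc/stat", "r");
>     int n;
>
>     if (!f)
>         return -1;
>     /* first line: "cpu  user nice system idle iowait irq softirq ..." */
>     n = fscanf(f, "cpu %llu %llu %llu %llu %llu",
>                &c->user, &c->nice, &c->sys, &c->idle, &c->iowait);
>     fclose(f);
>     return n == 5 ? 0 : -1;
> }
>
> int main(void)
> {
>     struct cpu a, b;
>     double total;
>
>     for (;;) {
>         if (read_cpu(&a))
>             return 1;
>         sleep(5);
>         if (read_cpu(&b))
>             return 1;
>         total = (double)((b.user - a.user) + (b.nice - a.nice) +
>                          (b.sys - a.sys) + (b.idle - a.idle) +
>                          (b.iowait - a.iowait));
>         printf("%%usr %5.1f  %%sys %5.1f  %%wio %5.1f  %%idle %5.1f\n",
>                100.0 * (b.user - a.user + b.nice - a.nice) / total,
>                100.0 * (b.sys - a.sys) / total,
>                100.0 * (b.iowait - a.iowait) / total,
>                100.0 * (b.idle - a.idle) / total);
>     }
>     return 0;
> }
>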
> The above is pretty much a "poster child storyline" of very busy Oracle
> systems with hundreds of concurrent sessions that do not use direct I/O.
> Again, this is my experience and I'm curious if you or others on this list
> have observed anything to the contrary. If so, please provide the relevant
> details, as I would love to learn alternative system behavioral patterns
> without direct I/O. It is something that interests me quite a bit and I'm
> very curious. Thanks for taking the time to read this long note :)
>
> Cheers,
>
> Gaja
>
> Gaja Krishna Vaidyanatha,
> Founder/Principal, DBPerfMan LLC
> http://www.dbperfman.com
> Phone - 001-(650)-743-6060
> Co-author:Oracle Insights:Tales of the Oak Table -
> http://www.apress.com/book/bookDisplay.html?bID=314
> Co-author:Oracle Performance Tuning 101 -
> http://www.amazon.com/gp/reader/0072131454/ref=sib_dp_pt/102-6130796-4625766
>
>
> ------------------------------
> *From:* Yong Huang <yong321_at_yahoo.com>
> *To:* Freek.DHooge_at_uptime.be
> *Cc:* oracle-l_at_freelists.org
> *Sent:* Sun, February 13, 2011 10:41:30 AM
> *Subject:* RE: How much RAM is to much
>
> > But I always thought that setting the filesystemio_options parameter
> > to directIO or setall caused the processes to open the files with
> > the O_DIRECT flag.
>
> I can confirm what Freek said, on my Oracle 10.2.0.4, Red Hat 5, kernel
> 2.6.18-194.17.1.el5, architecture x86_64, if these details matter. The
> filesystem is ext3, mounted with no options (i.e. everything default).
> When the process opens a datafile for the first time, I can see in strace
> that the flags passed to the open() call are O_RDWR|O_SYNC|O_DIRECT.
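>
> For anyone who wants to reproduce those flags outside Oracle, below is a
> minimal C sketch (my own illustration, not anything Oracle ships) that
> opens a file O_RDWR|O_SYNC|O_DIRECT and writes one aligned block. The main
> catch with O_DIRECT is that the user buffer, the I/O size and the file
> offset must be block aligned; the 512-byte figure used here is an
> assumption about the device's logical block size.
>
> /* odirect_write.c - open a file with the same flags strace shows for the
>  * Oracle datafiles (O_RDWR|O_SYNC|O_DIRECT) and write one aligned block.
>  * Illustration only.
>  * Build: cc -o odirect_write odirect_write.c
>  */
> #define _GNU_SOURCE              /* needed to get O_DIRECT from fcntl.h */
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
>
> #define BLKSZ 512                /* assumed logical block size */
>
> int main(int argc, char **argv)
> {
>     void *buf;
>     int fd;
>
>     if (argc != 2) {
>         fprintf(stderr, "usage: %s <file>\n", argv[0]);
>         return 1;
>     }
>
>     /* O_DIRECT requires the user buffer to be block aligned */
>     if (posix_memalign(&buf, BLKSZ, BLKSZ) != 0) {
>         perror("posix_memalign");
>         return 1;
>     }
>     memset(buf, 'x', BLKSZ);
>
>     fd = open(argv[1], O_RDWR | O_SYNC | O_DIRECT | O_CREAT, 0600);
>     if (fd < 0) {
>         perror("open");  /* EINVAL usually means the fs refuses O_DIRECT */
>         return 1;
>     }
>
>     /* this write bypasses the page cache and goes straight to disk */
>     if (write(fd, buf, BLKSZ) != BLKSZ) {
>         perror("write");
>         return 1;
>     }
>     close(fd);
>     free(buf);
>     return 0;
> }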
>
> By the way, ever since the page cache was introduced to the file system,
> we should make a distinction between the buffer cache and the page cache.
> What's interesting here is the page cache, because the buffer cache caches
> file system metadata. Well, in Linux, I believe the buffer cache is
> physically inside the page cache, so you may call those buffers "buffer
> pages". Confusing name indeed. But to make it short, as soon as you have
> the word "buffer" in it, people who know the Linux kernel (not me!) will
> think you're referring to the cache for metadata, not for file content,
> which may be what you actually mean.
>
> Last point. While Oracle manages data in memory through a better
> mechanism, because it knows the data better, leaving the decision of
> caching/not caching to the file system is not that dreadful. In some
> cases, it may even be a big help. So instead of assigning most memory to
> the Oracle buffer cache and using direct I/O, someone could run a
> comprehensive test to see whether locking a large portion of RAM for the
> SGA (through HugePages etc.) and leaving the file system on non-direct
> I/O is better in some cases. If a table slightly bigger than 2% of the
> buffer cache is frequently full-scanned, and you didn't do anything
> special about it, I think it's better to let the file system cache its
> content.
>
> Yong Huang
>
>
>
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>
-- http://www.freelists.org/webpage/oracle-l
Received on Mon Feb 14 2011 - 01:45:04 CST
