Oracle FAQ Your Portal to the Oracle Knowledge Grid

RAW devices - a thing of the past ?

From: Reardon, Andrew J <Andrew.Reardon_at_Australia.Boeing.com>
Date: Mon, 2 Aug 1999 14:04:19 +1000
Message-ID: <21BEC9503B89D111A8F700805FE6A36902350916@xch-bne-01.bal.bna.boeing.com>


> Hi All
>
> I've been mulling this one over for a little while now, and I thought
> I'd get your thoughts on it.
>
> The discussion centers around using raw devices on Solaris 2.6 and
> upwards.
>
> We all know and acknowledge that using raw devices gives us the best
> possible thruput for db i/o, because operations to raw devices do not have
> to go thru the o/s's buffer - the reads/writes are *direct*, not indirect
> as with cooked files.
>
> Bypassing the o/s buffercache is even more important when related to the
> way that Solaris handles virtual memory. What ends up happening, if you
> use cooked files, is that the data for those files ends up *dominating*
> the o/s buffercache. Why? Because there is usually a lot of it (the
> data), and it's accessed very frequently. Consequently free vm page
> availability suffers, and the whole vm management is subverted, at the
> expense of all other present and future processes.
>
> So, assume (for whatever reason - we know that such reasons exist and are valid),
> you just want to stick with cooked files. What can you do to avoid this
> problem of your db data repeatedly and constantly filling up the
> buffercache ? Until recently I was only aware of one option - and that is
> to employ the new Priority Paging algorithm (available as a patch for
> Sol2.6, and standard with Sol7+), which patches the vm manager in
> Solaris to "be smarter" about what it keeps in the cache and what it
> flushes out - depending on whether it's "data" or executable code. The aim
> is that, when it comes time to free some pages, the page stealing daemon
> will free up data pages in preference to executable pages, on the basis
> that we're always going to have more data in cache than executable, but
> it's probably most important to cache as much executable as possible.
>
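To make the "data before executable" idea concrete, here is a minimal toy model in Python. It is only a sketch of the preference described above, not the actual Solaris page scanner, which is considerably more involved. The `Page` class, the `pages_to_free` function and the page names are all hypothetical, invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    name: str
    executable: bool  # True for executable pages, False for file data pages
    last_used: int    # logical timestamp of the most recent reference


def pages_to_free(pages, needed):
    """Pick `needed` pages to free: data pages first, oldest first in each class."""
    # False sorts before True, so data pages come ahead of executable pages;
    # within each class the least recently used page comes first.
    ordered = sorted(pages, key=lambda p: (p.executable, p.last_used))
    return ordered[:needed]


# Hypothetical cache contents: two db data pages and two executable pages.
pages = [
    Page("db_binary", True, 1),
    Page("datafile_blk_1", False, 5),
    Page("datafile_blk_2", False, 9),
    Page("libc", True, 2),
]
victims = [p.name for p in pages_to_free(pages, 3)]
print(victims)  # ['datafile_blk_1', 'datafile_blk_2', 'db_binary']
```

Note that both data pages are chosen before any executable page, even though the executable pages were referenced less recently. An executable page is only taken once the data pages run out.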
> Now, as I see it, there are a couple of issues with this approach. Firstly
> it can only be a band-aid measure - ie, it will relieve the symptom but
> not the cause. The page scanner is still going to have to work just as
> hard in examining the pages and types of pages it may or may not free.
> Secondly it relies crucially on the permission bits on the files that get
> cached. If your data files have an x bit set, then they're going to be
> treated as high priority executables.
>
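If the permission-bit concern above applies to your system, one way to audit a data directory is to look for regular files with any execute bit set. The following Python sketch does that using only the standard library. The function name `files_with_exec_bit` is hypothetical, and whether a flagged file is actually treated as an executable by the pager is the poster's claim, not something this script verifies.

```python
import os
import stat


def files_with_exec_bit(directory):
    """Return sorted names of regular files in `directory` with any x bit set."""
    exec_mask = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    flagged = []
    for entry in os.scandir(directory):
        # Only consider regular files; skip subdirectories and symlinks.
        if entry.is_file(follow_symlinks=False):
            mode = entry.stat(follow_symlinks=False).st_mode
            if mode & exec_mask:
                flagged.append(entry.name)
    return sorted(flagged)
```

Calling it on the directory that holds the cooked db files returns the names worth a second look. Clearing the bits (for example with chmod) is left to the administrator.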
> So where does this all leave us ? Well, at this point, along comes a
> little gem by the name of "forcedirectio". I was told about this by a guy
> (forget his name, but whoever you are - Kudos ! :) on another unix-based
> newsgroup, in reply to a guy who was having trouble comprehending how his
> SAP installation was able to chew up GB after GB of memory, no matter how
> much more he added. The answer was he was using cooked files, and they
> were just going straight into, and remaining in, the cache - hurrah for
> Solaris vm management - it's doing its job and making use of our
> expensive memory, but in this case it's doing its job a little too well.
>
>
> I followed this up and the guy tells me about the apparently little known
> mount option forcedirectio, which makes all i/o to that f/s direct - ie,
> *bypassing the buffercache*... man mount_ufs(1M):
>
> ...
> noforcedirectio | forcedirectio
>     If forcedirectio is specified and supported by the file
>     system, then for the duration of the mount forced direct
>     I/O will be used. If the filesystem is mounted using
>     forcedirectio, then data is transferred directly between
>     user address space and the disk. If the filesystem is
>     mounted using noforcedirectio, then data is buffered in
>     kernel address space when data is transferred between user
>     address space and the disk. forcedirectio is a performance
>     option that benefits only from large sequential data
>     transfers. The default behavior is noforcedirectio.
> ...
>
> This option can be used to remount a live f/s on the fly.
>
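After a remount it is worth confirming which filesystems actually carry the option. The Python sketch below parses mount-table text and reports, per UFS mount point, whether forcedirectio is among the options. It assumes the table has whitespace-separated fields in the order device, mount point, filesystem type, options, time (as in /etc/mnttab), and that forcedirectio shows up in the options field when active. Both are assumptions to check on your own system. The function name and the sample device names and mount points are hypothetical.

```python
def forcedirectio_status(mnttab_text):
    """Map each UFS mount point to True if forcedirectio is in its options."""
    status = {}
    for line in mnttab_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _special, mount_point, fstype, options = fields[:4]
        if fstype != "ufs":
            continue  # the option discussed here is a UFS mount option
        # Exact token match, so "noforcedirectio" is not mistaken for it.
        status[mount_point] = "forcedirectio" in options.split(",")
    return status


# Hypothetical mount-table contents, for illustration only.
sample = (
    "/dev/dsk/c0t0d0s0\t/\tufs\trw,suid,largefiles\t933472000\n"
    "/dev/dsk/c1t0d0s6\t/u01\tufs\trw,suid,largefiles,forcedirectio\t933472100\n"
    "swap\t/tmp\ttmpfs\tdev=1\t933472050\n"
)
print(forcedirectio_status(sample))  # {'/': False, '/u01': True}
```

On a live system the same function could be fed the contents of the mount table file instead of the sample string.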
> The best way to actually see how memory is being allocated on a very
> general level, IMHO, is with the prtmem tool in the RMCmem package. Using
> this package to sample data over several days, I was able to get a
> picture of the effects of using and not using forcedirectio, and the
> results were just as expected: before - buffercache leaps up as soon as
> people start using the db, and stays there until some time after they stop
> using it; after - buffercache is used and then freed when other data
> processing processes (eg a perl script chewing thru many megs of data)
> fire up, and then drops back when other processes need some room.
> Basically, normal memory management in the presence of many gigs of db
> data on cooked files. The bottom line: memory is available when and how it
> is needed, and you can get a clear and accurate picture of your
> machine's memory usage - when, where, by what, how much, for how long,
> etc, instead of just seeing free memory flatlining as soon as the db gets
> used.
>
> CONCLUSION
> ----------
>
> Raw devices provide the greatest i/o thruput because they bypass the
> buffercache.
> Raw devices are not as easy to maintain for a dynamic system as cooked
> files.
> Cooked files are slow because they go thru the buffercache.
> Priority paging can help this, but only to a degree.
>
> Mounting a f/s holding cooked db files with forcedirectio gives you most
> of the benefits of using raw space, without the drawbacks.
>
>
>
> Any comments/discussion most welcome.
>
> Thanks for your time in reading this ! :)
>
> Andrew Reardon
> UNIX/Informix Administrator
> Aircraft Systems, Boeing Aust. Ltd.
> Ph: +61 7 3306 3346 Mob: +61 0419 745 831

Received on Sun Aug 01 1999 - 23:04:19 CDT
