Oracle-L: Re: SAN

From: <Babu.Nagarajan_at_Cummins.com>
Date: Tue, 13 Aug 2002 07:30:17 -0800
Message-ID: <F001.004B2D56.20020813073017@fatcity.com>

Tim, Jared and Kirti

Jared, Kirti : Thanks a lot for your input. and yes, I read the Sane SAN paper.

Tim : Many thanks to you for pointing out some of the big "mis"assumptions I had made. I have corrected most of the stuff you had mentioned except for these

>. * I'm less clear on whether SANs themselves perform read-ahead and the
> conditions under which they do so. I'm pretty sure that they are smarter
> about it than what you describe; usually read-ahead mechanisms are
> triggered by detected patterns of usage, not algorithms as simple as
> described...

Will the "smarter algorithm" look inside the contents of a file before reading it? If it does not, then how will it be able to "intelligently" read ahead what data Oracle wants from inside its datafile? If it does, how does it decipher the Oracle's way of storage?

> * your example about "read-ahead" conflicts makes some invalid
> assumptions, namely about space being allocated in blocks not extents
(when
> does *that* ever happen?) and about "read-ahead being set to 3 blocks"
> (again, when does *that* ever happen?).

It does not happen. However I am going to be talking to a bunch of non-Oracle folks and management so I want to keep it as simple as possible.

> Altogether, empirical evidence (i.e. many successful SAN implementations
> under Oracle over several years) does not lend credence to your basic
> assertion that "SAN and Oracle don't go well together". It is a fact
that
> they do...

I am not trying to make a statement "SAN and Oracle dont go well together". I am trying to convince my management that buying a SAN does not mean that we never need to worry about IO any more. Even a SAN needs to be configured. Currently they are under the impression that there are no IO problems but my database IO waits are 50% of the total response time. All my index, table data are scattered all over the disks - many on the same disk and the answer I get is "No, we are not tasking the SAN at all. There are no IO issues"

Thanks a lot

Babu

Tim Gorman <Tim_at_SageLogix.com>@fatcity.com on 08/13/2002 07:58:29 AM

Please respond to ORACLE-L_at_fatcity.com

Sent by: root_at_fatcity.com

To: Multiple recipients of list ORACLE-L <ORACLE-L_at_fatcity.com> cc:

Babu,

Is it possible that you are confusing the term "SAN" with the term "NAS"? As I read through your email, I couldn't help thinking that you discussing "network-attached storage" rather than "storage-area networks". If so, some
of my comments below might change slightly, but not much...

---

Most of your major assumptions are correct, but there are important
errors...

    * DBWR only does RW, never RR or SR.  Mostly Oracle server processes do
RR and SR, but ARCH also does SR, as do backup processes (whatever they
are);  everyone always forgets to add backup processes to the mix...

    * SW is characteristic of LGWR and ARCH, but also of processes
performing sorting (i.e. "direct writes" wait-event).  I think you'll agree
that databases generating a lot of redo (and archived redo) and performing
lots of sorting are not necessarily "misconfigured".  The amount of redo
generated is really a characteristic of the application itself, not the
database configuration.  High amounts of sorting can possibly be tuned, but
that too is more a characteristic of the application and users usage of it
than database configuration...

    * I'm not sure what your conclusions regarding RAID5 chunks or RAID0
stripes are, but I suspect they are incorrect.  Oracle DB_BLOCK_SIZE should
not come close to matching RAID5 chunksize of RAID0 stripesize;  even
"(DB_BLOCK_SIZE * DB_FILE_MULTIBLOCK_READ_COUNT)", which denotes the
largest
I/O requests (for full-table scans) generated by Oracle should be much
smaller than RAID5 chunksize or RAID0 stripesize, for most databases.  So,
whatever conclusions you are drawing from that point about sizes is likely
incorrect...

    * I'm less clear on whether SANs themselves perform read-ahead and the
conditions under which they do so.  I'm pretty sure that they are smarter
about it than what you describe;  usually read-ahead mechanisms are
triggered by detected patterns of usage, not algorithms as simple as
described...

Anyway, based on your mistaken assumptions, your list of conflicts between
SAN and Oracle are quite mistaken as well...

    * the difference between the "stripe width" and DB_BLOCK_SIZE is not
"excess I/O" at the SAN level;  the disk drives do not necessary read the
entire "stripe" or "chunk";  they merely *store* data in those extents on
the device.  They don't have to read/write in those increments...

    * your example about "read-ahead" conflicts makes some invalid
assumptions, namely about space being allocated in blocks not extents (when
does *that* ever happen?) and about "read-ahead being set to 3 blocks"
(again, when does *that* ever happen?).  You do have some of the basic
ideas
right, but please remember that your assumptions may be overly simplistic
or
just unlikely.  Moreover, remember that some of your basic assumptions
(especially regarding SW and database configuration) are just plain
wrong...

    * your points about caching are mostly correct, except for DBWR doing
reads again.  Also, even though LGWR uses something called the "log
buffer",
please be aware that this data structure is not a "cache".  A "buffer" is a
data structure into which data is written once and read only once;  a
"cache" is a data structure into which data is (hopefully) written once and
(hopefully) read many times.  So, I/O in the LGWR stream is *not* cached by
Oracle at all.  The "buffer" mechanism is there purely to facilitate
concurrency and the multiplexing of multiple server processes generating
redo into the single LGWR process performing the write to online redo log
files.  Lastly, your comment about "SAN's buffer can never really provide
to
Oracle the data it reads most ? Its already there in Oracle" is just plain
incorrect.  Please remember the distinction between a "buffer" and a
"cache", first of all.  Second, remember that not all I/O is cached by
Oracle (i.e. redo).  Third, please remember that database performance
"health" is not guaranteed by high BCHR in Oracle anyway...

---

Many of the concepts discussed here are not characteristic only of SANs;
they also pertain to file-systems, logical volume managers, JBOD, and NAS,
not just SANs.  Please rethink some of the concepts you are thinking
about...

Altogether, empirical evidence (i.e. many successful SAN implementations
under Oracle over several years) does not lend credence to your basic
assertion that "SAN and Oracle don't go well together".  It is a fact that
they do...

---

If my original supposition that you are confusing "SAN" with "NAS" is
correct, then I would agree with you that "NAS and Oracle don't go well
together" in most situations, especially those involving high volumes of
I/O.  NAS is great for non-DBMS uses (i.e. file serving) and for uses with
low-volumes of I/O from DBMSs (i.e. DEV environment), but there is lots of
empirical evidence out there indicating that NAS stinks for high-volumes of
I/O from DBMSs.  Of course, that opinion is based on the current state of
affairs -- many NAS vendors have significant advances in technology in the
pipeline that may dramatically alter that assessment in the future, even
the
near future...

Hope this helps...

-Tim

----- Original Message -----
To: "Multiple recipients of list ORACLE-L" <ORACLE-L_at_fatcity.com>
Sent: Monday, August 12, 2002 11:48 PM


All

I have a meeting tomorrow where I am going to point out why SAN and Oracle
does not go very well together. Here are my thoughts. Can you pick holes in
this argument, modify it or suggest any changes....

TIA

Babu

SAN and Oracle ? Conflicting IO behavior
*     There are four types of IO in Oracle
1.    Random Reads (RR) ? DBWR - Using indexes
2.    Sequential Reads (SR) ? DBWR - Full table scans
3.    Random Writes (RW) ? DBWR ? Writing dirty blocks
4.    Sequential Writes (SW) ? LGWR, Arch ? Writing redo logs and Redo
Archival + Control files

*     Bulk of any Oracle database's IO is done in RR, SR and RW. If SW is
very high it denotes configuration problems.
*     SAN (or for that matter any RAID device) is configured for writing or
reading large chunks at a time.  The stripe size on most SANs and RAID
devices are 256K or more. Compare this to the Oracle block size of 4k/8k in
most databases (going upto 32K in datawarehouses)
*     SANs do *Read Ahead*. If one block is requested, they read more than
one blocks *while at the disk* hoping that the same process will request
the other blocks some time soon.

Here is the conflict.
*     When ever Oracle does a RR, SR or RW it writes randomly and not
sequentially.  It will read/write a particular block at a time in case of
RR and RW and 'x' blocks (where x = dbfile_multi_block_read_count) in case
of SR. Therefore only during SR will Oracle use the entire stripe width In
all other cases, The difference in the stripe width and db_block_size will
be excess IO.
*     Why *read ahead* will cause a conflict :
*     The internal structure of a datafile could be as follows. The file
consists of 10 blocks. These are occupied by 3 tables.  The blocks shown
below are numbered using table_name.block_number
|---------+---------+---------+---------+---------+---------+---------+-----

----+---------+---------|
|         |         |         |         |         |         |         |
|         |         |
| 1.1     | 1.2     | 2.1     | 3.1     | 3.2     | 3.3     | 2.2     | 1.3
| 2.3     | 3.4     |
|         |         |         |         |         |         |         |
|         |         |
|---------+---------+---------+---------+---------+---------+---------+-----

----+---------+---------|



*     The first block on the datafile is the first block of table 1, second
block is the second block of table 1, the third block is the first block of
table 2 and so on.. (For simplicity sake, I am assuming Oracle will
allocate space in blocks and not in extents)
*     Now assume Oracle requests the first block of table 1.  Assume read
ahead is set to three blocks (three blocks will be read instead of 2
blocks). In this case the SAN will read 2.1, 3.1,3.2.
*     The blocks 3.1 and 3.2 will be entirely useless as Oracle is never
going to read it. SAN cannot tell that the block 2.2 that Oracle might
possible request next is the 7th block in the datafile and so it can never
*read ahead* intelligently.

Why the buffer of SAN has very little impact w.r.t Oracle read performance?
*     Oracle has its own buffering for all IO types
*     DBWR reads and writes uses the DB Buffer Cache
*     LGWR uses the Log buffer
*     Db buffer Cache is managed by a LRU Algorithm (Touchcount from 9I).
*     Bulk of the IO done by Oracle is Logical IO (LIO) and not Physical IO
(PIO).
*     Assume the buffer cache hit ratio is 80%. This means that only 20% of
the IO calls are PIO. Only 20% of the calls ever hit the SAN's cache. Since
this 20% is probably the least requested/never requested data (going by
Oracle's LRU algorithm) , its quite likely that the SAN's buffers don't
have this either.
*     Given that Oracle is going to cache even this 20% in its buffers, the
next PIO call is going to be for something totally different ? which is not
there in the SAN's buffer.
*     Couple this with the read-ahead (discussed earlier), Our SAN's buffer
is now populated with lots of data that Oracle might never use a PIO to
retrieve.
*     Thus the SAN's buffer can never really provide to Oracle the data it
reads most ? Its already there in Oracle.
To be fair, SAN's huge buffers will come as a boon to small databases ?
where the entire database can be cached in the SAN's buffers.


--
Please see the official ORACLE-L FAQ: http://www.orafaq.com
--
Author:
  INET: Babu.Nagarajan_at_Cummins.com

Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051
San Diego, California        -- Public Internet access / Mailing Lists
--------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).

--
Please see the official ORACLE-L FAQ: http://www.orafaq.com
--
Author: Tim Gorman
  INET: Tim_at_SageLogix.com

Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051
San Diego, California        -- Public Internet access / Mailing Lists
--------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).




-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.com
-- 
Author: 
  INET: Babu.Nagarajan_at_Cummins.com

Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051
San Diego, California        -- Public Internet access / Mailing Lists
--------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).

Received on Tue Aug 13 2002 - 10:30:17 CDT