Re: Anyone has information regarding Oracle7 and Raid configuration ???
Hi,
I am attaching a note with some explanations.
RO
Senior Technical Analyst
Table of Contents
2.1.1 RAID 0: STRIPING WITH NO PARITY
2.1.2 RAID 1: SHADOWING
2.1.3 RAID 0+1: STRIPING AND SHADOWING
2.1.4 RAID 3: STRIPING WITH STATIC PARITY
2.1.5 RAID 5: STRIPING WITH ROTATING PARITY
2.1.5.1 SUMMARY: ORACLE7 AND RAID LEVELS
3. ORACLE7 AND CACHED I/OS
3.1 TERMS AND BASICS
3.4.1 RELIABILITY
3.4.2 MEMORY WASTE
3.4.3 PERFORMANCE EXPECTATIONS
3.4.3.1 ORACLE DBWR AND DISK I/O CACHING
3.4.3.2 ORACLE7 LGWR AND DISK I/O CACHING
3.4.3.3 SYSTEM-WIDE PERFORMANCE
3.5 SUMMARY: ORACLE7 AND DISK I/O CACHING PRODUCTS
4. DIGITAL UNIX-SPECIFIC TOPICS
4.1 LSM
4.2 RAW VS. FILESYSTEMS
4.2.1 RAW
4.2.2 UFS
4.2.3 ADVFS
5. OPENVMS-SPECIFIC TOPICS
5.1 THE SPIRALOG FILESYSTEM
5.1.1 WHAT IS SPIRALOG?
5.1.2 SPIRALOG AND ORACLE7
RAID disk caching products (hardware and software)
In addition, this paper covers Digital UNIX-specific and
OpenVMS-specific
issues.
2. Oracle7 and RAID Levels
This section is a discussion of the various RAID levels, their
advantages and
disadvantages, and their use with Oracle7.
2.1 RAID Levels
2.1.1 RAID 0: Striping with No Parity
RAID 0 offers striping only. It is not redundant (hence the name?);
there is
no protection against drive failure at all. It is simply a collection
of
drives in a stripe configuration.
During an I/O, a single drive gets <chunksize> bytes of I/O before the
I/O continues onto the next drive in the set. For I/Os that fit in a
single chunk, performance is the same as a single disk drive. For I/Os
that span more than one chunk, there may be a slight performance
improvement since the disks are able to do a little work in parallel.
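To make the chunk arithmetic concrete, here is a minimal sketch of how a
logical byte offset could map onto a stripe set. The function names, the
64 KB chunk size and the 4-drive set are assumptions for illustration
only; real controllers differ in their details.

```python
def locate(offset, chunk_size, num_drives):
    """Map a logical byte offset to (drive index, byte offset on that drive)."""
    chunk_index = offset // chunk_size        # which chunk of the logical volume
    drive = chunk_index % num_drives          # chunks are dealt out round-robin
    stripe_row = chunk_index // num_drives    # full stripes preceding this chunk
    return drive, stripe_row * chunk_size + offset % chunk_size


def drives_touched(offset, length, chunk_size, num_drives):
    """Return the set of drives that one I/O of `length` bytes touches."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return {chunk % num_drives for chunk in range(first, last + 1)}


CHUNK = 64 * 1024   # assumed chunk size
DRIVES = 4          # assumed number of drives in the stripe set

# An 8 KB I/O inside one chunk stays on one drive (single-disk performance).
inside_one_chunk = drives_touched(0, 8192, CHUNK, DRIVES)
# The same I/O straddling a chunk boundary is shared by two drives.
across_boundary = drives_touched(60000, 8192, CHUNK, DRIVES)
```

Only the second I/O gives the drives any chance to work in parallel,
which is why the gain from striping is small for small I/Os.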
RAID 0 is useful with Oracle to reduce disk hot spots for Oracle data
files.
It is generally not recommended for other Oracle files.
2.1.2 RAID 1: Shadowing
RAID 1 provides redundancy by duplicating an entire disk drive onto
another.
It provides complete protection against single drive failures. It is
also the
most expensive (in $) form of RAID since it maintains entire copies of
disk
drives (perhaps even more than 1 copy).
During a read, any of the drives in the shadow set can be used. During
a write,
all drives will eventually be updated with the new data.
When all drives are functioning, reads complete slightly faster than a
single
disk read since the controller will route the read to a free (not busy)
disk.
Writes take slightly longer than a single disk write. Performance
characteristics are not affected much during a single drive failure.
In the worst case, performance is equivalent to a single disk.
RAID 1 is generally useful to Oracle (if the $ cost is acceptable).
RAID 1 can
be used for any Oracle file. It is especially useful for Oracle redo
log files
and control files; Oracle only has to issue one redo log I/O, saving
code path
and context switching. However, the DBA/system administrator must use
the RAID
controller utilities to keep up with failed disks since the shadowing of
the
file is hidden from Oracle.
2.1.3 RAID 0+1: Striping and Shadowing
RAID 0+1 is often billed as a separate solution that offers the reduced
hot
spot and performance benefits of striping (RAID 0) and the redundancy of
shadowing (RAID 1). It is just as costly as RAID 1.
While RAID 0+1 can be used with Oracle data files, it should not be used
with
redo log files.
2.1.4 RAID 3: Striping with Static Parity
RAID 3 attempts to give performance and redundancy of RAID 0+1 without
the high
cost associated with RAID 1's 1-for-1 drive redundancy. A number of
drives are
ganged together in a RAID 0 stripe set. An additional drive is used to
keep
parity information for the stripe set.
During normal operation, RAID 3 gives performance similar to RAID 0.
Reads are striped. Writes require two I/Os, however: one for the data
drive and one for the parity. In the event of a single disk failure,
the set continues to function, albeit at reduced performance. Disk
blocks from the missing disk are reconstructed by reading all remaining
drives in the set and the parity drive.
RAID vendors typically include cache on-board the RAID controller to
increase
performance. Note that the parity disk in RAID 3 can be a performance
bottleneck, which is why most RAID vendors go to RAID 5.
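The reconstruction described above can be sketched with byte-wise XOR,
the usual way parity is computed. The chunk contents and the helper name
xor_blocks are made up for illustration; this is not any vendor's
implementation.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


data = [b"ORCL", b"REDO", b"DATA"]   # one chunk per data drive (made-up contents)
parity = xor_blocks(data)            # what the parity drive would hold

# Drive 1 fails: rebuild its chunk from the surviving drives plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

Note that the rebuild has to read every surviving drive, which is where
the reduced performance after a failure comes from.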
RAID 3 is useful for Oracle data files, but not for redo log files.
2.1.5 RAID 5: Striping with Rotating Parity
RAID 5 has performance and redundancy characteristics similar to RAID 3,
but the parity information is spread across all drives, which eliminates
the parity drive as a bottleneck.
RAID 5 is useful for Oracle data files, but not for redo log files.
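As a rough illustration of rotating parity, the sketch below assigns the
parity chunk of each stripe to a different drive in turn. The particular
rotation order is an assumption; controllers use various layouts.

```python
def parity_drive(stripe, num_drives):
    """Drive holding parity for a stripe, under an assumed simple rotation."""
    return (num_drives - 1 - stripe) % num_drives


def layout(num_stripes, num_drives):
    """Return one row per stripe: 'P' marks parity, 'D' marks data."""
    rows = []
    for stripe in range(num_stripes):
        p = parity_drive(stripe, num_drives)
        rows.append(["P" if d == p else "D" for d in range(num_drives)])
    return rows


for row in layout(4, 4):
    print(" ".join(row))
```

Over any run of num_drives consecutive stripes, each drive holds parity
exactly once, so no single drive takes every parity write.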
2.1.5.1 Summary: Oracle7 and RAID Levels
This is a summary of the various RAID levels and their use with
Oracle7. The numbers in parentheses refer to the notes that follow the
table.
RAID  Type of RAID         Control      Database     Redo Log     Archive Log
                           File         File         File         File
0     Striping             avoid        OK           avoid        avoid
1     Shadowing            recommended  OK           recommended  recommended
0+1   Striping +           OK           recommended  avoid        avoid
      Shadowing (1)
3     Striping w/          OK           OK           avoid        avoid
      Static Parity
5     Striping w/ Round-   OK           recommended  avoid        avoid
      robin Parity (2)
Notes:
3. Oracle7 and Cached I/Os
This section is meant to give perspective on disk caching products and
the
Oracle RDBMS. It covers both hardware-based (ESE20 solid state disk
drive,
HSZ40 disk controller, Prestoserve disk caching memory module, etc.) and
software-based (UFS, AdvFS, VIOC, I/O Express, RAM disk, etc.) caching products.
3.1 Terms and Basics
In general terms, a cache:
is some small amount of expensive storage
replicates selected portions of some larger, cheaper storage
provides better (faster) access to the stored contents.
For example, a small paint can could be considered a cache compared to a
5-gallon drum of paint. While we could walk back to the 5-gallon paint
drum between brush strokes, it makes more sense to carry a small can of
paint with us for quick, easy access to the paint.
In computing terms, a cache attempts to provide fast access to data that
is
held on slow, cheap media by moving the actively used portions of it to
faster,
more costly media. For us, the slow, cheap media is disk drives and the
faster,
more costly media is memory. So a cache is some amount of memory that
is used
to hold the selected contents of a disk drive so that the CPU has
quicker
access to the information.
3.2 Types of Disk I/O Caching
Generally, there are two types of caching: write-through and
write-back.
These types are differentiated by the policies used to maintain them.
Both types of caching treat read I/Os the same. During a read, the
memory cache is checked first to see if the needed data is there. If it
is not, the read completes from disk, and if appropriate, a copy is
saved in the cache to save the disk I/O on subsequent reads.
Performance characteristics for write-back and write-through cached
reads are the same too. If the read is able to complete from cache, the
read will be fast. If the read has to go to disk, the read will be slow.
The difference between write-through and write-back caching is how they
handle data writes. Both methods will write the data to the cache and
the disk. In write-through caching, the write is not considered complete
until the data makes it to the disk. In write-back caching, a write is
complete when the data makes it to the cache. This difference has both
performance and reliability implications.
Write-back caching performs faster than write-through caching. Because
the
write-through cache write has to go to disk, it completes at disk
speeds.
Since the write-back cache write completes when the data gets to the
cache, it
completes at memory speeds. The increased write performance of
write-back
caching comes at a price though.
Write-back caching has vulnerabilities to system failures that
write-through caching does not have. Write-back caching is dependent
upon memory. Memory is not persistent storage, i.e., when it loses
power, it forgets everything. So, writes that an application was told
were complete may not actually complete if the system crashes before the
write-back cache has a chance to dump its contents back to disk. This
could leave the application's data in an inconsistent state.
Writes           Write-Through                  Write-Back
Complete when    data gets to disk              data gets to cache
Write speed      slow - have to wait for disk   fast - memory-to-memory copy
Vulnerability    none - writes to disk are      high - writes to memory
                 persistent                     aren't persistent
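The two write policies can be compared with a toy model in which a
dictionary stands in for the disk and another for the memory cache. The
class and method names are hypothetical; this is only a sketch of the
policies, not of any real caching product.

```python
class CachedDisk:
    """Toy cache in front of a 'disk'; policy is 'write-through' or 'write-back'."""

    def __init__(self, policy):
        self.policy = policy
        self.disk = {}       # persistent storage
        self.cache = {}      # volatile memory
        self.dirty = set()   # cached blocks that are newer than the disk copy

    def write(self, block, value):
        self.cache[block] = value
        if self.policy == "write-through":
            self.disk[block] = value   # complete only once the data is on disk
        else:
            self.dirty.add(block)      # complete as soon as it is in memory

    def read(self, block):
        if block not in self.cache:    # miss: go to disk and keep a copy
            self.cache[block] = self.disk[block]
        return self.cache[block]

    def flush(self):
        """Write-back only: push dirty blocks out to disk."""
        for block in self.dirty:
            self.disk[block] = self.cache[block]
        self.dirty.clear()

    def crash(self):
        """Power loss: memory contents vanish, the disk survives."""
        self.cache.clear()
        self.dirty.clear()


wt = CachedDisk("write-through")
wb = CachedDisk("write-back")
for disk in (wt, wb):
    disk.write(1, "new data")   # both report the write as complete
    disk.crash()                # the system fails before any flush
```

After the crash, the write-through disk still holds block 1, while the
write-back disk has lost a write the application was told had completed.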
There are a number of variations on write-through and write-back
caching, most notably, write-behind caching. Write-behind caching
behaves like write-back caching (with the same dangers), but with a time
guarantee: writes will get to disk within N seconds after the write gets
to the cache. This is an interesting twist to write-back caching in that
it reduces the window of exposure, but the exposure is still there.
OpenVMS' Spiralog sports write-behind caching (discussed later).
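The time guarantee can be sketched with a simulated clock: every cached
write is pushed to disk once it has been pending for max_delay seconds.
The class below is a hypothetical model with made-up names, not the
Spiralog implementation.

```python
class WriteBehindCache:
    """Write-back cache that flushes each write within max_delay seconds."""

    def __init__(self, max_delay):
        self.max_delay = max_delay
        self.disk = {}
        self.pending = {}   # block -> (value, time it first entered the cache)

    def write(self, block, value, now):
        # Keep the earliest timestamp so an overwrite cannot extend the deadline.
        _, first_seen = self.pending.get(block, (None, now))
        self.pending[block] = (value, first_seen)

    def tick(self, now):
        """Flush every pending write that has reached its deadline."""
        due = [b for b, (_, t) in self.pending.items()
               if now - t >= self.max_delay]
        for block in due:
            self.disk[block] = self.pending.pop(block)[0]

    def crash(self):
        """Return the blocks whose latest writes were lost."""
        lost = set(self.pending)
        self.pending.clear()
        return lost


cache = WriteBehindCache(max_delay=30)
cache.write("a", 1, now=0)
cache.tick(now=30)               # "a" reaches disk at its deadline
cache.write("b", 2, now=40)
lost = cache.crash()             # "b" was still inside the exposure window
```

The window is bounded at max_delay seconds, but any write still pending
at the moment of failure is gone.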
3.3 Oracle as a Disk I/O Caching Product
This may be insulting to the Oracle7 developers, but it's true: Oracle7
is a
fancy disk caching product that happens to understand SQL. The cache is
the
buffer cache portion of the SGA. The portion of the disk being cached
is the
database files. Like any caching product, Oracle7 is trying to provide
fast
access to data that is held on slow, cheap media (disk drives) by moving
the
actively used portions of it to faster, more costly media (memory).
The Oracle7 buffer cache is maintained with a write-back algorithm. A
read
will be satisfied by the cache if possible. If the data is not in the
cache,
the read will be directed to disk and the results will be saved in the
cache.
Writes change the block in the cache; they do not immediately go to
disk.
Recall that while write-back caching has great performance
characteristics on
writes, it also has reliability concerns during failures. To ensure
that no
writes to the database blocks are lost, a redo log is maintained. The
write to
the redo log includes a list of all database block changes and must
occur
before commit is returned to the database user. A redo log write is
faster
than a random disk write since it is a spiral write (i.e., no other disk
activity should be on the redo log disk). The combination of a
write-back
algorithm and redo logging provides Oracle7 the fastest possible
performance
while maintaining complete data integrity and recoverability.
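The interplay of a write-back buffer cache and a redo log can be
sketched as follows. This is a deliberately simplified model with
hypothetical names; Oracle7's actual redo format and recovery are far
more involved.

```python
class TinyDatabase:
    """Write-back buffer cache protected by a redo log (toy model)."""

    def __init__(self):
        self.datafile = {}       # on disk: possibly stale blocks
        self.redo_log = []       # on disk: append-only change records
        self.buffer_cache = {}   # in memory: current block contents
        self.redo_buffer = []    # in memory: redo not yet written to disk

    def update(self, block, value):
        self.buffer_cache[block] = value          # change the cached block only
        self.redo_buffer.append((block, value))   # remember how to redo it

    def commit(self):
        # The redo must reach the log file before commit returns to the user.
        self.redo_log.extend(self.redo_buffer)
        self.redo_buffer.clear()

    def crash(self):
        self.buffer_cache.clear()
        self.redo_buffer.clear()

    def recover(self):
        # Replay the log to bring stale datafile blocks up to date.
        for block, value in self.redo_log:
            self.datafile[block] = value


db = TinyDatabase()
db.update("block 7", "committed row")
db.commit()
db.update("block 9", "uncommitted row")
db.crash()        # nothing was ever written to the datafile
db.recover()
```

After recovery the datafile contains the committed change and nothing
else, even though no datafile write happened before the crash.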
Note one of the differences between Oracle7 and other disk caching
products.
Disk caching products allocate physical memory from the system for their
cache.
Oracle7 does not. It allocates the SGA from virtual memory (in order
to
maintain portability across platforms among other reasons). It is up to
the
DBA and system administrator to ensure that there is enough physical
memory available on the system so that the operating system does not
have to page or swap Oracle7's cache. The idea of paging a cache is
self-defeating
(think about the purpose of a cache again, then think of the
consequences of
paging the Oracle7 buffer cache to disk). Further discussion of this is
outside the scope of this paper and is better left to Oracle7 database
tuning
guides.
3.4 Using Oracle7 with Disk I/O Caching Products
Disk caching services are provided by operating systems, and separate
disk
caching products are available from 3rd party vendors targeted for
I/O-intense
environments. For OpenVMS, Digital has VIOC (Virtual I/O Cache) and
Executive
Software makes I/O Express. Digital UNIX has Prestoserve, AdvFS, and
UFS.
These products have good performance records and are based on well known
technology.
This leads to the question, "How does Oracle (a disk caching product of
sorts)
behave when used with other disk caching products?" There are several
issues
that arise: reliability, memory waste, and performance.
3.4.1 Reliability
If you want the performance improvements of a disk caching product, it
is important to understand its reliability characteristics. Using
unprotected
write-back caching with Oracle7 will probably lead to database
corruption if a
system failure occurs. Write-through caching does not cause these
reliability
problems. First we will see how these corruptions can happen in general
terms,
then we will further define protected vs. unprotected.
Write-back cached database files. Oracle knows at all times where the
current
copy of a database block is: either it is in the buffer cache or it is
in the
database file (we will ignore Oracle Parallel Server for now, but the
same
argument holds). Even when the current copy of a database block is in
the
buffer cache, Oracle knows how stale the disk block is and how much
information
it needs to keep in order to bring the stale block on disk up to date
again in
case of a system failure. When write-back caching is used on database
disks
and a system failure occurs, it is possible that Oracle's recovery
mechanism
will find a disk database block to be more stale than expected, and have
insufficient information to bring the database block up-to-date.
Write-back cached redo log files. When Oracle says a transaction has
been
committed, this really means that Oracle has written the transaction's
redo to
a persistent store -- the redo log file -- so that if the system
crashes,
Oracle can regenerate the transaction. When write-back caching is used
on redo
log files, this redo log write is no longer persistent. During recovery
from a
system failure, it is possible that Oracle will not recover transactions
that
it said were committed before the failure.
How can these corruptions happen? In essence, Oracle has no idea that
disk
caching software is running underneath it. Most disk caching products
are
implemented as device drivers or disk controllers and fool the layers of
hardware or software above themselves into thinking the I/O is really
done.
Oracle optimizes both the amount of disk I/O it does and the amount of
information it keeps around to bring stale disk blocks up-to-date.
Oracle
depends on knowing what is on disk. If the caching software does not
get the
data to disk eventually, Oracle cannot recover from system failures.
The distinction between protected and unprotected write-back caching is
as
follows. Protected write-back caching ensures that the cached I/O will
eventually get to the disk drive. Protected write-back caching is
typically
battery-backed memory implemented as an I/O controller (HSZ40) or as a
separate
memory module (Prestoserve). Unprotected write-back caching simply uses
the
computer's physical memory to implement the cache with no way to
guarantee that
the I/O will get to disk in case of system failure.
Caution is in order even when using protected write-back caching. In
case of
system failure, protection against database corruption is only as good
as the
battery that is keeping the write-back cache warm. Make sure the
write-back
cache hardware can complete the I/O's it said it would, especially after
a
system failure. Oracle cannot be held liable for database corruptions
due to
write-back cache hardware problems.
3.4.2 Memory Waste
For software-based cache products, one problem that arises from
combining them
with Oracle7 is that Oracle data may be doubly cached, once by Oracle
and once
by the disk I/O caching product. In the worst case, the user gets no
performance win from the disk I/O caching product and lots of memory is
wasted
by storing the same information twice. Some cache products recognize
this
situation and allow the DBA or system administrator to disable disk I/O
caching
by the product for selected files if desired. OpenVMS' Spiralog and
other 3rd
party products provide this selectivity. Unfortunately, the current
UNIX
filesystems, UFS and AdvFS, don't.
Memory waste is not an issue with hardware-based caching solutions since
they
do not use system memory.
3.4.3 Performance Expectations
It is important to understand how the DBWR and LGWR algorithms work in
order to understand the effect disk caching products can have on
Oracle's performance.
Beyond that, the larger needs of the system will determine which kind of
caching product, if any, should be used to attain optimal performance.
We
start first by restricting our view to Oracle DBWR and LGWR algorithm
performance with caching, then we will take the broader system-wide
view.
3.4.3.1 Oracle DBWR and Disk I/O Caching
There are two notable problems with using disk caching with Oracle7's
current
DBWR algorithm. First is that caching disk writes may not necessarily
make
Oracle run any faster. Second, if caching does succeed in making DBWR
run
faster, it may actually slow down users' ability to get read work done.
DBWR performs parallel, synchronous writes of batches of blocks at a
time,
referred to as a write batch. DBWR issues a batch of I/Os (asynchronous
I/Os
for both OpenVMS and Digital UNIX), then waits for them all to complete
before
continuing processing. This means that the latency time before DBWR
begins
doing useful work again is as long as the longest I/O in the batch.
Because of
this, there is little reason to have a database disk farm with disks of
widely
differing latency times. In other words, there is no performance gain
to
having part of a database on a solid state disk (like the ESE20) and
another
part on a traditional disk (like an RZ28).
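A small worked example of the batch latency argument. The latency
figures are invented for illustration and are not measurements of the
ESE20 or RZ28.

```python
def batch_latency_ms(io_latencies_ms):
    """DBWR waits for every write in the batch, so the slowest one sets the pace."""
    return max(io_latencies_ms)


# Hypothetical per-write latencies in milliseconds for one write batch.
all_traditional = [12.0, 11.0, 13.0, 12.5]
mostly_solid_state = [0.2, 0.2, 13.0, 0.2]   # one write still hits a slow disk

print(batch_latency_ms(all_traditional))      # 13.0
print(batch_latency_ms(mostly_solid_state))   # 13.0
```

Moving three of the four writes to the fast device does not shorten the
batch at all; only making the slowest write faster would.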
Perhaps a system-wide disk caching product is used, caching all DBWR
writes.
This simply means that DBWR is able to get back to the work of cleaning
the
Oracle7 SGA buffer cache more quickly. Unless keeping the SGA clean is
a
problem (and it rarely is), DBWR could be wasting processing time
keeping the
cache too clean.
3.4.3.2 Oracle7 LGWR and Disk I/O Caching
The problem with caching LGWR writes is similar to the previous problem
mentioned with DBWR: LGWR is able to get back to work too quickly. The
LGWR
design writes out batches of redo at a time, and uses the log write I/O
latency
as a natural gating factor to determine the batch size. This design
allows the
LGWR algorithm to degrade gracefully under load. As LGWR I/O latency
becomes
smaller, so does its batching factor. In the worst case, LGWR is doing
a
separate write for each transaction on the system, causing the LGWR code
path
executed per transaction to skyrocket. It is possible (and we have seen
it)
for LGWR to fire continuously when writing to a cached redo log file,
consuming
an entire CPU in an SMP system.
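The gating effect can be approximated with a back-of-the-envelope model:
if commits arrive at a steady rate and LGWR writes whatever accumulated
during the previous write, the batch size is roughly the commit rate
times the write latency. The steady-rate assumption, the function name
and the numbers below are ours, not Oracle's.

```python
def lgwr_model(commits_per_sec, write_latency_ms):
    """Return (commits per log write, log writes per second) in steady state."""
    batch = max(1.0, commits_per_sec * write_latency_ms / 1000.0)
    writes_per_sec = commits_per_sec / batch
    return batch, writes_per_sec


# 1000 commits/sec against a 10 ms disk write vs. a 0.1 ms cached write.
on_disk = lgwr_model(1000, 10)
cached = lgwr_model(1000, 0.1)
print(on_disk)   # about 10 commits per write, about 100 writes/sec
print(cached)    # 1 commit per write, 1000 writes/sec
```

In this model the cached log makes LGWR execute its code path ten times
as often for the same transaction rate.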
3.4.3.3 System-Wide Performance
The performance improvement attributable to a caching product is highly dependent upon a number of factors:
whether the product is hardware or software based
whether the product is doing write-back or write-through caching
the size of the cache
the read vs. write mix on the system (heavy write favors write-back caching, heavy read favors write-through)
the locality of reference of the I/Os
the performance requirements for different applications on the same system
In the end, whether or not a situation calls for disk caching software
depends
on the performance needs and the situation itself.
Assume we have a system where Oracle is the only performance-critical
application on the system. It does not make sense to use software-based
disk I/O caching. Any physical memory that we would have used on the
caching software could be better utilized by Oracle7. For additional
performance, we could consider adding more memory to the system (and
giving it to Oracle7) or perhaps using controller-based, protected
write-back caching.
Now let's assume we have a system where both Oracle and another
I/O-intense application share the label "performance-critical". We
might consider a software-based, write-through disk I/O caching product
for this situation. If the disk caching product can be told to cache
only the I/O-intense application, then we will have two tuning "knobs"
that we can use to fine-tune the applications' performance -- one the
size of the Oracle buffer cache, the other the size of the caching
product's cache. If the disk caching product does not have this
selectivity, then we might want to give enough memory to it to make both
applications run well and use a very small Oracle buffer cache.
3.5 Summary: Oracle7 and Disk I/O Caching Products
This is a summary of the use of disk caching products with Oracle7.
Numbers in parentheses refer to the notes following the table.
Type of Caching   Control   Database  Redo Log   Archive Log
                  File      File      File       File
Write-Through     OK        OK        avoid      avoid
Write-Back,       never     never     never      never
Unprotected
Write-Back,       OK (1)    OK (1)    avoid (2)  avoid
Protected
Notes:
1. When using protected write-back caching, test the cache's ability to
recover from system failures before relying upon it in production
systems.
2. Write-back caching could cause the LGWR to work too hard and consume
an entire CPU.
4. Digital UNIX-Specific Topics
4.1 LSM
4.2 Raw vs. Filesystems
4.2.1 Raw
4.2.2 UFS
4.2.3 AdvFS
5. OpenVMS-Specific Topics
5.1 The Spiralog Filesystem
This section discusses Digital's new Spiralog filesystem and its use
with
Oracle7. Spiralog has also been known as "The New Filesystem" and
"Dollar"
(its code name).
5.1.1 What is Spiralog?
Spiralog is a log-structured file system that Digital is selling as a
high-performance alternative to OpenVMS' standard Files-11 filesystem.
Spiralog has the following features and characteristics:
files live on the volume end-to-end with each other
reads are cached in a really large O/S memory cache and support
read-ahead caching, so little disk head movement is expected due to reads
writes to files (only the changed blocks) are written to the end of the
volume resulting in fast write times (due to spiral writes)
supports both write-through and write-behind (write-back with to-disk
guarantee within 30 seconds) caching
incremental backups are lightning fast since all changes are congregated at the end of the volume
a nightly defragmentation utility re-coalesces free space to the end of
the volume
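The append-only behaviour in the list above can be sketched with a toy
log-structured volume. The class and its methods are hypothetical and
say nothing about Spiralog's real on-disk format; they only illustrate
why writes need no seek and why incremental backups are cheap.

```python
class LogVolume:
    """Toy log-structured volume: every write is appended to the end."""

    def __init__(self):
        self.log = []     # the volume: records laid end-to-end
        self.index = {}   # (file, block) -> position of the newest copy

    def write(self, file, block, data):
        self.index[(file, block)] = len(self.log)
        self.log.append((file, block, data))   # always at the end of the volume

    def read(self, file, block):
        return self.log[self.index[(file, block)]][2]

    def changes_since(self, position):
        """Everything written after `position`, i.e. an incremental backup."""
        return self.log[position:]


volume = LogVolume()
volume.write("data01.dbf", 0, "v1")
volume.write("data01.dbf", 1, "v1")
backup_mark = len(volume.log)             # position at the last backup
volume.write("data01.dbf", 0, "v2")       # only the changed block is appended
incremental = volume.changes_since(backup_mark)
```

The superseded copy of block 0 stays in the log as dead space, which is
what a defragmentation pass would have to reclaim.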
5.1.2 Spiralog and Oracle7
We have done a little testing of Oracle7 with Spiralog. These are the conclusions so far:
Oracle7 works on Spiralog, with restrictions...
do not use write-behind caching on redo log files
do not use write-behind caching on files that contain rollback segments, or the system tablespace (since it contains a rollback segment); if used, this will result in a database that thinks it needs
recovery and doesn't need recovery at the same time, and an ORA-600 when the affected data is accessed
can use write-behind caching on other data files
no major transactions-per-second performance improvement
effect on incremental backup time expected to be good but not tested yet
We recommend staying away from Spiralog unless incremental backup time
is of
utmost importance.
remove nospam to email me
Alain Barrette wrote:
> Hi,
>
> Any informations, ressources, pointers regarding Oracle7 and a RAID
> level5 configuration (or any other level for that matter) ?
>
> Specifically, performances improvements, i/o distribution across
> multiple disk, pro and con, etc...
>
> Some stats would be welcome too...
--Received on Wed Apr 15 1998 - 00:00:00 CDT