Re: server sizing [NT: need XEON?]

From: Paul Drake <paled_at_home.com>
Date: Wed, 18 Apr 2001 02:09:29 -0700
Message-ID: <F001.002EB8FF.20010418003526@fatcity.com>

"Eric D. Pierce" wrote:
> If you can get ahold of him, or can wait until he
> isn't so busy, Paul Drake of this list is the resident RAID/hardware
> expert.

Eric - this is one of the funniest pages I have seen: http://www.ultratech-llc.com/Personal/Files/?File=~MoreInfo.TXT

uh oh. I don't know if I'm honored, or scared. I'm not nearly the resident RAID expert - but would like to be someday. These 4-drive JBOD workstations at home don't quite give me the room to play.
Does the word "Heuristics" mean anything to you? I'd rather have stats than guidelines.

Here is my one word of caution before heading off into some other direction:

multiple database writer processes are not supported for Oracle 8.1.7 on NT.
This is straight off of the platform-specific docs off of the Docs CDROM.
I did receive an ORA-00600 error relating to running multiple db_writers on NT with 8.1.6.
Don't do it.

>From Metalink:

Subject: Re : Db Writer Process

Oracle on NT only allows/needs a single database writer process (DBWR).

Multiple DBWRs on UNIX is not multiple "real" DBWRs which all go scan for dirty buffers and write them to disk. It's really just one "real" DBWR and some I/O slaves. The "real" DBWR tells the slave DBWRs to do I/O.

On NT, there's really no need for this since async I/O and NT will take care of all that for you. NT acts as the I/O slaves and the "real" DBWR [1 DBWR thread] then checks the "slaves" to see if the I/O is done.

Melissa Holman
Oracle Support

Will this produce the rate-limiting-factor in your system? I do not know. *Something* will be a rate limiting factor for a particular process.
But as OLTP users' transactions vary so much from the batch processes, what exactly is the overall rate_limiting_factor is tough to say - your mileage will vary.

My background is in Chemical Engineering. I spent some time in R & D wearing a white labcoat.
Chemical Engineering - Process Debottlenecking - is all about "Where is the bottleneck?"
Usually, a 15-20% margin is designed in such that the plant can run safely over its design spec without a retrofit of key components. Beyond that - you have to find what component needs to be increased to add capacity - safely.

For your system - design in more capacity that you need, with room for expansion.
Empty drive bays in external storage cabinets are good. The rebuilds of drive arrays are painful - but if you had a 7 bay drive cage (half a 14 bay - e.g. Storageworks 4200) with only a 4 drive array, hiking it up to a 6 drive RAID 0+1 (I'd rather duplex these) with a hot spare is tolerable. On an Ultra 160/m channel, 6 drives is reasonable.

Here is a baseline - I/Os per second.
A standard drive can accommodate 80-100 I/Os per second. Larger bufffers (cache) on the hard drive can increase that number, as can read-ahead caching algorithms on the RAID controllers. But if the reads are scattered, as in index reads and nested loops - the cache hits are not very likely.
Lots of memory reduces the amount of read I/O required for a query. It does not reduce the amount of memory bandwidth required, nor does it completely reduce the overhead of creating consistent reads (reading rollback segs to provide cr blocks). This is a large part of the non-linear nature of scaling - interference of user processes creating additional overhead.

When you say transactions per minute - these need to be translated out to actual logical and physical I/Os.
Best bet here is to vary the number of sessions running in a scripted mode, and benchmark the I/Os required per transaction.

So where would your bottleneck be?
LGWR? DBWR? ARCH? Will your checkpoints really hurt with only a single DBWR?
Will it be in the network I/O, Storage subsystem, memory bandwidth or just the capacity of the PCI bus channels? Definitely go for the 64-bit, 66 MHz RAID controllers, as many PCI bus channels and memory controllers as you can find.

I can tell you this from experience:

when the CPUs are I/O-bound, they do not appear as active in NT Task Manager.

try this out:

perform an export of a schema with datafiles, indexes, temp and the dump file all on one drive.
then perform an export of the same schema with all of the above files on separate drives.
I saw the average CPU utilization INCREASE from 20% to 80% average in performing this back on 7.3.
Did the export utility crush the CPU? No. The I/O bottleneck (disc) was somewhat removed, and the CPU was free to perform useful work. The overall time of execution of the export dropped - but I don't have the scaling factors around.

I've seen an NT box (Compaq Proliant 7000) run (4 CPUs, 26 hard drives, 7 I/O channels, 3.2 GB RAM) that averages a CPU utilization of around 85% under full load. But its executing user tasks much more quickly than a comperable dual CPU box. What this config tells you is - with sufficient available I/O - the CPUs will attempt to run at 100% utilization provided there are requests in the queue.

You do have some flexibility available in terms of Processor:Thread affinity. Note:108512.1
This is only possibly beneficial when cache warmth and overhead of context switching are significant factors. If LGWR is the usual bottleneck, you might set aside on CPU exclusively for that thread (at least for benchmarking). Metalink has a doc describing the registry settings required for this configuration.

I want to get back in the lab and provide real performance data, with maybe a conclusion or two.

That's as good as I can do at 3 AM.
I'll see what kind of I/O stats I can get you for that big boy. Its running DSS, no OLTP though.

Check out
OPTIMIZING RAID PERFORMANCE FOR ORACLE RDBMS ON WINDOWS NT 4.0/2000 on Metalink - Note:97597.1
http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_database_id=NOT&p_id=97597.1

I was surprised to see the following, as I was using a block size of 8192 bytes (matching db_block_size):
Use 16 KB allocation size for formatting the NTFS volumes (format <drive>: /fs:ntfs /A:16K).
Increase the NTFS log file size to 64 MB for large volumes (chkdsk /L:65536).

hth,

Paul

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.com
-- 
Author: Paul Drake
  INET: paled_at_home.com

Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051
San Diego, California        -- Public Internet access / Mailing Lists
--------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).

Received on Wed Apr 18 2001 - 04:09:29 CDT