FW: How would you layout the files?
Date: Tue, 9 Dec 2008 11:23:07 -0500
Snipped to transmit. It apparently bounced the first time.
From: Mark W. Farnham [mailto:mwf_at_rsiz.com]
Sent: Monday, December 08, 2008 7:38 PM
To: 'moabrivers_at_gmail.com'; 'oracle-l_at_freelists.org'
Subject: RE: How would you layout the files?
What is the target of your backups?
What method do you use for backups?
What method would you use for recovery if you should need to recover?
What is your operational schedule?
Now I wouldn’t start to answer your question for a client of mine without knowing the answer to the above. However, I’ll take a stab at it:
- Your quoted sizes are large enough to duplex the rest of the
drives. The only question is whether you should stripe and duplex or duplex
two more independent pairs. Since the accepted protocol is three
independently secure control files, I guess I’d go with two more independent
duplex pairs. With two groups, the multi-user parallel initiation
opportunities probably wash with the chance of needing to visit a separate
disk for a given read. So this is effectively gating your throughput at the
write rate of a duplex write to a single 15K drive and reads to a pair
(presuming your plexing software isn’t braindead and balances the read load
to both plexes). That’s in favor of having the three independent control files.
- I’d think seriously about placing the online redo logs on the OS
drive (which is already duplexed). At your quoted peak redo rate, that
shouldn’t fight with OS i/o much at all. Or get them to plug in enough flash
(duplexed, please) to handle the online redo – not for performance of the
redo writes, but to keep them from interrupting user operations other than commits. That way lgwr won’t be fighting with dbwr and all the users. Of course, 100 potential readers against effectively 2 drives will fight with each other plenty if they all manage to hit return on fresh disjoint queries near the same time. And of course you said no trying to tell the owners to improve the hardware. Still, enough duplexed flash to get you through a very long time at a peak rate of 100Kps should be pretty cheap, and it should marginally improve the consistency of response time.
- Put arch on the other duplexed pair, and use it also for disk-to-disk backup. This is specifically tuned to your size and low throughput: it sacrifices read throughput for consistency of response. So when you’re backing up the arch and backup drive to tape or the network, those reads are only going to fight with your infrequent arch operation. In fact, if you have enough online redo to get through the duration of the tape job, you might want to do an archive log stop so arch doesn’t slow your tape job down. Writing the used-but-not-archived online logs to arch when you restart arch after the tape job is over should then run quite fast, since it routinely fights with nothing.
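The "enough online redo to get through the tape job" condition is easy to check with the same arithmetic. The 2-hour tape-job duration and 256 MB log-group size below are illustrative assumptions; the peak rate interpretation (100 KB/s) is also assumed:

```python
# Hypothetical sketch: online redo needed to ride out a tape backup with
# the archiver stopped. ASSUMPTIONS: 2-hour tape job, 256 MB log groups,
# and "100Kps" read as 100 KB/s.

PEAK_REDO_KB_PER_SEC = 100
TAPE_JOB_HOURS = 2                  # assumed backup duration
LOG_GROUP_MB = 256                  # assumed online redo group size

def redo_during_job_mb(rate_kb_s: float, hours: float) -> float:
    """Redo (MB) generated while the archiver is stopped."""
    return rate_kb_s * 3600 * hours / 1024

mb = redo_during_job_mb(PEAK_REDO_KB_PER_SEC, TAPE_JOB_HOURS)
groups_needed = -(-mb // LOG_GROUP_MB)      # ceiling division
print(f"{mb:.0f} MB of redo; at least {groups_needed:.0f} x 256 MB groups")
```

So even at the peak rate the whole tape window fits in a handful of ordinarily sized log groups before you would wrap onto an unarchived log.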
- If you’ve got big load jobs from external sources, you’ll probably want to stage them on the logical drive holding arch, one of the control files, and online backup.
The other viable option is stripe and duplex everything. That gives you fewer independent control file copies and if someone insists you have to have three you’ll be writing them to the same place whenever you write them. But you’ll have more iops for those 100 readers.
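A toy model makes the iops side of that trade-off concrete. The ~175 random IOPS per 15K-rpm spindle is a common rule of thumb, not a number from this thread, and the model ignores caching and queueing:

```python
# Toy model of the read-IOPS trade-off between stripe-and-duplex and
# independent duplex pairs. ASSUMPTION: ~175 random IOPS per 15K spindle
# (rule of thumb), with mirror reads balanced across both plexes.

IOPS_PER_SPINDLE = 175
SPINDLES = 4          # the four data disks (two mirrored pairs' worth)

# Stripe-and-duplex: reads spread across all spindles, so the 100
# readers collectively see the whole pool.
sade_read_iops = SPINDLES * IOPS_PER_SPINDLE

# Independent pairs: any one datafile lives on a single pair, so a hot
# file is gated at two spindles even while the other pair sits idle.
pair_read_iops = 2 * IOPS_PER_SPINDLE

print(sade_read_iops, pair_read_iops)  # -> 700 350
```

That factor-of-two ceiling on a hot file is exactly what you trade away for the independent control file copies and the more predictable response times.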
Basically the trade off in the choice is maximum throughput when conditions
are best (little lgwr, not currently running an arch, not currently doing an
external load, not currently running a backup) versus a slower peak
throughput in exchange for more consistent throughput when one or more of
the aforementioned things is going on. If you’re normally 24 hours a day
(notice I avoided saying 24x7 and causing dangerous belly laughs from
practitioners of high availability vis-à-vis this architecture) I’d lean more towards three independent disk pairs: OS and software and online redo
(disk 1), all your datafiles (disk 2), arch and online backup (disk 3). Hmm, I
left out temp… Sigh. I guess it depends. Maybe I’d put a temp tablespace on each disk and make that a temporary disk group to share the pain across all disks. With 8GB of RAM for a 100GB database and only 100 users you can *probably* avoid sorts to disk for the most part, and the jobs will be “batch-like” when not, so violating the consistency-of-throughput idea in that case should not be visible to the users expecting consistent interactive throughput. If you’ve got a long non-prime shift so the backups etc. are not against interactive users, then I’d lean more towards SADE
(stripe and duplex everything).
All this is off the top of my head and it certainly may not fit with your job mix. Finally, I’d stuff your diagnostic directories either on the OS drive or the arch drive if you do that. If and when you want traces you don’t want i/o to them perturbing your throughput texture.
mwf

Received on Tue Dec 09 2008 - 10:23:07 CST