Oracle-L: Re: T3's, NetApps, Tuning, Wife's Opinion and other fun

From: Ray Stell <stellr_at_cns.vt.edu>
Date: Tue, 13 Aug 2002 07:38:29 -0800
Message-ID: <F001.004B2DA5.20020813073829@fatcity.com>

Thanks for sharing T3 experiences, I'll have to go down this road soon, very helpful. I can spot a tall tale when I hear one, however, keeping your opinions, when pigs fly!

On Mon, Aug 12, 2002 at 06:53:20PM -0800, Dave Morgan wrote:
> Hi All,
> Well after 6 wekks of testing here is the basic way
> to operate SUN T3's as efficiently as possible.
>
> Be prepared for arguments with High priests from the cult of SAME.
>
> SUN T3's are fiber attached hardware RAID 5 arrays with
> a "modern cache". The hardware engineers argue that if you need more
> I/Os/sec just add another array as a "concatenated volume". The theory
> being the hardware is intelligent enough to use the cache to increase
> throughput. It actually works as they claim. Never did explain why
> it wasn't a single point of failure in the end though.
>
> My hardware was 3 4810's, each with 4 - 8 cpus, each with 4 - 8GB,
> 2 - 4 bricks per machine
>
> First insist that multiple bricks be mounted on at least 2 mount
> points.
> (D2 and D3). DO NOT USE the forcedirectio option. I don't know why but
> I have been unable to take less than a 40% throughput hit with it
> turned on. And I don't care what other people say, no matter how
> much respect I have for them
>
> Insist on at least one JBOD for oracle binaries and configs
> Insist on at least one JBOD for redo logs (D1)
>
> This a bare minimum.
>
> One set of redo on D1
> One set of redo on D2
> Archive logs, Rollback and Temp on D3
> All data files where needed on D2
>
>
> Next Level up
>
> Add another JBOD for redo and move redo on to it
> Move Rollback and Temp to D2
>
> At this point to get more throughput you have to take the
> JBOD to raw devices. Or try forcedirectio on these devices :)
>
> If even better performance is needed, more JBOD, for rollback and redo.
> If more disk spaces is needed, get another brick.
>
> Which leads me to the recent discussion on "proper way to tune"
>
> Huh????? Why make it so complex?
>
> Tuning from a blue collar DBA perspective:
>
> Assess the machine first
>
> No matter what your ratios or what your waiting for:
>
> sar to see if the machine is ever pinned
> vmstat to see your queues and paging
> iostat to see disk activity
> top at timed intervals to catch rogue jobs
>
> read your logs and config files
>
> Then talk to the users
> Is the "system slow" or is it specific jobs?"
>
> log on run ratio reports and query v$system_event
>
> Any ratio that is out of range needs to be tuned:
>
> Especially disk sorts to memory sorts
>
> For the infamous buffer cache ratio:
> < 10% throw memory at it
> > 97% take memory away
>
> For wait states here's a quick drive through for those who look at
> the number and say "Yeah but what do they mean"
>
> Time Wait Total Time Average
> Event # Waits Timeout In
> Hndrds Time
> -------------------------- --------- ------- ----------------------
> SQL*Net more data to client ######### 0
> 680421 .005
> SQL*Net message to client ######### 0 17590
> 0.000
> SQL*Net message from client ######### 0 3953399703
> 35.511
>
> - These are all communication to the client. ignore
>
> db file sequential read 39562523 0
> 12300885 .311
>
> - Data read, 0.0003 seconds average wait, ignore. This number will climb
> if
> - IO is bottlenecked or inappropriate (ie using FTS for joins)
>
> rdbms ipc message 12440441 ####### 2774129387
> 222.993
>
> - Internal machine communication ignore
>
> db file scattered read 12264223 0
> 6202885 .506
>
> - Data read, 0.0005 seconds average wait, ignore. This is higher due to
> type of read.
> - Increase in this time indicates an IO bottleneck
>
> log file parallel write 4724477 67
> 2212249 .468
> log file sequential read 2097709 0
> 1712615 .816
>
> - Redo logs, with 2 pure raw JBOD I have got this down to about 0.25
> hundredths
>
> buffer busy waits 1548548 0
> 408235 .264
>
> - Memory latch contention 0.0002 seconds ignore
> - If Timeout or average increase, need to determine why contention is
> increasing
>
> control file parallel write 669234 0
> 376491 .563
> pmon timer 662092 662074 203382329
> 307.181
>
> - Internal waits ignore
>
> direct path read 573442 0
> 423920 .739
>
> - Reads from tempfiles (sorting). Each segment is 10M in this db so
> ignore.
>
> log file sync 551716 15
> 459036 .832
>
> - See redo above.
>
> db file parallel write 201166 0 610793
> 3.036
>
> - writing updates and inserts 0.003 seconds ignore
>
> undo segment extension 100516 100507 27
> 0.000
>
> - Don't know what this is, hope it's not critical ;)
>
> SQL*Net break/reset to client 92904 0
> 9522 .102
>
> - Client communication ignore
>
> LGWR wait for redo copy 92844 7
> 736 .008
>
> - Affected by the archiver not keeping up. The alert log and an infamous
> ratio
> - are better ways of detecting this.
>
> file open 76910 0
> 1874 .024
>
> - note the number of occurances is getting very small, average time is
> low, ignore
>
> direct path write 69706 0 1408596
> 20.208
>
> - writes for sorts, even though average time is high, research indicates
> that
> - the client does not wait for this, so internal ignore
>
>
> SQL*Net message to dblink 48680 0 7
> 0.000
> SQL*Net message from dblink 48680 0 108414
> 2.227
>
> - network or machine dependant ignore
>
> control file sequential read 45198 0
> 31664 .701
> latch free 44849 30305
> 28693 .640
>
> - Memory latch contention, notice rate of Timeout to # of waits
> - If average time increases, need to determine why contention is
> increasing
> - QUICKLY
>
> SQL*Net more data from client 43851 0 229528
> 5.234
> enqueue 19946 19380 5973595
> 299.488
>
> - I used to worry but ...., nothing ever happened so ignore
>
> - And the rest happen so infrequently ignore
> file identify 6961 0
> 496 .071
> smon timer 6639 6612 203366288
> 30632.066
> log file single write 2770 0 2998
> 1.082
> rdbms ipc reply 1290 0 2302
> 1.784
> log buffer space 1104 1 5507
> 4.988
> db file single write 924 0
> 162 .175
> log file switch completion 743 8 17454
> 23.491
> refresh controlfile command 722 0
> 413 .572
> pipe get 377 213 81693
> 216.692
> library cache pin 316 0
> 152 .481
> control file single write 231 0
> 164 .710
> library cache load lock 72 6 877
> 26.069
> single-task message 68 0
> 49 .721
> switch logfile command 45 0 815
> 18.111
> process startup 16 0 94
> 5.875
> SQL*Net more data from dblink 16 0 0
> 0.000
> row cache lock 5 0 0
> 0.000
> db file parallel read 3 0 7
> 2.333
> instance state change 2 0 0
> 0.000
> Null event 1 1 410
> 410.000
> reliable message 1 0 0
> 0.000
> sort segment request 1 1 103
> 103.000
>
>
> And the more you study your database the more you will understand of the
> above :)
>
> After you are aware of your systems problems, fix your config files and
> file positions and then chase down SQL issues.
>
> >From your users, capture the SQL run explain plan
> Run top, catch processes that use a full cpu for more than 30 seconds
> Capture the sql, run explain plan
>
>
>
> I have always ogled women. When I got married, (well started going out)
> I explained to my wife that I was making sure I had the best.
>
> But really, she's a good wife,
> I'm even allowed to have opinions. If she approves of them I'm allowed
> to have them :)
>
>
> The digest hit 983K on Friday, if I'm kind, 100K was content.
>
> >From the titles I see that there were
> performance problems with partitioned tables and bitmap indices?
>
> I can't help those who won't help themselves.
> And I don't receive HTML email.
>
> TTFN
>
> Off to figure out the relationship between multiblock_read_count and
> those index_optimizer thingies
>
> Dave
>
> --
> Dave Morgan
> dvmrgn_at_telusplanet.net
> 403 399 2442
> --
> Please see the official ORACLE-L FAQ: http://www.orafaq.com
> --
> Author: Dave Morgan
> INET: dvmrgn_at_telusplanet.net
>
> Fat City Network Services -- (858) 538-5051 FAX: (858) 538-5051
> San Diego, California -- Public Internet access / Mailing Lists
> --------------------------------------------------------------------
> To REMOVE yourself from this mailing list, send an E-Mail message
> to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
> the message BODY, include a line containing: UNSUB ORACLE-L
> (or the name of mailing list you want to be removed from). You may
> also send the HELP command for other information (like subscribing).

-- 
===============================================================
Ray Stell   stellr_at_vt.edu     (540) 231-4109     KE4TJC    28^D
-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.com
-- 
Author: Ray Stell
  INET: stellr_at_cns.vt.edu

Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051
San Diego, California        -- Public Internet access / Mailing Lists
--------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).

Received on Tue Aug 13 2002 - 10:38:29 CDT