
Re: deep-geek discussion: config for Sun E10K OPS w/mirrored A3500 Arrays?

From: Kristiaan J. Kolk <akolk_at_gelrevision.nl>
Date: Thu, 01 Jul 1999 16:05:30 +0200
Message-ID: <377B75AA.8569AB2F@gelrevision.nl>


I have put my answers after each question:

>
>
> Assume that everything will be on raw devices (except archived logs of
> course) since OPS is involved. The questions are:
>
> 1) Cache-trashing seems to be a potential issue here. Since the entire
> array has only 128 MB of cache, what are thoughts about caching only
> the redo logs? What about turning off all read cache and reserving it for
> writes only? Eh?

Turning off the write cache will help. It is the reads, not the writes, that influence the response time for the end user: a session waits on every read that misses, while DBWR writes complete asynchronously in the background. So spend the limited cache on reads.

>
>
> 2) Online redo logs will be ping-ponged between at least two drives.
> Peak is a 100MB log switch every 45-90 seconds, during less critical
> hours. Most critical hours see 100MB log switch every 2-5 minutes.
>
> a) Rule of thumb is a log switch every 15/20/<pick your religion> minutes,
> but are near-gigabyte redo logs even feasible? What are the largest
> you've used for ultra hot OLTP? What is your religion here?

If you use Oracle8, I would go for larger log files and set db_block_max_dirty_target to a low value (that is a good idea for OPS in any case). This causes DBWR to write a steady stream of dirty buffers, so checkpoints complete relatively quickly. 1GB-2GB redo logs are not uncommon in large OLTP systems.
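
A minimal sketch of what that might look like in the init.ora, assuming Oracle8 and (purely for illustration) a 100000-buffer cache; none of these values are tuned for your machine:

    # init.ora fragment -- illustrative values only
    db_block_size             = 8192
    db_block_buffers          = 100000   # 100000 x 8 KB = ~800 MB cache
    # keep at most ~2% of the cache dirty, so DBWR trickles writes out
    # continuously and a checkpoint has very little left to flush
    db_block_max_dirty_target = 2000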

>
>
> b) Given the nature of redo logs, has anyone really seen significant
> benefit from striping them? (As in ping-ponging between two striped
> n-disk LUNs?)

This will help if the stripe unit is small enough and the average write large enough. You may find that writes max out at 256K, so a 32K stripe unit will help there. If the average write size is small, it won't help.
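
To make the arithmetic concrete: with a 32K stripe unit, a 256K redo write spans 256/32 = 8 disks, so the eight transfers proceed in parallel; a 4K write still lands on a single disk and gains nothing. And here is a minimal sketch of the usual ping-pong layout, alternating log groups across two striped LUNs (the raw-device paths and sizes are hypothetical):

    -- odd-numbered groups on LUN 1, even-numbered on LUN 2, so each
    -- log switch moves LGWR to the other stripe set while ARCH reads
    -- back the group it just left
    ALTER DATABASE ADD LOGFILE GROUP 1 ('/dev/rdsk/lun1_redo1') SIZE 1000M;
    ALTER DATABASE ADD LOGFILE GROUP 2 ('/dev/rdsk/lun2_redo2') SIZE 1000M;
    ALTER DATABASE ADD LOGFILE GROUP 3 ('/dev/rdsk/lun1_redo3') SIZE 1000M;
    ALTER DATABASE ADD LOGFILE GROUP 4 ('/dev/rdsk/lun2_redo4') SIZE 1000M;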

>
>
> 3) Assume that 4 of the 50 disks in each A3500 are dedicated to redo logs.
> That leaves 46 disks and 6 available LUNs. It could be set up as
> anything between "stripe everything across everything" ( a newer
> "start-up" religion) and 6 striped LUNs. For example, the latter might be
> 5 LUNs of 8-disk wide stripe sets plus one LUN of a 6-disk wide stripe set.
> Traditionally, I would have gone with something like this, but the appeal
> of "everything everywhere" is intriguing - especially since the I/O is so
> random. Experiences? Religion? Horror stories?

Religion, and I use "stripe everything across everything" with success!

>
>
> 4) Now, to complicate matters, consider if you had three of the A3500 arrays
> with 50 disks each and one D5000 (?) array (100 MB/sec, but no cache).
> How would you split the redo and the various types of segments across the
> various arrays? Dedicate one A3500 array to redo logs?!!??!?!? (For
> caching efficiency primarily...)
>

Forget about the cache (unless it is battery backed up). It sounds as if your cache will fill quickly and you will bottleneck on the disks anyway.
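
To put a rough number on that, using the rates from question 2: 100MB of redo every 45-90 seconds is about 1-2 MB/s of writes before counting any random datafile I/O, so a 128 MB cache shared by the whole array is cycled through in roughly a minute or two of sustained load, after which you are running at disk speed.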

>
> 5) Does the introduction of OPS change anything significant in your model?
> (Please! No obvious and generic "partitioning" warnings.)
>

The key is to make sure that db_block_max_dirty_target is really low (like 1-3 percent of the cache, assuming that you use Oracle8). You also need to make sure that the DBWR write batch is small with OPS (like 256 or so). If all the files are striped over all disks, it can help to increase the write batch; if you don't stripe enough, the write batch should be small.
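
A minimal sketch of those two settings, assuming the same illustrative 100000-buffer cache as above. Note that the write batch is not a supported parameter in Oracle8 (it is a hidden underscore parameter whose name and default vary by release), so treat the name below as an assumption to verify on your own version:

    # init.ora fragment for the OPS instances -- illustrative values
    db_block_max_dirty_target = 2000   # ~2% of a 100000-buffer cache
    # assumed hidden parameter; was DB_BLOCK_WRITE_BATCH in Oracle7
    _db_block_write_batch     = 256    # small write batch for OPS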

>
> 6) What would be your optimal stripe size for each segment type?
> (Please specify whether per disk column or per stripe width.)
>

>
> I've been laying out Oracle servers for many, many years and I've been laying
> out fairly high-end Solaris Oracle servers for years, and I've been working
> with parallel server for a few years. But nothing quite like this! And
> I've never worked with the A3500 arrays before.
>

Well, I worked on the largest OPS system in the world: 40 TB, Sun, 2000 CPUs, 25000 TPM, and that system has been up and running for two years now. We just completed a benchmark where we scaled the system to 3 times the size (120 TB) and 6 times the workload (150000 TPM) with a response time of less than 12 seconds (actually 9 seconds).

>
> I'd hate to miss anything with this one!
>

Well, your system seems to be a challenge, but compared to the system that I worked on, a breeze ;-)

>
> -- OraSaurus
> --
> -- Remove "not_" to reply...
> --
Received on Thu Jul 01 1999 - 09:05:30 CDT
