
Re: 10g System statistics - single and multi

From: Christo Kutrovsky <kutrovsky.oracle_at_gmail.com>
Date: Wed, 18 May 2005 14:25:21 -0400
Message-ID: <52a152eb05051811256b9b58da@mail.gmail.com>


Wolfgang,

I will have to repeat this test on my system. What were your OS and file system? Linux with ext3 does not have direct I/O support, and it also has poor default read-ahead parameters.

"...on average it should take the same amount of time to position ..." If you were to do 1 x dfmrc randomly, then yes mread would always be > sread. But you are doing this sequencially. Thus only the 1st read would involve positioning the heads, after that, every subsequent read would not include that time. Every so often, there would be some time to move the head to next track, but this time is far less then a full seek time. That is, of course, assuming no other disk activity. Or minor activity.

Unfortunately, the test 10g system I have is not yet on the SAN I am testing.

I am using Red Hat Linux and ASM (i.e. using direct I/O).

These results were produced on Windows (for convenience) against unpartitioned drives with iometer (www.iometer.org), with no caching on the OS side.

Sequential reads from my SAN:

Test type       Response time (ms)
512 read-1      0.874
512 read-2      0.173
512 read-4      0.130
8k read-1       0.457
8k read-2       0.149
8k read-4       0.228
32k read-1      0.422
32k read-2      0.388
32k read-4      0.762
256k read-1     2.165
256k read-2     2.672
256k read-4     5.185

I don't have the 512K reads saved. The number after the test name is the number of outstanding I/Os (async reads or multiple active sessions).

And the random values are:

Test type          Response time (ms)
512 read RAND-1    252.949
512 read RAND-2    79.780
512 read RAND-4    7.537
8k read RAND-1     6.376
8k read RAND-2     6.978
8k read RAND-4     8.375
32k read RAND-1    7.193
32k read RAND-2    8.399
32k read RAND-4    11.864
256k read RAND-1   1.331
256k read RAND-2   15.278
256k read RAND-4   27.275

Yes, the SAN has cache, and I've seen its effect. The way this test works is that it starts, runs for ~10 seconds, and then records the results over the next 60 seconds.

I've also retested with the cache disabled. The effect is that read speed drops with 1 outstanding I/O, but with multiple outstanding I/Os it quickly converges to the same speeds as with the cache enabled.
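For what it's worth, here is a rough Python sketch of what the "-1/-2/-4 outstanding IOs" runs above amount to: N workers each keep one random read in flight against the same device and record per-read response times. The device path, sizes and duration are placeholders, it does not bypass the OS cache, and it is only an illustration, not a replacement for iometer:

import os, time, random, threading, statistics

DEVICE      = "/dev/sdb"      # placeholder: raw device or a large pre-created file
READ_SIZE   = 8 * 1024        # 8k reads
DEVICE_SIZE = 10 * 1024**3    # assume 10 GB of addressable space
OUTSTANDING = 4               # concurrent readers, i.e. outstanding IOs
DURATION    = 10              # seconds to run

times = []
lock = threading.Lock()

def worker():
    # Unix-only sketch (os.pread); the real runs used iometer on Windows.
    fd = os.open(DEVICE, os.O_RDONLY)   # note: the OS cache is NOT bypassed here
    deadline = time.time() + DURATION
    while time.time() < deadline:
        offset = random.randrange(0, DEVICE_SIZE - READ_SIZE, READ_SIZE)
        start = time.time()
        os.pread(fd, READ_SIZE, offset)
        with lock:
            times.append((time.time() - start) * 1000.0)
    os.close(fd)

threads = [threading.Thread(target=worker) for _ in range(OUTSTANDING)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("%d outstanding IOs: avg response %.3f ms over %d reads"
      % (OUTSTANDING, statistics.mean(times), len(times)))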

I don't fully understand the numbers from your test.

Take the 1st line:
dfmrc = 1
Is ELA for a single multiblock (1 block in this case) read = 539 microseconds? Or is it the total ELA time for your test?
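To clarify what I am asking, this is the kind of aggregation I would run over the raw 10046 trace file to get an average ELA per block count. It assumes 10g-style WAIT lines where the block count shows up as a named "blocks=" parameter; the trace file name is passed on the command line:

import re, sys
from collections import defaultdict

# Matches lines such as:
# WAIT #2: nam='db file scattered read' ela= 1274 file#=4 block#=160 blocks=8 obj#=52 tim=...
pat = re.compile(r"nam='db file (?:scattered|sequential) read' ela= *(\d+).*?blocks=(\d+)")

stats = defaultdict(lambda: [0, 0])   # blocks -> [total ela, wait count]
with open(sys.argv[1]) as trace:
    for line in trace:
        m = pat.search(line)
        if m:
            ela, blocks = int(m.group(1)), int(m.group(2))
            stats[blocks][0] += ela
            stats[blocks][1] += 1

for blocks in sorted(stats):
    total, count = stats[blocks]
    print("%4d blocks: avg ela %12.3f  (%d waits)" % (blocks, total / count, count))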

On 5/18/05, Wolfgang Breitling <breitliw_at_centrexcc.com> wrote:
> I would say that concluding from your example that "in all modern SANs,
> unless your dfmbrc is such that you will read > 512 Kb, your mread will
> be lower than sread" is a rather bold statement.

>
> Excluding caching, mreadtm should always be higher than sreadtm since on
> average it should take the same amount of time to position the head and
> wait for the rotational delay until the data shows up under the head.
> But it takes longer to transmit nK of data than mK when n > m
>
> I did a quick test, setting db_file_multiblock_read_count to 1, 2, 8,
> 16, 32, 64, 128, and 256 for a table on an 8K blocksize LMT with uniform
> 4M extents stored on an IBM ESS (Shark) 700. These are the numbers from
> the ELA of the extended 10046 trace sequential and scattered read wait
> events:
>
> 1 539.095
> 2 682.760
> 3 795.782
> 4 911.000
> 6 1066.778
> 7 1171.429
> 8 1274.440
> 9 1824.500
> 10 1912.000
> 11 1994.800
> 15 2569.000
> 16 2812.132
> 25 3794.500
> 26 3880.000
> 31 4688.000
> 32 4790.857
> 36 5218.000
> 38 5260.000
> 40 5332.667
> 56 7578.000
> 57 7565.833
> 64 8454.308
> 102 12553.500
> 108 13349.000
> 128 15635.545
>
> I failed to clear the buffer between reads, which is why some "odd"
> counts show up that do not coincide with any of the dfmrc settings. But
> in general, with the exception of multiblock reads 56 and 57, more
> blocks take longer to read than fewer, and thus mreadtm should be higher
> than sreadtm.
>
> If system statistics are gathered over a long enough representative
> workload, mreadtm should definitely come out higher than sreadtm. If
> mreadtm is consistently less than sreadtm then I would investigate why
> that is.
>
>
> Christo Kutrovsky wrote:
> > I've profiled my SAN. IBM FAStT 700.
> >
> > Stripe size matters very little for sequential or random I/O. Actually,
> > a larger stripe size is a bit better.
> >
> > Sequential reads at sizes between 512 bytes and 128 KB are under 1 ms,
> > compared to random I/O, which is always in the 6 ms range.
> >
> > So in all modern SANs, unless your dfmbrc is such that you will read >
> > 512 KB, your mread will be lower than sread.
> >
> > P.S.
> > Not sure why you sent this to me only, and not to the list.
> >
> > On 5/17/05, Wolfgang Breitling <breitliw_at_centrexcc.com> wrote:
> >
> >>Actually, depending on your SAN, it could just as easily be the reverse.
> >>If you have a large db_file_multiblock_read_count (I always refer to it
> >>as dfmrc, taking the initials of each word) the SAN microcode could very
> >>well detect a sequential read pattern and prefetch the next chunk, so
> >>that cumulatively the average multiblock read time comes out very fast
> >>because later reads are serviced from the cache and do no real physical
> >>IO, whereas if you leave dfmrc at a moderate value of say 32, it may be
> >>below the prefetch radar.
> >>Christian Antognini has an interesting chart on the relationship between
> >>dfmrc and IO time on different systems. Unfortunately there is no data
> >>about the different storage architectures on those systems.
> >>If prefetch is not a factor, stripe size can come into the equation. If
> >>dfmrc is greater than the stripe size, the average IO time goes up
> >>depending on the # of physical disks involved. The IO rate is spread
> >>more evenly, avoiding hot disks, but a single large IO request can get
> >>slower.
> >>
>
> --
> Regards
>
> Wolfgang Breitling
> Centrex Consulting Corporation
> www.centrexcc.com

>

--
Christo Kutrovsky
Database/System Administrator
The Pythian Group

--
http://www.freelists.org/webpage/oracle-l
Received on Wed May 18 2005 - 14:30:00 CDT
