
Re: separate data/index

From: D.Y. <dyou98_at_aol.com>
Date: 28 Apr 2002 21:18:09 -0700
Message-ID: <f369a0eb.0204282018.181b2c47@posting.google.com>


Nuno Souto <nsouto_at_optushome.com.au.nospam> wrote in message news:<3cca102d$0$15475$afc38c87_at_news.optusnet.com.au>...
> In article <f369a0eb.0204260548.35235dc0_at_posting.google.com>, you said
> (and I quote):
>
> I'll jump in too with my $0.02 worth, given that not long ago I did the
> same on another thread on the same subject for the same reasons.
>
> > snippage
> > > >level, each I/O involves two types of activities:
> > > >1) moving electrons from disk head to memory. This is instantaneous if
> > > > you have enough bandwidth, and
>
>
> just one minor correction here, from the engineering point of view. The
> speed of movement of electrons in a solid (a metal conductor) is MUCH
> LESS than the speed of light in a vacuum (the top speed we see quoted so
> much). This is one of the two major reasons that miniaturisation is so
> essential for high processing speed in modern CPUs. The other of course
> is power dissipation.
>

Interesting observation, and a correct one. The speed of electrons, or of EM propagation, has a lot to do with the structure of the medium and the surrounding environment, and it's difficult to get an accurate value. A simplistic approach might be to fix the boundary conditions and solve Maxwell's equations. I'd be doing that kind of research if I didn't have to feed my family. Interesting topic, but I doubt it'll contribute much to this forum.
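
Just to put rough numbers on it (the cable length, propagation factor and seek time below are assumptions for illustration, not measurements), the propagation part really is negligible next to the mechanical part:

# Back-of-envelope comparison: signal propagation vs. a disk seek.
# All figures are illustrative assumptions, not measurements.

C = 3.0e8                  # speed of light in vacuum, m/s
PROPAGATION_FACTOR = 0.66  # assume signals travel at roughly 2/3 c in the medium
CABLE_LENGTH_M = 2.0       # assume ~2 m of path between disk and memory

propagation_s = CABLE_LENGTH_M / (C * PROPAGATION_FACTOR)
seek_s = 8e-3              # assume an ~8 ms average seek

print(f"propagation: {propagation_s * 1e9:.1f} ns")
print(f"seek:        {seek_s * 1e3:.1f} ms")
print(f"the seek is ~{seek_s / propagation_s:,.0f} times slower")

So even well below the speed of light, "instantaneous" is a fair approximation for the electrical part; it's the mechanical delays that matter.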

>
> > more snippage
> > > >2) moving disk head to where your data is. This is mechanical motion and
> > > > is tens of thousands of times slower. How far the disk head has to move
> > > > determines how slow your I/O is.
>
> and I'd add a third one. It wasn't very relevant many years ago, but it
> is today, in this day and age of sub-nanosecond CPU cycle speeds. And
> that is the rotational speed. It hasn't increased even by an order of
> magnitude in the last 20 years, while processing speed has by quite a
> few. I'm talking linear rotational speed, not angular speed.
>

High RPM puts more stress on the platters and makes the drive harder to stop. This could be why they keep the rotational speed relatively low, but I am not sure.

> It is nowadays a relevant slow down whereas before it wasn't. It used to
> be referred to as "rotational latency". Throw it into the equation
> and the argument for splitting tables/indexes becomes even less relevant!
>

Hmm... can't see why low RPM makes it a good idea to have table/index segments on the same disk. Will give it some thought, though.
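
To put some rough numbers behind the rotational latency point (the RPM list and the seek figure are assumptions, not measurements): average rotational latency is the time for half a revolution, and it has only improved a few-fold over the years while CPU cycle times dropped by orders of magnitude.

# Average rotational latency = time for half a revolution.
# RPM values and the seek figure are assumptions for illustration.

def avg_rotational_latency_ms(rpm: float) -> float:
    revolutions_per_s = rpm / 60.0
    return 0.5 / revolutions_per_s * 1000.0  # half a turn, in milliseconds

for rpm in (3600, 5400, 7200, 10_000, 15_000):
    print(f"{rpm:>6} RPM: ~{avg_rotational_latency_ms(rpm):.2f} ms average rotational latency")

# Compare with an assumed ~8 ms average seek and a sub-nanosecond CPU cycle:
# every physical I/O still costs millions of CPU cycles, wherever the
# table and index segments happen to sit.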

>
> > more snippage
> > > >So there is more to tuning disk I/O than simply make your instance read
> > > >from many disks at the same time. Depending on your application, you
> > > >may have some control on the speed of an average I/O.
>
>
> Actually, I'd say this a most relevant aspect. You WANT your application
> to be able to read from as many disks as possible at the same time. WHEN
> it has to read from disks.
>
> What I think is important is coming up with a method of achieving this,
> as this implies. Splitting indexes and tables is not the best one, as
> has been amply demonstrated here. This however doesn't mean we all give
> up on tuning our I/O!!!
>
> The best approach IME is a heuristic one: sample, measure, then decide
> which are the "hot spots" for I/O. Then spread those across the devices
> available. There is a little bit more about it than just this. You
> don't want to have a situation where you solve a single hot spot and then
> end up with 50 new hot spots. Ie, the old problem of removing one
> bottleneck and watch 50 others replace it. We have to be a little more
> preemptive than just that. This is where knowledge of the application
> and its patterns of use comes in helpful.
>
>
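
Here's a rough sketch of that sample-measure-spread idea, assuming you have already collected physical read counts per datafile over some window (the file names, devices and numbers below are made up for illustration):

# Greedy spread of sampled "hot" files across the available devices so
# that no single device carries most of the measured I/O.
# The sample figures are hypothetical; in practice they would come from
# your own measurements of physical reads per datafile.

sampled_reads = {
    "users01.dbf":  90_000,
    "index01.dbf":  60_000,
    "temp01.dbf":   40_000,
    "users02.dbf":  15_000,
    "system01.dbf":  5_000,
}
devices = ["disk1", "disk2", "disk3"]

load = {d: 0 for d in devices}   # accumulated reads assigned to each device
placement = {}
# Place the hottest files first, always onto the currently least-loaded device.
for datafile, reads in sorted(sampled_reads.items(), key=lambda kv: kv[1], reverse=True):
    target = min(load, key=load.get)
    placement[datafile] = target
    load[target] += reads

for datafile, device in placement.items():
    print(f"{datafile:>14} -> {device}")
print("resulting load per device:", load)

It's only a starting point, of course; you still need to know the application well enough to anticipate the next 50 hot spots before you create them.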
> > more snippage
> > requests, so location of table/index segments doesn't matter that much. My
> > point is we shouldn't generalize it so much to say there is nothing we can do
> > to control disk I/O regardless of the application.
>
>
> Exactly. Of course we can. The following is just one technique.
>
>
> > more snippage
> > > yup, 100% -- always. In that case, what I would like is for my table to be in
> > > MANY extents and my INDEX to be in MANY extents and these extents are spread
> > > across disk 1 and disk 2 evenly (we do that -- when allocating an extent for a
> > > segment we go from file to file). Now I have an even distribution of data
> > > across the two devices.
> > >
> > > If the index is cached -- i don't get a hot disk (from all reads going to DATA)
> > >
> > > If the index is not cached -- i don't get a hot disk -- the IO is even and since
> > > the reads are *random* reads anyway -- it matters NOT whether they are on 1 or
> > > 1,000 disks.
>
>
> Precisely. Very well explained indeed. Of course there are special
> cases and exceptions. These must be addressed as needed, but in general
> the "divide and conquer" approach for disk distribution is one of the
> best.
>
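
A toy simulation of that layout makes the point (the extent counts and read counts are invented): with extents allocated file to file, the random reads split close to 50/50 whether or not the index is cached, so neither file turns into the hot disk.

# Toy simulation of "many extents, allocated round-robin across two files".
# Extent counts and read counts are arbitrary assumptions for illustration.
import random

random.seed(1)
NUM_EXTENTS = 100
N_LOOKUPS = 100_000

# Round-robin allocation: extent i of either segment sits in file i % 2.
table_extents = [i % 2 for i in range(NUM_EXTENTS)]
index_extents = [i % 2 for i in range(NUM_EXTENTS)]

def run(index_cached):
    reads = [0, 0]  # physical reads per file
    for _ in range(N_LOOKUPS):
        if not index_cached:
            reads[random.choice(index_extents)] += 1  # index block read from disk
        reads[random.choice(table_extents)] += 1      # table block read from disk
    return reads

print("index cached:     ", run(index_cached=True))
print("index not cached: ", run(index_cached=False))
# In both cases the physical reads land close to 50/50 on the two files.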
> Add in a good controller cache and/or SAN cache and you have the start of
> a very smooth and fast I/O distribution. In fact, there are a few
> variations of this distribution that work particularly well when used
> with large h/w caches in the I/O subsystems. Add in the multiple buffer
> caches available since version 8 and the possibilities are tremendous.
>
> We can now partition the I/O cache for logical load as well as physically
> apportion cache over disk arrays, which can themselves be distributed!
> If anyone has an I/O performance problem nowadays, they'll have to work
> hard to sustain it!
>
>
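
Caches certainly change the picture. As a crude model of the effect (the hit ratios and service times are assumed, not measured), even a modest hit rate in the controller or SAN cache drags the average I/O time down a long way:

# Crude model of how a cache in front of the disks changes average I/O time.
# Service times and hit ratios are assumptions for illustration.

disk_io_ms = 10.0    # assumed average physical I/O (seek + rotation + transfer)
cache_hit_ms = 0.5   # assumed service time when the cache answers the read

for hit_ratio in (0.0, 0.5, 0.8, 0.95):
    avg_ms = hit_ratio * cache_hit_ms + (1.0 - hit_ratio) * disk_io_ms
    print(f"hit ratio {hit_ratio:.0%}: average I/O ~{avg_ms:.2f} ms")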
> > more snippage
> > >>Thank you. Exactly what I have been looking for. The ribbon has been tied, the
> > >> card signed, and the envelope sealed.
>
>
> Hehehe! Sounds like Xmas gift wrapping, Dan. Don't seal yet, there is
> more to this I/O saga than just this. It is a work in progress thing.
> Let's not create another "myth", OK?
> :)

I believe that striping, I/O balancing, etc. without any consideration of segment location is a statistically good solution to the I/O problem, for the reasons stated in some previous postings. "Statistically good" meaning that out of 10 applications, 9 would benefit from such a configuration. The percentage could even be higher, and that wouldn't surprise me. But I am not ready to accept that this is "one model fits all".

Getting back to the basics, to improve your I/O performance a few of the things I normally consider are:

* the application needs to use the most efficient access paths
* get as many disks as possible to work for you at the same time
  (there doesn't seem to be any disagreement on these two)
* make the average I/O as fast as possible

On this last one, the responsible approach is to understand what determines the speed of an I/O and use your common sense to tune it when you think there is room for improvement. And you need to understand your application to do this.
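
A back-of-envelope way of seeing how those three levers combine (every number here is an assumption, purely for illustration):

# The three levers: read fewer blocks (access paths), spread the reads
# over more disks, and make the average I/O faster.  Assumes the ideal
# case where the reads spread perfectly across the disks.

def elapsed_s(blocks_read, disks, avg_io_ms):
    return (blocks_read / disks) * (avg_io_ms / 1000.0)

print("baseline:        %.0f s" % elapsed_s(100_000, 1, 12.0))
print("better path:     %.0f s" % elapsed_s(20_000, 1, 12.0))
print("+ more disks:    %.0f s" % elapsed_s(20_000, 4, 12.0))
print("+ faster avg IO: %.0f s" % elapsed_s(20_000, 4, 8.0))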

Of course there is cache, but that's not something I normally handle.

This is a good thread. I've seen better discussion here than probably anywhere else. Keep it going!

Received on Sun Apr 28 2002 - 23:18:09 CDT

