
Re: separate data/index

From: Nuno Souto <nsouto_at_optushome.com.au.nospam>
Date: Mon, 29 Apr 2002 20:36:28 +1000
Message-ID: <3ccd2325$0$15476$afc38c87@news.optusnet.com.au>


In article <f369a0eb.0204282018.181b2c47_at_posting.google.com>, you said (and I quote):
> Interesting observation, and a correct one. Speed of electrons, or EM
> propagation has a lot to do with the structure of medium and surrounding
> environment. Difficult to get an accurate value. A simplistic approach
> may be to solve Maxwell's equations to fix the boundary conditions.
> I'd be doing similar type of research if I didn't have to feed my family.
> Interesting topic but I doubt it'll contribute much to this forum.

Speed of light was used as an argument. It is far from that in a conductor. Contrary to what you might think, this was researched many years ago. The average speed in a single copper conductor is less than an order of magnitude higher than the speed of sound. In aluminium (the conductor most used in ICs) it's even less. Surprised? Ask an electronics engineer, they'll know about this: it's an RPITA! It's one of the reasons the Crays were built like a doughnut.
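
For a feel of why that matters at machine scale, here's a back-of-the-envelope sketch in Python. The signal speed is deliberately a parameter you supply, not a claim about any particular conductor; the point is only that delay scales with wire length, which is what drove the compact Cray layout.

    # Round-trip delay over a wire run, for whatever signal speed you accept.
    # The speed used below is a placeholder, not a measured figure.
    def round_trip_delay_ns(wire_length_m, signal_speed_m_per_s):
        return 2.0 * wire_length_m / signal_speed_m_per_s * 1e9

    assumed_speed = 1.5e8   # placeholder value; substitute whatever figure you trust
    print(round_trip_delay_ns(1.0, assumed_speed))   # a 1 m run:   ~13 ns round trip
    print(round_trip_delay_ns(0.1, assumed_speed))   # a 10 cm run: ~1.3 ns round trip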

> >
>
> High RPM creates more tension in the disk, and makes it harder to stop. This
> could be why they keep the rotation speed low. But I am not sure.

Actually, given that an old 3340-class disk spun at around 3500rpm and had a platter around 15" wide, it's probably true that a modern 3.5" disk spinning at 10000rpm has a lower linear speed at its outer cylinders than the old disks did at theirs! Haven't done the maths, but my intuitive guess is that it certainly isn't much faster, if it's faster at all. Ultimately, this linear speed defines how fast you can transfer bytes to/from disk, assuming nothing else limits the transfer.
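
Here is the missing arithmetic as a quick Python sketch, taking the figures above at face value and treating the quoted sizes as platter diameters:

    import math

    # Tangential speed at the outer edge of a platter, from diameter and rpm.
    def edge_speed_m_per_s(diameter_inches, rpm):
        circumference_m = math.pi * diameter_inches * 0.0254   # inches -> metres
        return circumference_m * (rpm / 60.0)                  # revolutions per second

    old = edge_speed_m_per_s(15.0, 3500)    # 3340-class figures quoted above
    new = edge_speed_m_per_s(3.5, 10000)    # modern 3.5" drive at 10000rpm
    print(f"old: {old:.0f} m/s, new: {new:.0f} m/s")   # roughly 70 m/s vs 47 m/s

So the intuition holds: the modern drive's outer edge is actually moving more slowly than the old one's was.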

>
> Hmm... can't see why low RPM makes it a good idea to have table/index segments
> on the same disk. Will give it some thought, though.

It doesn't. It just makes the whole argument for table/index split irrelevant. There are just too many variables disturbing what was a nice theory.
:-)

> even be higher and that wouldn't surprise me. But I am not ready to accept
> that this is "one model fits all".

Me neither! Once again, the bottom line for me: heuristics.

> *the application needs to use the most efficient access paths

Logical as well as physical. Back in the old days of hierarchical databases, there were three distinct disciplines of DB Design: Schema Design (where we mapped a data model to real record types), Logical Design (where we looked at how many logical I/Os were needed to access any given piece of data - also called the logical access path) and Physical Design (where we looked at how many physical I/Os we could squeeze out of the h/w to satisfy the logical I/O load).

Complex? Yes. Efficient? You bet! Nowadays, databases have too many tables for this to be practical across the board. But if you analyze an application and find that most of the daily activity is in half a dozen tables, then it becomes seriously relevant again!
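
As a rough illustration of that kind of analysis, here's a sketch of the counting involved. Everything in it is invented for illustration: the table names, the I/O figures, and the assumption that you already have per-table activity numbers from whatever statistics your environment exposes.

    from collections import Counter

    # Invented per-table activity figures; in practice these would come from
    # your own monitoring or trace data, not a hard-coded list.
    activity = [
        ("ORDERS", 120000), ("ORDER_LINES", 95000), ("CUSTOMERS", 22000),
        ("PRODUCTS", 8000), ("AUDIT_LOG", 400), ("REF_COUNTRIES", 12),
    ]

    totals = Counter(dict(activity))
    grand_total = sum(totals.values())
    for table, ios in totals.most_common(6):
        print(f"{table:15s} {ios:8d}  {100.0 * ios / grand_total:5.1f}%")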

> *get as many disks as possible to work for you at the same time
> (there doesn't seem to be any disagreement on these two)

Exactly.

> *make the average I/O as fast as possible
> On this one, the responsible approach is to understand what determines the
> speed of I/O and use your common sense to tune it when you think there is room
> for improvement. And you need to understand your application to do this.

Exactly. First, we have to define what an "average I/O" is and whether we care about it! Like: there may be an average physical I/O, governed by devices, device drivers, file systems, OS and DB. At what point do we define the "I/O" we're averaging? Hard to say, really. As you said: it's a whole that is more than the sum of its parts. We have to look at the application and examine the total I/O pattern, at various times. Then and only then can we determine which "I/O", if any, we need to address.
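
To make the "average" trap concrete, a small sketch with invented latency numbers (real figures would come from your own tracing):

    import statistics

    # Invented I/O latency samples: mostly fast, cached-style reads plus a
    # batch window of slow physical reads. The mean hides the two populations.
    latencies_ms = sorted([0.5] * 900 + [12.0] * 100)

    mean = statistics.mean(latencies_ms)
    median = latencies_ms[len(latencies_ms) // 2]
    p95 = latencies_ms[int(0.95 * len(latencies_ms))]

    print(f"mean {mean:.2f} ms, median {median:.2f} ms, 95th percentile {p95:.2f} ms")
    # mean ~1.65 ms, median 0.50 ms, p95 12.00 ms: the "average I/O"
    # describes nothing that actually happens on this system.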

I'm reminded of a Uni professor here in Australia, who at the height of the Y2K "problem" asked the simple question: "Who has actually analyzed their systems to make sure there really IS a problem? And how?" Instantly crucified by all the "merchants of doom", but crikey: he WAS right!

>
> Of course there is cache, but that's not something I normally handle.
>

Well, cache is useful. When you have tuned everything else to the nth degree, you can use the cache to give you that last little bit of "smooth" I/O load, and applied to certain file systems it can achieve super I/O speeds. However, relying on it alone to solve general I/O problems is a recipe for more problems, regardless of what our friends from EMC and others might say. Used correctly, cache can help achieve tremendous I/O rates. Used incorrectly, it's just another expensive mistake.
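
The arithmetic behind that caution fits in a few lines (the latencies below are made-up illustrative figures, not anyone's vendor numbers):

    # Effective I/O time for a given cache hit ratio. A high hit ratio helps
    # enormously, but every miss still lands on the disks, so cache cannot
    # substitute for a sane physical layout. Latencies are invented examples.
    def effective_latency_ms(hit_ratio, cache_ms=0.1, disk_ms=10.0):
        return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms

    for hit_ratio in (0.50, 0.90, 0.99):
        print(f"hit ratio {hit_ratio:.2f}: {effective_latency_ms(hit_ratio):.2f} ms")
    # 0.50 -> 5.05 ms, 0.90 -> 1.09 ms, 0.99 -> 0.20 ms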

> This is a good thread. I've seen better discussion here than probably anywhere
> else. Keep it going!

Amen.

-- 
Cheers
Nuno Souto
nsouto_at_optushome.com.au.nospam
Received on Mon Apr 29 2002 - 05:36:28 CDT
