Oracle FAQ Your Portal to the Oracle Knowledge Grid
 


Re: Separating data, index objects

From: Richard Foote <richard.foote_at_bigpond.nospam.com>
Date: Wed, 06 Jul 2005 12:46:30 GMT
Message-ID: <GoQye.16437$oJ.4901@news-server.bigpond.net.au>


"Mark A" <nobody_at_nowhere.com> wrote in message news:epadnVAeuImRXlTfRVn-rQ_at_comcast.com...
>
> In theory, if the table data and index data are in the same tablespace,
> the disk contention could occur if there was a significant amount of I/O
> against the table and index at the same time.
>
> In practice, with good design, a fair amount of the data will be in the
> bufferpool and not require synchronous disk I/O. Obviously, this depends
> on the size of the tables and indexes relative to the size of the
> bufferpool.
>
> Of course, if the index and table are in separated, but still on the same
> physical device, then that will not do much good.

Hi Mark,

A few comments, if I may.

Firstly, a very important point, as there's often confusion about this. "In theory", there is *no* performance benefit in separating indexes from tables, assuming you have a tablespace defined on a disk sub-system that can cope with the necessary I/O load. *None*. If I/O load is an issue, then some form of disk I/O load balancing might be required, but separating indexes from tables in an attempt to achieve this would generally be the *wrong* thing to do.

OK, my attempt to explain why.

A traditional index range scan performs the operation *serially* and, just as importantly, any resultant physical I/O is *random*. So imagine we've just read an index block in order to retrieve a collection of rows of interest. The I/O required to read this index block is a single, random piece of I/O, regardless of whether it's on the same or a different physical disk as the table. We *next* read the table block corresponding to the rowid obtained, which is again random (it could be any block in the table) and which again incurs the *same overhead*, regardless of whether it's on the same disk as the previous index block or not. The seek/spin overheads are likely to be the same.

The next index block we read (if not already cached) is again another random read, and this index block may not even be physically co-located with the previous index block, as it could physically be located anywhere (especially if lots of index block splits have ensued). So again, the overhead to read this index block is the same regardless of whether it's on the same or a different disk as the table. The next table block we need is (you guessed it) random, which means it could be any block within the table, located in any extent within the table, anywhere within the disk. Again, the overheads are likely to be the same whether it's on the same disk as the index or not.
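The argument can be sketched with a toy model (all numbers and the cost model are purely hypothetical): if every random block read incurs a seek on whichever device holds the block, then alternating index-block and table-block reads cost the same total number of seeks whether the two segments share one disk or sit on separate disks.

```python
import random

def simulate(separate, n_rows=10_000, blocks=100_000, seed=42):
    """Count seeks for a modelled index range scan: for each row we do
    one random index-block read, then one random table-block read.
    A read is 'free' only if it happens to be sequential on its device."""
    rng = random.Random(seed)
    heads = {"A": -1, "B": -1}  # last block position per device
    seeks = 0
    for _ in range(n_rows):
        for kind in ("index", "table"):
            # separate=True puts the index on device A, table on device B;
            # otherwise everything lives on device A
            dev = ("A" if kind == "index" else "B") if separate else "A"
            target = rng.randrange(blocks)
            if target != heads[dev] + 1:  # not sequential -> pay a seek
                seeks += 1
            heads[dev] = target
    return seeks

same_disk = simulate(separate=False)
separated = simulate(separate=True)
print(same_disk, separated)  # virtually identical: random reads seek either way
```

Because the access pattern is random, essentially every one of the 20,000 reads pays a seek in both layouts; separating the segments changes nothing.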

And so on it goes ...

So even in a *single* user system, separating indexes and tables provides *no* performance benefit. In theory or not.

And then you add the other 1000+ users in your production system, all also trying to access other blocks within the same index/table, other blocks within the same tablespace, or other blocks within the disk, and you have a scenario where the separation is totally futile from a performance perspective.

However, this myth just refuses to die ...

And I suspect one reason why it still lingers is that one sometimes hears lovely stories of how separating indexes from their tables has improved performance; so it must be the case that separating indexes improves performance, right?

Well, actually right, but for the wrong reasons.

What usually happens is that a new disk sub-system is *added* to the database and the indexes are moved to these new disks. We now have additional disks, meaning the I/O load is spread across more devices and so possible disk contention is reduced. Basically, the tablespace now containing just the tables no longer has the I/O load associated with the indexes. *Disk* contention may well have been reduced.

Great, performance could well have improved.

But what have we actually done here? We have one tablespace containing all the tables (which, based on the size of the objects and the way they get accessed, are less likely to be cached) and another tablespace containing the indexes (which, generally being smaller, and given the manner in which index blocks are accessed via root/branch blocks, are more likely to be cached). So we end up with a "hotter" tablespace containing tables, where physical I/O is more prevalent, and a "cooler" tablespace containing indexes, where physical I/O is likely to be less prevalent. The end goal of evening the I/O load across all our disks is likely to be unsuccessful once the database has warmed up, or at least not as successful as it could have been.

Yes, performance could have improved, but we could have done a lot *better* by keeping our indexes and tables together. It's the new disks that have helped performance, not the separation of indexes and tables. We could have achieved the same (or likely better) result by separating some tables from other tables and some indexes from other indexes.

Instead of allocating the new disks to a new index tablespace, allocate the new disks to the *existing* tablespace and stripe *both* your indexes and tables across the devices. Now we've reduced the contention issues on the previous disks and we've made a lot better effort at evening the disk I/O load across *all* the devices. Overall performance is *better* with index/tables together compared to the same system that has separated them out.
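Some back-of-the-envelope arithmetic (with purely hypothetical I/O figures) shows why striping both segment types across all the devices beats segregating them: once the cache is warm, most of the remaining physical I/O hits the poorly-cached tables, so the "table" disks become the hot spot.

```python
# Hypothetical steady-state physical I/O after cache warm-up:
# tables are poorly cached, indexes (root/branch blocks) well cached.
table_iops = 800
index_iops = 200
disks = 4

# Segregated layout: two disks hold only tables, two hold only indexes.
segregated = [table_iops / 2] * 2 + [index_iops / 2] * 2

# Tables and indexes striped together across all four disks.
striped = [(table_iops + index_iops) / disks] * disks

print("hottest disk, segregated:", max(segregated))  # 400.0
print("hottest disk, striped:   ", max(striped))     # 250.0
```

The total I/O is identical in both layouts; only striping everything across every device evens it out, which is exactly the load-balancing goal the separation was supposed to achieve.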

So in summary, no, separating indexes from tables doesn't improve performance. Adding extra disks might.

Even in theory :)

Cheers

Richard

