Re: Separating data, index objects

From: Mark A <nobody_at_nowhere.com>
Date: Wed, 6 Jul 2005 12:07:38 -0600
Message-ID: <xYKdnaaOON50gFHfRVn-vQ@comcast.com>

> "Mark A" <nobody_at_nowhere.com> wrote in message
>
> Hi Mark,
>
> A few of comments if I may.
>
> Firstly, a very important point as there's often confusion about this. "In
> theory", there is *no* performance benefit in separating indexes from
> tables assuming that you have a tablespace defined with a disk sub-system
> that can cope with the necessary I/O load. *None*. If I/O load is an
> issue, then some form of disk I/O load balancing might be required but
> separating indexes from tables in an attempt to achieve this would
> generally be the *wrong* thing to do.
>
> OK, my attempt to explain why.
>
> A traditional index range scan performs the operation *serially* and just
> as importantly, any resultant physical I/O is *random*. So imagine we've
> just read an index block in order to retrieve a collection of rows of
> interest. The I/O required to read this index block is a single, random
> piece of I/O, regardless of whether it's on the same or different physical
> disk as the table. We *next* read the table block corresponding to the
> rowid obtained, which is again random (it could be any block in the table)
> which again requires the *same overhead*, regardless of whether it's on
> the same disk as the previous index block or not. The seek/spin overheads
> are likely to be the same. The next index block we read (if not already
> cached) is again another random read and this index block may not even be
> physically co-located with the previous index block as it physically could
> be located anywhere (especially if lots of index splits have ensured) so
> again, the overhead to read this index block is the same regardless of it
> being on the same or different disk as the table. The next table block we
> now need is (you guessed it) random which means it could possibly be any
> block within the table, located in any extent within the table, anywhere
> within the disk. Again, the overheads are likely to be the same whether
> it's on the same disk or not as the index.
>
> And so on it goes ...
>
> So even in a *single* user system, separating indexes and tables provides
> *no* performance benefit. In theory or not.
>
> And then you add the other 1000+ users in your production system who are
> also trying to access other blocks within the same index/table or other
> blocks within the same tablespace or other blocks within the disk and you
> have a scenario where the separation is totally futile from a performance
> perspective.
>
> However, this myth just refuses to die ...
>
> And I suspect one reason why it still does linger is because one sometimes
> hears lovely stories of how separating indexes from their tables has
> improved performance; so it must be the case that separating indexes
> improves performance, right ?
>
> Well actually right but for the wrong reasons.
>
> What usually happens is that a new disk sub-system is *added* to the
> database and the indexes are moved to these new disks. We now have
> additional disks, meaning the I/O load is spread across more devices and
> so possible disk contention is reduced. Basically, the tablespace now
> containing just the tables no longer has the I/O load associated with the
> indexes. *Disk* contention may well have been reduced.
>
> Great, performance could well have improved.
>
> But what have we actually done here? We have one tablespace that has all
> the tables (which based on the size of the objects and the way they get
> accessed are less likely to be cached) with another tablespace containing
> indexes (which generally being smaller and the manner that the index
> blocks are access via root/branch blocks are more likely to be cache). So
> we end up with a "hotter" tablespace containing tables where physical I/O
> is more prevalent and a "cooler" tablespace containing indexes where
> physical I/O is likely to be less prevalent. The end goal of attempting to
> even the I/O load across all our disks is likely to be unsuccessful once
> the database has warmed up or not as successful as it could have been.
>
> Yes, performance could have improved but we could have done a lot *better
> by keeping our indexes and tables together*. It's the new disks that have
> helped performance, not the separation of index and tables. We could have
> achieved the same (or likely better) result by separating some tables from
> other tables and some indexes from other indexes.
>
> Instead of allocating the new disks to a new index tablespace, allocate
> the new disks to the *existing* tablespace and stripe *both* your indexes
> and tables across the devices. Now we've reduced the contention issues on
> the previous disks and we've made a lot better effort at evening the disk
> I/O load across *all* the devices. Overall performance is *better* with
> index/tables together compared to the same system that has separated them
> out.
>
> So in summary, no, separating indexes from tables doesn't improve
> performance. Adding extra disks might.
>
> Even in theory :)
>
> Cheers
>
> Richard
>

Obviously, separating the indexes and table data, but putting them on the same physical device would serve no purpose (theoretical or practical). I think it was assumed by the OP and myself that if data and indexes were separated then they would be placed on different physical devices.

If that was not clear in my posts, then I apologize.

In my initial responses to the OP, I was willing to grant the theory argument just for the sake of not having to argue it. I clearly said that in practice, separating them makes little or no difference. I don't think it makes much difference even if extra physical devices are used, assuming one does some decent bufferpool configuration.

I like large bufferpools, not as a substitute for SQL tuning (which should always be done and is a separate subject), but as a substitute for synchronous disk I/O. Received on Wed Jul 06 2005 - 13:07:38 CDT