Re: Why "Separating Data and Indexes improves performance" is a myth?

From: Tim Gorman <tim_at_sagelogix.com>
Date: Fri, 23 Apr 2004 11:15:21 -0600
Message-ID: <BCAEAB49.13B95%tim@sagelogix.com>

Juan Carlos,

Apologies in advance for the lengthy explanation, but I've not yet found a way to explain this more concisely...

By far, the most common types of indexed access is UNIQUE and RANGE SCAN. The other two methods, FULL SCAN and FAST FULL SCAN, are not often used during queries, but are most often used during ALTER INDEX ... REBUILD or when index usage is explicitly hinted but not really appropriate. I've recently heard of something called a SKIP SCAN, but as I understand it, it is a variation on RANGE SCAN, so (until corrected) I don't differentiate.

UNIQUE SCAN, RANGE SCAN, and FULL SCAN all utilize random-access, single-block reads (a.k.a. "db file sequential reads" wait-event) while FAST FULL SCAN utilizes sequential-access, multi-block reads (a.k.a. "db file scattered read" wait-event). Thus, the overwhelming majority of indexed access is performed using random-access, single-block reads, which as the wait-event name suggests, is processed sequentially with no opportunity for parallelism or asynchronicity.

So, then the question becomes: why is indexed I/O a series of sequentially-executed, random-access, single-block reads? Why is there no opportunity for executing these indexed scans in parallel within a single database session?

Because performing a UNIQUE SCAN, RANGE SCAN, or FULL SCAN on a B*Tree index structure is very much like the "treasure hunt" games you may have played as a child (or as an adult on a work-related "team-building exercise"). In a treasure hunt, you find a clue, read it to find where to go next, and then go there. When you get there, you find another clue, read it to find where to go next, and then go there. And so on until you reach the prize...

Navigating from root node through branch nodes to leaf nodes and then to table blocks using a B*Tree index is very similar to that. You have to start at the root node, read it to know where to go next, and then go there. When you get there, you read the node, and then move on. And so on...

So, for a single database session, hopping from index block to table blocks is not something that can be "accomplished in parallel", nor is the type of I/O different for indexes or tables -- it is all single-block I/O.

Thus, there is no benefit to performance when tables and indexes are separated into different tablespaces or (by extension) onto different volumes or devices.

However, there are other reasons to separate tables and indexes into different tablespaces, namely recoverability:

Tables contain data, indexes do not contain data (only pointers), so tables should be more carefully protected than indexes. Lose a table, and you've lost data. Lose an index, and you can just rebuild it.

So, the main reason to separate tables from indexes into different tablespaces is data protection and data recovery.

To bear that out, my very first experience as a DBA involved a system which was growing at a high rate but the company was not buying more storage for it -- it was being decommissioned in favor of a replacement system. So, to accommodate growth, we unmirrored volumes underneath index tablespaces and used those mirrors to create new mirrored volumes for table tablespaces and new unmirrored volumes for index tablespaces. This left index tablespaces less protected than the table tablespaces. Yes, we lost some indexes that way, but it enabled us to continue to grow without acquiring any net new storage and also avoid most database recoveries (just index rebuilds) until the replacement system came online. Of course, I eventually got nailed by a series of recoveries anyway, but that's another story... :-(

Counterpoint #1: It is true that tables can also be accessed using FULL table scans, which utilize sequential-access, multi-block reads (a.k.a "db file scattered reads" wait-event). But this fact is still not a reason to separate tables from indexes. Think it through: if you cared enough about optimizing the I/O of FULL table scans to optimize a specially-configured set of I/O devices just for that type of I/O, then chances are good that *any* random single-block I/O from indexed activity will mess things up. Thus indexes would, in that scenario, be altogether undesirable and would probably not be used at all. So again, performance is not a reason to separate tables from indexes.

Counterpoint #2: This explanation of indexed activity does indeed explain why parallel read access is not possible within a single database session, but it does not take into account any opportunities for performance improvement due to numerous concurrent sessions. My response: yeah, and so what is the point being made? The fact that dozens, hundreds, or thousands of sessions may be performing indexed I/O's still does not indicate that there is a performance benefit to separating tables and indexes to different tablespaces or different devices. If this were the case, an equally valid case could be made for separating the different "branch levels" (a.k.a. BLEVELs) within an index to different tablespaces or devices as well...

Hope this helps...

-Tim

on 4/23/04 10:13 AM, Juan Cachito Reyes Pacheco at jreyes_at_dazasoftware.com wrote:

> Hi
> Does any one kwnows please,
> Why "Separating Data and Indexes improves performance" is a myth?
>
> If they are both accesed at the same time, it will improve performance or
> not,
> I don't understand exactly what is this myth about?
>
> http://www.niall.litchfield.dial.pipex.com/OracleMyths.zip
>
> Thanks
>
>
> Juan Carlos Reyes Pacheco
> OCP
> Database 9.2 Standard Edition

Please see the official ORACLE-L FAQ: http://www.orafaq.com

To unsubscribe send email to: oracle-l-request_at_freelists.org put 'unsubscribe' in the subject line.

--
Archives are at http://www.freelists.org/archives/oracle-l/
FAQ is at http://www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------

Received on Fri Apr 23 2004 - 12:12:49 CDT