Re: How to "Defragment" Tablespaces ?

From: Brian Peasland <peasland_at_usgs.gov>
Date: Tue, 10 Apr 2001 13:30:24 GMT
Message-ID: <3AD30AF0.2B9FBCFD@usgs.gov>

> brian,
>
> in what way do you think indexes to be different from data tables?

In the end, indexes and tables are nothing more than blocks residing on disk. These blocks contain data. In tables, it is business data and in indexes, it is basically keys and pointers to the next block in the index or to the table rows itself.

The difference lies in how they are used. For the sake of this discussion, I'm going to assume that the indexes are the "standard" B-tree index data structure.

Using an index starts off with reading the root block, making a decision on which way to go, and then get then next branch block. This repeats so on until a leaf block is found. At this point, if the requested item is in the leaf block, then the rowid will be found and point to the table row. If the requested item is not in the leaf block, then the data is not in the table.

This B-tree structure is "flattened" out when it is stored on disk. So one will almost never be able to get a sequential read of disk blocks which will contain the information that they seek. Due to how a B-tree structure is physically stored on disk, a large majority of the time, the disk head will have to bounce around to read the index from top to bottom.

Table access *can* be different, but not always. If one is accessing the table using an index, then there is a very strong possibility that reading data from the table will cause the disk head to bounce around to read the data. If one is doing a full table scan, then Oracle can make use of the DB_FILE_MULTIBLOCK_READ_COUNT parameter and read multiple blocks in one sequential shot. The disk head doesn't bounce around and waste seek time trying to find the next block. In this case, reading a table is different than reading an index. So it makes a difference in how the table is read. This is why full table scans can sometimes outperform using indexes.

With today's hardware, most of these points may be moot. RAID and disk caching, etc. can make life much easier and go a long way towards relieving some of these issues for certain RDBMS configurations.

> oh yeah...do the number of controllers matter? like if there were 10 disks
> on one controller would throughput still increase over say one or two disks
> on one controller?

Of course the number of controllers matter. If I put 100 disks through 1 controller, then I'm going to experience a bottleneck which will be allieviated if I put those same 100 disks on 100 controllers! And there is the Law of Diminishing Returns (LoDR) at work here too. Having 10 disks through one controller may not perform better than two disks through one controller. The controller could be a bottleneck to your performance. But I would also argue that putting everything on one disk through one controller could be worse! Normally (but not always), disk head seek time is the biggest killer to I/O. So having only one disk head do all the seeking (and subsequent reading) would hurt you more than one controller would. But of course, if you have 100 disk heads reading, then 50 of them could be flooding the controller while the other 50 are seeking! The LoDR at work! So one has to find a balance for the number of disks per controller. And my balance is normally dictated by the budget. lol.

In reading this, you might have noticed that there are some points which were not made with conviction. I purposely used words like "may" and "can" and "not always". This is because there are no hard and fast rules here, just general guidelines. Your mileage may vary. I could go into one site and set up 10 disks through 1 controller and never experience a bottleneck in the controller. I could go into another site and hook up the same configuration and see a constant bottleneck. How it plays out in the end depends on too many different factors, i.e. database design, hardware configuration, application development, and end user usage patterns to name a few.

Thanks for the conversation!
Brian

-- 
========================================
Brian Peasland
Raytheons Systems at
  USGS EROS Data Center
These opinions are my own and do not
necessarily reflect the opinions of my 
company!
========================================

Received on Tue Apr 10 2001 - 08:30:24 CDT