Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 


Re: separate data/index

From: Mike Ault <mikerault_at_earthlink.net>
Date: 4 May 2002 18:01:20 -0700
Message-ID: <37fab3ab.0205041701.508e934e@posting.google.com>


dyou98_at_aol.com (D.Y.) wrote in message news:<f369a0eb.0204301007.5ac55602_at_posting.google.com>...
> Thomas Kyte <tkyte_at_oracle.com> wrote in message news:<aam0i3091q_at_drn.newsguy.com>...
> > In article <f369a0eb.0204292044.4fad1e36_at_posting.google.com>, dyou98_at_aol.com
> > says...
> > >
> >
> > ...
> >
> > >>
> > >> Bong! Nope, though I plead guilty to having recommended frequent rebuilds as
> > >> recently as last year.
> > >
> > >You need not to.
> > >
> >
> > I find people waste a HUGE amount of time and resources rebuilding indexes. It
> > is good to recant.
> >
> > >>
> > >> First: when was the last time you ever saw an Oracle index with a height
> > >> greater than 3? It's extremely hard to achieve, actually, because Oracle's
> > >> re-balancing mechanism is pretty good. So 'reducing the depth of the B tree'
> > >> is not an issue for 99.5% of all indexes out there.
> > >
> > >Ah, we are getting into another interesting discussion. I say "possibly" not
> > >"always". Your quote of 99.5% sounds a little too high to me but you could
> > >be correct. I heard more than one Senior DBAs say it always takes 3 disk reads
> > >to retrieve the rowid. See, there are quite a few people out there who think
> > >all the indexes have three levels in their B trees!
> > >
> >
> > Sr, smeener. It might take zero (indexes are pretty cachable).
>
> Possible in some systems. My indexes are routinely GBs to dozens of
> GBs in size so I can't always expect that to happen.
>
> >
> > >>
> > >> "Half empty blocks"? I hope you're not implying that empty space in a leaf
> > >> node can never be re-used, because it can. If you delete 'Bob' from the
> > >> table, the index on the name column has a deleted leaf node entry for 'Bob',
> > >> true enough, taking up space and being a waste of space. But if you were to
> > >> insert a record for 'Bruce', that deleted entry is cleared out, and the
> > >> space allocated to Bruce's index entry. Given a reasonably random pattern of
> > >> insert activity, it is extremely unlikely that you'll have half-empty blocks
> > >> sitting there for very long.
> > >>
> > >> Third, a rebuild causes the index to be re-constructed nice and compactly,
> > >> with no spare space, true enough. That might give you a bit of a lift in
> > >> performance when reading that index immediately after the rebuild. But the
> > >> next piece of DML you perform is going to cause the index to start keeling
> > >> over again soon enough. At which point the index has to start re-acquiring
> > >> all the extents it lost as a result of the rebuild, but which it had already
> > >> gone to the trouble of acquiring before the rebuild. Extent acquisition is
> > >> not cheap. In fact, in dictionary managed tablespace, it's bloody expensive!
> > >> Therefore, the index actually has to repeat a lot of extremely expensive
> > >> dictionary-management activity that, had you not rebuilt it, it would not
> > >> have had to do. Performance can therefore actually be worse after a rebuild
> > >> than before. Of course, in locally-managed tablespace, that argument is
> > >> rather less strong than it used to be.
> > >
> > >In general, I find an index rebuild makes performance somewhat better if not
> > >much better.
> >
> > Interesting, how do you measure this? what approach do you use? what metrics
> > do you provide?
> >
> > I've always asked people if they do this, no one ever seems to. How have you
> > quantified this in a measurable way?
>
> In a TB database, when your nightly batch only takes 4 hours now instead of 6
> or 7 hours and the only change you made was rebuilding some huge indexes, people
> naturally think defrag made a difference. Your batch process gets slow again in
> a month or so and the cycle starts over ... Certainly wouldn't rule out any
> side effects of index rebuild that could have contributed to performance gain.
> But I didn't give it enough thought.
>
> >
> > > Performance can be worse but those are rare exceptions and I haven't
> > >seen one. If you set your parameters appropriately it normally won't happen.
> > >Almost all the DBAs have rebuilt indexes.
> >
> > almost all dba's separate tables from indexes on different devices.
> > many dba's think 1 extent is really cool.
>
> Wouldn't bother with making everything fit within 1 extent. But I sure don't
> like to have too many big holes in my index blocks.
>
> > tons of them think the most discriminating field needs to go first in an index
> > and so on....
> >
>
> That has been how I create indexes until compression became available. Not
> that I needed to do it for performance reasons. I just didn't see anything
> wrong with it. I've tried hard to come up with a reason why many people use
> the most selective field as the leading column. The only thing I can think
> of is the search algorithm. If binary search is used to retrieve an index key,
> it may require fewer CPU cycles when you hit the most selective column first.
> If Oracle uses a clever algorithm, it may not make a difference at all. I hope
> someone with the knowledge will explain.
>
> >
> > > If they find it help performance
> > >they keep doing it. Otherwise they stop.
> >
> > I find NEITHER to be true.
> >
> > A) they never measure it
> > B) they *never* stop.
>
> If this is true then they need to stop and think.
>
> >
> >
> > > Don't take the phrase "half empty"
> > >at its face value. I really meant to say fragmentation. As you explained,
> > >index space can be reused. But it's not ALWAYS reused, nor is it NEVER reused
> > >as is the impression one might get from reading the documentation. How
> > >fragmented an index can be depends on the data. If the density of your index
> > >key is more or less uniform you won't have much fragmentation. If your data is
> > >highly skewed both ways, whether you use a regular or reversed key index you
> > >are going to have significant fragmentation over time and rebuilding this type
> > >of indexes will certainly help.
> > >
> > >To find out what's the best thing to do you need to understand the access
> > >pattern and data distribution of your application. That will often point you
> > >in the right direction (assuming you understand the storage and access
> > >mechanism of your database). This approach has helped me much more than simply
> > >following some rules of thumb, or what we call "myths".
> > >
> > >>
> > >> So, there's a balance to be struck. Tom Kyte once told me that he could
> > >> count the number of times he's rebuilt an index in his entire life on the
> > >> fingers (not thumbs) of both hands. That strikes me as a slightly
> > >> too-Puritan approach! But Jonathan Lewis has a line in his book which reads
> > >> (paraphrasing wildly) 'don't strive too officiously to achieve a mythical
> > >> 100% efficiency for your indexes, because they'll never stay 100% efficient
> > >> for long' -meaning, regular rebuilds are a waste of time. So somewhere
> > >> between never and always is an appropriate rebuild frequency, but it's
> > >> definitely on the 'occasional' end of the spectrum.
> > >
> > >That's the point. It's about BALANCE. And it's driven by your application.
> > >
> > >>
> > >> Regards
> > >> HJR
A fascinating thread. Much of what has been said I agree with, although I am a fan of separation for logical reasons. In the old disk days before RAID was widespread, I had a situation where a DBA did a blanket disable and enable of all primary keys using a hastily constructed script. Thus, no tablespace or storage was specified and all the PK indexes wound up in with the tables on the same drive. Performance was abysmal. We moved the indexes back to the proper tablespace on a separate drive and saw an immediate 30% or more improvement in query times and in general overall performance. I can only go with what works; in this situation it made a definite improvement.

RAID blurs everything. The biggest problems with RAID are not going wide enough (not enough IO to satisfy demand) and not having a proper stripe width.

As to index rebuilds, I say rebuild them when:

- Level is too deep (unlikely)
- Browning is excessive (defined as the ratio of empty leaf nodes to used ones, as determined by looking at INDEX_STATS after an ANALYZE)
- The clustering factor has increased excessively
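A hedged sketch of that INDEX_STATS check; the index name is a placeholder and the threshold is a judgment call, not an Oracle-documented rule:

```sql
-- Caution: VALIDATE STRUCTURE locks the underlying table while it runs.
ANALYZE INDEX my_schema.my_index VALIDATE STRUCTURE;

-- INDEX_STATS holds a single row, for the most recently analyzed index,
-- and only for the current session.
SELECT height,                                    -- B-tree depth; rarely above 3
       lf_rows,                                   -- current leaf row entries
       del_lf_rows,                               -- deleted ("browned") leaf entries
       ROUND(100 * del_lf_rows / NULLIF(lf_rows, 0), 1) AS pct_deleted
  FROM index_stats;
```

A common rule of thumb (again, a judgment call) is to start considering a rebuild when pct_deleted climbs past 15-20%.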

In cases where browning or the clustering factor was excessive, an index rebuild improved both, and performance using the index improved. Again, I have to go with what works regardless of what experts say when they sit around the virtual wood stove with their virtual pipe smoke circling around their heads.

Since the clustering factor is the major component of index cost, a high clustering factor will eliminate an otherwise good index from consideration by the CBO. Reducing the CF is therefore desirable in most cases, either by a rebuild for a single-column index or by rearranging the columns in a concatenated index. I have seen the clustering factor drop from over 20,000 to around 490 by reordering the columns in a concatenated index, and drop by an order of magnitude in a bitmap index that had been in use in an insert/update/delete (IUD) environment by rebuilding it.
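One way to eyeball this from the dictionary views; the schema and table names are placeholders, and the figures assume statistics have been gathered:

```sql
-- CLUSTERING_FACTOR close to the table's BLOCKS means index order tracks
-- the physical row order well; close to NUM_ROWS means it does not.
SELECT i.index_name,
       i.clustering_factor,
       t.blocks,
       t.num_rows
  FROM dba_indexes i, dba_tables t
 WHERE t.owner      = i.table_owner
   AND t.table_name = i.table_name
   AND i.owner      = 'MY_SCHEMA'    -- hypothetical schema
   AND i.table_name = 'MY_TABLE';    -- hypothetical table
```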

So I am afraid that until I see statistics that always apply, I will continue to suggest putting contending tables and indexes in separate array areas, monitoring, and performing intelligent index rebuilds.

As to multiple extents: yes, in SELECT environments multiple extents are not important. However, in IUD environments the required overhead of free list searching and multiple extents can cause a significant performance drag. In one test with a table with a large number of extents (>1000), an update involving a large percentage of rows required 14 hours. A rebuild of the indexes used by the update process reduced it to 12. An export and import with compress equal to NO reduced this to 10 or so; an export and import with compress=y reduced this to 6. That is better than a 2x improvement just from eliminating multiple extents. So I am afraid I still suggest reducing extents to what makes sense depending on transaction and access types. How 9i with its bitmap extent management will affect this remains to be seen. LMT or DMT didn't seem to make much difference, as it wasn't extent management that caused the problem; it was the large number of extents themselves.
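A sketch of how one might spot heavily extended segments before deciding; the owner and the storage numbers are illustrative only:

```sql
-- Segments that have accumulated more than 1000 extents.
SELECT segment_name, segment_type, COUNT(*) AS extent_count
  FROM dba_extents
 WHERE owner = 'MY_SCHEMA'           -- hypothetical owner
 GROUP BY segment_name, segment_type
HAVING COUNT(*) > 1000
 ORDER BY extent_count DESC;

-- Era-appropriate remedy for an index: rebuild with larger extent sizes.
ALTER INDEX my_schema.my_index REBUILD
  STORAGE (INITIAL 500M NEXT 50M PCTINCREASE 0);
```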

I also suggest that on systems which allow it, and especially NT, heavily updated disks be defragmented frequently. RAID, however, should not require this.

I am afraid I have to go with the evidence of my own experiences: eliminating contention, optimizing index (and table) storage structure, and minimizing extents all have their place in the DBA toolkit, even in the modern chaotic access environment we all operate in.

Mike Ault
Oracle9i Administration and Management
Oracle8i Administration and Management
Oracle8 Administration and Management
Oracle7 Administration and Management
OCP ExamCram (Several)
Unix Administrators Companion
many successful tuning engagements on small, large and very large systems

Received on Sat May 04 2002 - 20:01:20 CDT

