Re: Understanding partitioning by hash

From: Howard J. Rogers <hjr_at_dizwell.com>
Date: Thu, 09 Oct 2003 09:20:33 +1000
Message-ID: <3f849c52$0$22822$afc38c87@news.optusnet.com.au>

Rick Denoire wrote:

> "Howard J. Rogers" <hjr_at_dizwell.com> wrote:
>

>>Everything will still work, but not exactly as before, because you're
>>using a new feature. Yes, there are a number of new things to consider
>>(such as an appropriate number of hash partitions, striking a balance
>>between the benefits of partition elimination (the more partitios, the
>>better) and poor data distribution (the 'right' number is important to
>>determine, and more doesn't necessarily mean better).

>
> So one hast to take a look at data distribution and adjust the number
> of partitions if necessary.

Yup. Make a wild stab in the dark to start with to get the granularity of partition elimination down to something you'd like, and then monitor it. If you get a peaky distribution (which is pretty unlikely, actually, unless you are a really poor chooser of partition numbers in the first place!), you can re-organise into a new table with a different number of partitions (not cheap to do, of course. But the point really is: you're not stuck for ever and a day with whatever whacky data distribution Beelzebub and his partitioning minions decide to visit you with).

> Eh, I like partitions! I even like partitioning by hash more than by
> range, specially when the range has no limit in one direction and
> repartitioning of the overflow partition would be necessary from time
> to time.
>
> Thanks
>
> Rick Denoire

Absolutely. Hash partitioning really makes a lot of sense in many, many situations. You've just described a very good case where range partitioning can be a pain.

Personally, range-hash partitioning is even juicier. Date distribution, and then an even distribution of rows within a particular date distribution. Lovely!

By the way, the issue as to why 'enable row movement' is not the default is simply that it violates every known rule of relational theory! When a row is inserted, it acquires its rowid. That rowid is not supposed to change under any circumstances (that indeed is why we have row migration as a problem in ordinary tables... if a row grows to such a size it can't fit into its block any more, why don't we just physically move it ALL to a new block? Because that would imply a new rowid, and that would be 'naughty'. So we leave the bit of the row behind in the original block, thus preserving its original rowid, and link that row piece to where the full data has actually been moved to. If we didn't care about rowids, we could just remove the row from one block, insert it into another, and have done with it. No linking, no migration, no performance worries).

Why should rowids not change under any circumstances? Because indexes use the rowid to point to a row in a table. Therefore, if you suddenly start mucking around with the rowid, your indexes will be pointing to the wrong location... unless you go to the effort (and expense) of *updating* the index as the row moves. And there's the catch: if you enable row movement for a partition, you enable dynamic updating of the index to keep track of the new rowid for a row. And that means what you hoped would be a quick update of a table row actually turns into a lot of work on indexes.

Of course, Oracle has broken the 'thou shalt never change rowids' commandment elsewhere in 8i: what do you think 'alter table X move' does? But with that command, Oracle foreswore dynamic updating of the indexes, which is why you are required to rebuild your own indexes after a 'move table' command. That makes sense, because if you've just moved every row in the table, you're going to have to update every index entry too... so you might just as well rebuild the thing in any case. But that shows you the nature of the problem.

Incidentally, there's another side effect of moving rows: what do we insert into the redo stream when you do a transaction? Er, the rowid of the row being updated. How useful is a rowid from the logs when its pointing at entirely the wrong place? I'm not suggesting that 'enable row movement' means a segment becomes unrecoverable or anything like that. But if you were to be idly looking at logs from, say, a month ago with Log Miner, and wondering what rows were being updated, you wouldn't be able to join the rowid in v$logmnr_contents with the rowid from your table and get 100% matches. Of course, in 9i you have supplemental logging which can get around even that issue (you drop the primary key, or a unique key, into the redo to mean that even when the rowid produces garbage, you can still locate your row). But supplemental logging introduces a whole new set of issues, principally related to performance, as you threaten to overwhelm LGWR with a vastly-increased workload. And maybe that's an issue best left for another discussion!!

Regards
HJR

-- 
--------------------------------------------
See my brand new website, soon to be full of 
new articles: www.dizwell.com.
Nothing much there yet, but give it time!!
--------------------------------------------

Received on Wed Oct 08 2003 - 18:20:33 CDT