Re: Extendible Hashing question on Splitting to Handle Overflow

From: Kendall <kendallwillets_at_yahoooo.com>
Date: Wed, 19 Dec 2001 10:21:23 -0800
Message-ID: <pan.2001.12.19.10.21.15.70.26445_at_yahoooo.com>


In article <9vm893$2j1r$1_at_news.tht.net>, "patrick" <tendim_at_vex.net> wrote:

>
> From the original directory, K3 and K4 were both mapped to A. However,
> with this new trie they should both map to D. Does this mean that I
> have to read the data back from Bucket A on disk, and then rewrite it to
> Bucket D? I.e., go through all of the entries for Bucket A on disk (K1,
> K3, K4 and K6), regenerate their HashCodes (simple enough), and then
> find out where they belong after the split (A or D)? If so, isn't this
> a severe performance penalty? If it is possible to hold an entire
> bucket in memory, this means at least three seeks: one to read in the
> bucket that has been split, one to write out the entries that go to the
> old bucket (in this case A) and one to write out the entries that go to
> the new bucket (in this case D). And if we can *not* hold an entire
> bucket in memory, this means that there are even more seeks.
>
> Or am I looking at the implementation all wrong? I can understand the
> principles behind the splitting, but I am confused as to how we
> guarantee an entry goes to the correct bucket after we start splitting
> things up.
>
> Many thanks,
> Patrick
>

It's true that the bucket has to be split somehow, but I've seen a few references to adding a level of indirection, i.e. moving pointers instead of data. If you make your hash entries of the form (hash_value, pointer) you add a step to the data retrieval, but you reduce the size of each element. And, if you store the full hash value, you don't have to re-read the data each time you reorganize, and lookups can check all the bits for a match before a data block has to be read.
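For example, a bucket split over (hash_value, pointer) entries might look like the following. This is just an illustrative Python sketch with made-up names, not code from any particular implementation:

```python
# Hypothetical sketch: directory buckets hold small (full_hash, pointer)
# pairs, so a split shuffles pairs around instead of rewriting records.

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.entries = []  # list of (full_hash, record_pointer) pairs

def split(bucket):
    """Split an overflowing bucket using the stored full hash values.

    Because each entry already carries its full hash, we never have to
    re-read the records from disk to decide where each one belongs.
    """
    bucket.local_depth += 1
    new_bucket = Bucket(bucket.local_depth)
    bit = 1 << (bucket.local_depth - 1)  # the newly significant hash bit
    stay, move = [], []
    for h, ptr in bucket.entries:
        (move if h & bit else stay).append((h, ptr))
    bucket.entries = stay
    new_bucket.entries = move
    return new_bucket
```

Since the full hash lives in the entry, the split only moves small (hash, pointer) pairs; the data records themselves stay where they are.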

Since hash buckets are fairly random groups, I don't see much benefit in clustering the data itself.
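The lookup-side benefit mentioned above (checking all the hash bits before a data block has to be read) could be sketched like this. Again, the names are my own invention, purely for illustration:

```python
# Hypothetical lookup sketch: compare the stored full hash before
# following the pointer, so the (expensive) record read only happens
# when every bit of the hash already matches.

def lookup(bucket, key, hash_fn, read_record):
    """Return the record for `key`, or None if it's not in this bucket."""
    full = hash_fn(key)
    for h, ptr in bucket.entries:
        if h == full:                  # all hash bits must match first...
            record = read_record(ptr)  # ...only now read the data block
            if record.key == key:      # guard against full-hash collisions
                return record
    return None
```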
