| Oracle FAQ | Your Portal to the Oracle Knowledge Grid | |
Home -> Community -> Usenet -> comp.databases.theory -> Extendible Hashing question on Splitting to Handle Overflow
(The following is for a disk-based extendible hash-table).
I've got a question on extendible hashing, and what exactly occurs when buckets are split to handle overflow.
Say my trie looks like this:
.
/ \
0 1
/ \
A /\
0 1
/ \
B C
Which results in the following directory:
Hash-Code-Lower-Bits Bucket 00 A 01 A 10 B 11 C
Now, according to my trie, any hash-code that ends with a zero goes to bucket A, and any hash-code that ends with 10 to B and 11 to C. My main question is, say I generate some hash-codes for some entires, and I get the following:
Entry HashCode Bucket K1 00 A K2 11 C K3 01 A K4 01 A K5 10 B K6 00 A K7 10 B
Further, say that each bucket can only hold four entries. Given this, K1, K3, K4 and K6 have all mapped to bucket A. Now say I add another entry, K8, and it also maps to A.
K8 00 A
This means that I have a split. I can use the low-order bit of the 0 branch from the root of my tree to make up another bucket, and effectively split A up:
/\
/ \
/ \
0 1
/ \
/\ /\
0 1 0 1
/ \ / \
A D B C
From the original directory, K3 and K4 were both mapped to A. However,
with this new trie they should both map to D. Does this mean that I
have to read the data back from Bucket A on disk, and then rewrite it to
Bucket D? I.e., go thorugh all of the entries for Bucket A on disk
(K1, K3, K4 and K6), regenerate their HashCodes (simple enough), and
then find out where they belong after the split (A or D)? If so,
isn't this a severe performance penalty? If it is possible to hold an
entire bucket in memory, this means at least three seeks: one to read in the
bucket that has been split, one to write out the entries that go to the
old bucket (in this case A) and one to write out the entires that go to the
new bucket (in this case D). And if we can *not* hold an entire bucket
in memory, this means that there are even more seeks.
Or am I looking at the implementation all wrong? I can understand the principles behind the splitting, but I am confused as to how we guarantee an entry goes to the correct bucket after we start splitting things up.
Many thanks,
Patrick
-- patrick_at_tendim.cjb.net http://www.tendim.cjb.net/ Like anime? Got hotline? hotline://twoaday.cjb.net/Received on Mon Dec 17 2001 - 20:05:23 CST
![]() |
![]() |