Re: single table hash clusters in 9.2 - anyone tried them out yet?

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Mon, 15 Mar 2004 08:37:19 +0000 (UTC)
Message-ID: <c33pvu$iju$1@sparta.btinternet.com>
Notes in-line
But I think Paul pretty much answered your questions with his summary of Steve's point.
-- 
Regards

Jonathan Lewis
http://www.jlcomp.demon.co.uk

The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html

April 2004 Iceland
June  2004      UK - Optimising Oracle Seminar


"Mark" <simmons_mark_at_yahoo.com> wrote in message
news:5366fb41.0403142157.79634a31_at_posting.google.com...


>

> It has happened to me on a number of occasions that a single table

> hash cluster appears faster when you trace a few single table hash

> lookups vs. a PK.  However, when you run a test job that accesses data

> all over the place the index wins on occasion.  It could be that I've

> ran into exactly what you are talking about on a subset of the

> records, and the cost for this is very high.

>


    Possible - it's one of the many things I still have to
    check. (I'm assumimg you checked that Oracle hadn't
    switched from a nested loop with hash access to
    a simple hash or merge join.)  Another possibility is that
    with a large nested loop by index, Oracle implements
    a pinning strategy on the index that minimises the benefit
    of the hash join.  (With 9.2 there is also a 'table prefetch'
    algorithm that rearranges the order of work for an indexed
    nested loop - I've never seen it physically occurring, though,
    but maybe it does sometimes, and maybe it does apply to
    hash accesses).



> One of the things, I've been thinking to test lately is how bad

> performance goes down the tubes once you insert more rows than the

> "hash keys" value.  From what I remember off the top of my head, it's

> not really clear in the Oracle documentation as to what happens at

> that point.  I know it can't be good, but still I'm curious how fast

> the onset of a serious performance problem occurs if you let it boil

> over the initial setting.  (Maybe it just breaks, I can't remember.)

>

    It doesn't break, Oracle tries to put the row into the correct
    place, and then has to mess about adding extra blocks to the
    cluster and building overflow pointers from the base block to
    the relevant overflow block.  The access is then very much
    slower.



> The other thing that is not that clear is how uniform the distribution

> of the data needs to be in order for the hashing function to work

> correctly.  For example, if I have large breaks in the value of one of

> the hash keys, does that screw up the algorithm?

>

    It's a very good bet that it will.



> What I'm saying is do the hash columns have to be like

>

> "1 2 3 5 6 7 8 10 11 12..."

>   or will

> "1 3 9 15 28 37 82 91..."

>

> work equally as well.  I've always assumed the data has to be like the

> first scenario in order to work well.  If you know for certain, I'd be

> interested to know the answer.

>


    Your assumption is correct - consider HASHKEYS = 100,
    with SIZE = 800, so that Oracle works out 10 keys per block
    and gives you a hash table of 10 blocks; using "HASH IS ..."

    Take data set:
    1, 2, 3, ...., 100

        Row with hashkey = 67 will be in the seventh block, row entry 7
        (seventh rather than sixth because we count from block 1, not zero).

    Take data set
    2, 4, 6, ... 200
        Row with hashkey 2 will be the second row in the first block.
        Row with hashkey 102 also wants to be the second row in
        the first block - so the special optimisation breaks; and when it
        breaks, it's broken for the block, not just for the hashkey.

    If you have large gaps, but give Oracle the correct number of entries,
    then you are likely to get collisions as above.  On the other hand, if
    you have large gaps, but tell Oracle that the number of rows is the
    same as the highest value you expect to use, then you get no collisions,
    but your hash cluster is much bigger than you would like it to be.



> On a side note...

>

> If I remember right, there's a white paper on hotsos.com that graphs

> out the scaling function for each of the different access methods.


    I don't think I've seen that one.



> They've got some other cool stuff over there too.  I especially like

> the paper they have on tracing the 10053 event.

>


    If you liked that, then you'll like this:
        www.centrexcc.com
    It's the website for Wolfgang Breitling, who wrote the 10053
    paper - there are four or five more papers there that work
    around the same topic.
Received on Mon Mar 15 2004 - 02:37:19 CST