Hash cluster with small (tiny) rows ? [message #382952] Mon, 26 January 2009 11:34
chris3110
Messages: 3
Registered: January 2009
Junior Member
Hi there,

I have a large table (25M customers, indexed by phone number) that I need to query by primary key many times during some parallel computations. I thought a nice way to do that would be to use hash cluster storage; however, populating the table from the raw data would take a long time, because hash clusters cannot be loaded in parallel. Currently the table is populated through a parallel direct-path sqlldr load and that is very fast.

Also, since I basically only need to retrieve one integer ID per customer for the computations, I thought I could create the cluster table with a CREATE TABLE ... AS SELECT, extracting only the minimum amount of data necessary, i.e., the PK and the ID. In fact, retrieving the ID requires a small join, so this would have the additional advantage of precomputing that join.

Now the problem is that Oracle refuses to create a hash cluster for such tiny rows, because it would lead to more hash keys per 8K data block than the maximum permissible (256). So now I'm stuck with this issue: I would definitely like to use a hash cluster table, but without transferring the whole table data to the cluster, only a single integer per row. Is there a solution to this problem? What I'm really trying to do is store a hash map in a table.

Any advice here?

Thanks for your help
Chris
Re: Hash cluster with small (tiny) rows ? [message #382954 is a reply to message #382952] Mon, 26 January 2009 12:37
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:4536286321836

You need to help us by following the Posting Guidelines as stated below.
http://www.orafaq.com/forum/t/88153/0/
Please be consistent and correct in your postings so that we can help you.
Re: Hash cluster with small (tiny) rows ? [message #383157 is a reply to message #382954] Tue, 27 January 2009 07:38
chris3110
Messages: 3
Registered: January 2009
Junior Member
Hi Swan,

thanks for the hint. In fact I saw the long AskTom thread but didn't find an answer to my question there (although it's difficult to be sure, the thread is really long). So, to be a bit clearer, here is what I have done so far.

First the DB version: Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production

I created a sample standard table to get an idea of the average row length (result: 15 bytes):
CREATE TABLE test1 AS
    SELECT CAST(substr(cst.msisdn, 4) AS char(8)) suffix, cst.id profile
      FROM customers cst
     WHERE cst.msisdn LIKE '123________'
       AND ROWNUM < 10000;


Then I tried to create the hash cluster as per the documentation but got the error message I mentioned:
CREATE CLUSTER test2 (suffix char(8))
SIZE 16
SINGLE TABLE
HASHKEYS 26000000
TABLESPACE TICKETS;

CREATE CLUSTER test2 (suffix char(8))
*
ERROR at line 1:
ORA-02454: Number of hash keys per block (505) exceeds maximum of 256


But I think I may have found the solution to this problem: simply use a larger cluster SIZE, which means I won't have a one-to-one record-to-hashkey relationship. But is that a problem?
CREATE CLUSTER profile_cluster (suffix char(8))
SIZE 32
SINGLE TABLE
HASHKEYS 13000000
TABLESPACE TICKETS
PARALLEL 6;


Now my table is loading; it takes quite some time because there is little parallelism here:
  INSERT INTO cust_profile_map
    SELECT /*+ FULL(cst) PARALLEL(cst, 6) */
           CAST(substr(cst.msisdn, 4) AS char(8)), cst.id
      FROM customers cst
     WHERE cst.msisdn LIKE '123________';


By the way, if I understand correctly, Oracle applies to a given key a hash function whose result is statistically uniformly distributed over the hash key space, i.e., in this case hopefully 2 rows per hash bucket. But this means that statistically there will be cases where more than 2 rows contend for the same bucket. Then I guess a collision occurs and a new block needs to be allocated for the key, which means two I/Os instead of one, reduced performance, etc. Is there a way to know how many collisions occurred during the table population? Should I take some measure to try to avoid collisions? The documentation seems to offer no particular advice when the number of hash keys per block is large.
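To get a feel for how often that happens, here is a rough estimate, assuming the hash spreads the keys uniformly so that bucket occupancy is approximately Poisson-distributed (the bucket capacity of 2 is an assumption: a 32-byte bucket holding 16-byte rows):

```python
import math

# Rough collision estimate: 26M rows hashed uniformly into 13M buckets
# gives a mean load of 2 rows per bucket; occupancy is ~Poisson(2).
rows, buckets = 26_000_000, 13_000_000
lam = rows / buckets          # expected rows per bucket (2.0)
capacity = 2                  # assumed rows that fit in one 32-byte bucket

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

# Fraction of buckets receiving more rows than they can hold:
p_overflow = 1 - sum(poisson_pmf(k, lam) for k in range(capacity + 1))
print(f"buckets with more than {capacity} rows: {p_overflow:.3f}")
```

With these numbers the estimate comes out near a third of the buckets, although an overflowing bucket only costs an extra I/O if the row has to chain into a different block.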

Thanks for your help,
Chris
Re: Hash cluster with small (tiny) rows ? [message #383238 is a reply to message #383157] Tue, 27 January 2009 20:35
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
I have done much the same thing in the past.

I started off with a 1:1 ratio of values to hash keys, but since you don't get a perfect distribution, you get a lot of collisions. Since the blocks are full, the colliding rows get chained into other blocks.

Since Oracle cannot read any LESS than one block, there is (theoretically) no harm in creating fewer hash keys - so long as collisions do not result in chaining.

At 16 bytes per key, you can fit more than 500 into an 8KB block. Let's say you leave a little room for growth and for collisions and plan to put only 450 keys in each block. With 26M rows, you will need 57,778 blocks (let's call it 58,000). So set your SIZE to 8K and your HASHKEYS to 58000.
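The sizing above can be sketched as plain arithmetic:

```python
# Sizing sketch: pack ~450 16-byte keys per 8 KB block (below the
# theoretical 8192 / 16 = 512, leaving headroom for growth and
# collisions), then size the cluster in whole blocks.
rows = 26_000_000
keys_per_block = 450                         # planned, with headroom
blocks_needed = -(-rows // keys_per_block)   # ceiling division: 57778
print(blocks_needed)                         # 57778
# Rounded up: CREATE CLUSTER ... SIZE 8192 HASHKEYS 58000
```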

Ross Leishman

Re: Hash cluster with small (tiny) rows ? [message #383330 is a reply to message #383238] Wed, 28 January 2009 05:49
chris3110
Messages: 3
Registered: January 2009
Junior Member
Hi Ross,

thanks, your suggestion of using the largest possible cluster size to even out the non-uniformity of the key distribution makes sense. However, I have a concern regarding the best cluster size: since the data blocks are not wholly usable due to header overhead, it seems to me that 8K might not be the best possible size, or am I misled here?

Since in my case the overhead is about 140 bytes, I might as well use the next possible value for the cluster size, i.e., 256 bytes (since the table will be used solely for reading, I intend to set PCTFREE to 0). So I would end up with 1,625,000 buckets of 256 bytes each, which allows storing about 26M rows, leaving some space for possible collisions. Does this make sense to you?
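Checking that arithmetic (the overhead and row-size figures are the ones assumed in this thread, not measured Oracle internals):

```python
# Sanity check of the 256-byte-bucket variant, assuming ~140 bytes of
# block overhead, PCTFREE 0, and 16-byte rows.
BLOCK, OVERHEAD, ROW = 8192, 140, 16
SIZE = 256
buckets_per_block = (BLOCK - OVERHEAD) // SIZE   # 31 buckets per block
rows_per_bucket = SIZE // ROW                    # 16 rows per bucket
hashkeys = 1_625_000
print(hashkeys * rows_per_bucket)                # 26,000,000-row capacity
```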

Besides, is there a way to get stats about the frequency of collisions in the cluster after loading?

Thanks a lot for your help,
Chris
Re: Hash cluster with small (tiny) rows ? [message #383434 is a reply to message #383330] Wed, 28 January 2009 20:43
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
I use the following SQL to measure collisions in a single-table hash cluster:

ACCEPT cluster_name PROMPT 'Cluster Name: '

COLUMN table_name NEW_VALUE table_name

SELECT  table_name
FROM    user_tables
WHERE   cluster_name = upper('&cluster_name')
/

COLUMN db_block_size NEW_VALUE db_block_size

SELECT  value AS db_block_size
FROM    v$parameter
WHERE   name = 'db_block_size'
/

COLUMN nblocks NEW_VALUE nblocks
SELECT  round(hashkeys * key_size / &db_block_size) AS nblocks
FROM    user_clusters
WHERE   cluster_name = upper('&cluster_name')
/

SELECT  count(*)        AS overflow_blocks
,       sum(row_count)  AS overflow_rows
FROM (
        SELECT  file#
        ,       block#
        ,       row_count
        ,       row_number() OVER (ORDER BY file#, block#) AS block_seq
        FROM (
                SELECT  dbms_rowid.rowid_relative_fno(ROWID) AS file#
                ,       dbms_rowid.rowid_block_number(ROWID) AS block#
                ,       count(*) row_count
                FROM    &table_name
                GROUP BY
                        dbms_rowid.rowid_relative_fno(ROWID)
                ,       dbms_rowid.rowid_block_number(ROWID)
        )
)
WHERE   block_seq > &nblocks
/

Your allowance for a header makes sense. I have never made such an allowance, yet I have still managed to get the utilisation I expect using a cluster size of exactly one block. I don't have a good explanation for this, although I allow for the possibility that I am mistaken. You could experiment and see for yourself.

Ross Leishman