IOT experts - considering switching to IOT?

From: <p_msantos_at_yahoo.com>
Date: 7 Mar 2006 10:57:52 -0800
Message-ID: <1141757872.025139.291110@v46g2000cwv.googlegroups.com>

Dear all,
I'm currently testing the effectiveness to switching some heap tables to IOT's.
The primary goal is simple: REDUCE IO.

My environment is 10.1.0.4 on Solaris.
So my testing consists of the following objects .. This heap table exists 1 per customer in our system.

OBJECT

LINK_225355672         - heap table with following columns:


      - PRODUCT_ID      - always point of entry into this table... get

30% compression on index.


      - AFFILIATE_ID      - 30% of the time used as filter in

predicate... also repetative.


      - USER_PROF_ID  - unique by product_id .. generate from global

      - CONFIM              - always used in predicate with product_id

..value of 1 or 0. 99% values are 0.


      - DATE_CONF        - this date is used in predicate 30% of the

time.


      - DATE_SUB           - never part of the query.

      - RCODE                - never part of the query .. 99% it's NULL

(varchar2(1))

LINK_225355672_PUC_U - Unique index compressed on (PRODUCT_ID,USER_PROF_ID,CONFIM)
LINK_225355672_UP_U - Unique index not compress on (USER_PROF_ID,PRODUCT_ID) Based on the above heap table I've created 2 IOTS. LINK_225355672_IOT - IOT based on the above table, which includes

PRODUCT_ID,AFFILIATE_ID,USER_PROF_ID,CONFIM,DATE_CONF and

                                       it compresses on PRODUCT_ID and

AFFILIATE_ID.
OVERFLOW consists of DATE_SUB,RCODE only.

Created a 2nd IOT based on the original heap table, but the DATE_CONF is not part of the index
but put OVERFLOW consists of DATE_CONF,DATE_SUB,RCODE

Also created secondary indexes on both IOT's based on USER_PROF_ID/PRODUCT_ID.

THE TEST QUERY

The test query is the same, but when I remove the comments the query returns 45K rows, but
when I leave the comment in, it returns 3 million records... this is to give me both extremes of the types of queries in our system...Although most of the queries are in the 300K record range.

select L.PRODUCT_ID,L.AFFILIATE_ID,L.USER_PROF_ID from LINK_225355672_IOT2 L
where

      L.CONFIM = 0
      and --(
           MOD( L.USER_PROF_ID, 56 ) >= 2
          --AND
          --L.AFFILIATE_ID = 348179489)
      and L.PRODUCT_ID in (349431167, 225379922) order by 3

Example of the data set (This is pulled from one of the IOTs).

PRODUCT_ID|AFFILIATE_ID|USER_PROF_ID|    CONFIM|DATE_CONF     |DATE_SUB
     |RCODE
----------|------------|------------|----------|--------------|--------------|---------------------
 225379922|   225355676|        3101|         0|08/19/03 01:24|08/19/03
01:24|
 225379922|   225355676|        3723|         0|08/19/03 01:13|08/19/03
01:13|
 225379922|   225355676|        4719|         0|08/19/03 05:46|08/19/03
05:46|
 225379922|   225355676|        4925|         0|08/19/03 03:35|08/19/03
03:35|
 225379922|   225355676|        5676|         0|08/19/03 00:33|08/19/03
00:33|
 225379922|   225355676|        8645|         0|04/10/04 09:55|04/10/04
09:55|
 225379922|   225355676|       12872|        0|08/19/03 00:21|08/19/03
00:21|
 225379922|   225355676|       14602|        0|08/19/03 00:17|08/19/03
00:17|
 225379922|   225355676|      134569|        0|12/09/03 17:32|12/09/03

17:32|

The reason for this posting is that I have the following questions/concerns:
1. When I run the query using either IOT, the IO is substantially less when

the cardinality is low..this is because an INDEX RANGE SCAN is used... but

When the cardinality is high (3 million rows), the IO is higher than just using a HEAP setup.

Now, most of the queries in our system select on average 300K records, so maybe in the

end INDEX RANGE SCANS will be used and IO will be returned.? opinions?

2. I'm undecided whether keeping the DATE_CONF field in or out of the OVERFLOW segment?

I'm not really sure what the rule is here... simply that the DATE_CONF field is sometimes

used as a filter, but most of the time (70% of time) it's not used in the predicate? opinions?

3. Finally, when using the IOTs, and pulling 3MM records, the optimizer decides to

execute an INDEX FULL SCAN on the secondary index on the IOTs .. I understand

that this removes the need to sort, but is that the most efficient way of doing it???

Here are the test results from running the above query selecting both 45K records and

3MM on all 3 object types. ( This is with only 1 user in the database and flushing the

shared_pool and buffer_cache after every execution...

Also the queries run with dynamic_sampling=4 because it's the only way we get decent cardinality estimates with the function MOD function .. since this function takes any arguments are customers feel like sending it.

TABLE: LINK_225355672

ACCESS: FULL TABLE SCAN ON  LINK_225355672
PHY_IO:      38015
LOG_IO:      38218

CPU_SEC: 10.97
ELAP_SEC: 18.88
ROWS: 45047
SORTS: 10 MEMORY SORTS (NO DISK SORTS). TABLE: LINK_225355672_IOT (DATE_CONF NOT IN OVERFLOW SEGMENT) ACCESS: INLIST ITERATOR - INDEX RANGE SCAN ON LINK_225355672_IOT_PAUC_PK

PHY_IO: 583        <<< way less io
LOG_IO: 1131      <<< way less io
CPU_SEC:  1.96  <<< much faster

ELAP_SEC: 1.81 <<< much faster
ROWS: 45047
SORTS: 6 MEMORY SORTS (NO DISK SORTS). TABLE: LINK_225355672_IOT2 (DATE_CONF IN OVERFLOW SEGMENT) ACCESS: INLIST ITERATOR - INDEX RANGE SCAN ON LINK_225355672_IOT2_PAUC_PK

PHY_IO: 357
LOG_IO: 362
CPU_SEC:  1.90

ELAP_SEC: 1.87
ROWS: 45047
SORTS: 6 MEMORY SORTS (NO DISK SORTS). .... now the same query and this time pulling 3 million records.

TABLE: LINK_225355672

ACCESS: FULL TABLE SCAN ON  LINK_225355672
PHY_IO:      48873   <<<  not bad
LOG_IO:      38221   <<<  not bad

CPU_SEC: 172.40
ELAP_SEC: 174.91
ROWS: 3144801
SORTS: 9 MEMORY SORTS + 1 DISK SORT TABLE: LINK_225355672_IOT2

ACCESS: INDEX FULL SCAN ON  LINK_225355672_IOT2_UP (secondary index)
PHY_IO:      52118    <<<  not as good as above
LOG_IO:      296771  <<<  not as good as above

CPU_SEC: 142.17
ELAP_SEC: 143.11 <<< still slightly faster than above ROWS: 3144801
SORTS: 5 MEMORY SORTS.

Other IOT is very similar to the above test.

So if on average all my queries select between say 10K and 300K records, then an INDEX RANGE SCAN will likely be used cutting down on IO significantly... but it will generate more IO when queries get to be in the millions.. plus it will use the secondary index .. which I'm still not sure
why it does that... The other thing I'm still uncertain about is if I should have an OVERFLOW segment and if it should include the DATE_CONF column..?

You guys have been very helpful in the past, and any feedback would be greatly appreciated.
Thanks.
-peter Received on Tue Mar 07 2006 - 12:57:52 CST