Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Performance of joins using BETWEEN

Performance of joins using BETWEEN

From: Petr Schmidt <petr.schmidt_at_home.com>
Date: Mon, 19 Feb 2001 04:02:32 GMT
Message-ID: <sV0k6.5198$Nh7.357524@news1.rdc1.ab.home.com>

Hi,

we have a DW, which contains Oracle Financials information. One of our more complex ETL processes transforms GL account rollups. These transforms have taken a very disjoint jump in run times as more rollup data is being added to the source system. The issue appears to be with SQL, which uses <= and >= (BETWEEN) to join data from 2 tables. A very simplified version of the problem can be demonstrated as follows:

There is a table with child account segments called TEMP_DETAIL.

Name            Type
FLEX_VALUE      VARCHAR2(150) <- PK (child account segment)
DESCRIPTION     VARCHAR2(240)

SUMMARY_FLAG VARCHAR2(1)
ENABLED_FLAG VARCHAR2(1) For our example it contains approx 31,000 rows.

A second table contains parent account segments with child ranges and rollup levels. It is called TEMP_HL.

Name                   Type
FLEX_VALUE             VARCHAR2(150) <- parent accounts
DESCRIPTION            VARCHAR2(240)
CHILD_FLEX_VALUE_LOW   VARCHAR2(60)  <- child from range
CHILD_FLEX_VALUE_HIGH  VARCHAR2(60)  <- child to range
HIERARCHY_LEVEL        VARCHAR2(30)  <- rollup level

The TEMP_HL table started off with about 300 rows. The number of rows (parents) increased to about 7,000 just for level 1 parents alone. And that's when significant performance problems have occurred.

Each child can have 0 or more parents. To find all parents for a given child a following query is used.

SELECT /*+ ORDERED USE_MERGE(td hl1) */

       DISTINCT
       td.flex_value            project,
       td.description           project_description
FROM   temp_detail  td,
       temp_hl      hl1

WHERE td.flex_value >= hl1.child_flex_value_low(+)

   AND td.flex_value <= hl1.child_flex_value_high(+)    AND hl1.hierarchy_level(+) = '1'

With 70 rows in TEMP_HL table the query executes in about 16 seconds. With 7,000 rows in TEMP_HL table the query takes about 4 minutes to execute. I've added an index on child_flex_value_low, child_flex_value_high and hierarchy_level to TEMP_HL, which cut the processing time to about 2 minutes. Parallelizing the processing using /*+ PARALLEL(td,4) USE_HASH(td hl1) */ resulted in about 1 minute run time. I've tried number of other options such as IOT, other bitmap and B-tree indexes, but none of them appear to improve on the 1 minute runtime.

The actual ETL SQL is much more complicated as we use up to 7 parent rollup levels. So a 400% increase (16 to 60 seconds) in the most granular statement compounds to substantial degradation in overall transform performance. I've also determined that the DISTINCT and outer joins have fairly negligible impact on the run time compared to the BETWEEN join.

The issue is the number of times TEMP_HL table has to be range scanned while looking for the FROM/TO child ranges. Assuming fairly uniform distribution 50% of the TEMP_HL table has to be scanned for each row (child) in TEMP_DETAIL table. I'd appreciate any ideas on how to optimized this type of join.

We are running 8.1.6 on NT.

Thanks
Petr Schmidt Received on Sun Feb 18 2001 - 22:02:32 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US