Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.server -> Performance of joins using BETWEEN
Hi,
we have a DW, which contains Oracle Financials information. One of our more complex ETL processes transforms GL account rollups. These transforms have taken a very disjoint jump in run times as more rollup data is being added to the source system. The issue appears to be with SQL, which uses <= and >= (BETWEEN) to join data from 2 tables. A very simplified version of the problem can be demonstrated as follows:
There is a table with child account segments called TEMP_DETAIL.
Name Type FLEX_VALUE VARCHAR2(150) <- PK (child account segment) DESCRIPTION VARCHAR2(240)
A second table contains parent account segments with child ranges and rollup levels. It is called TEMP_HL.
Name Type FLEX_VALUE VARCHAR2(150) <- parent accounts DESCRIPTION VARCHAR2(240) CHILD_FLEX_VALUE_LOW VARCHAR2(60) <- child from range CHILD_FLEX_VALUE_HIGH VARCHAR2(60) <- child to range HIERARCHY_LEVEL VARCHAR2(30) <- rollup level
The TEMP_HL table started off with about 300 rows. The number of rows (parents) increased to about 7,000 just for level 1 parents alone. And that's when significant performance problems have occurred.
Each child can have 0 or more parents. To find all parents for a given child a following query is used.
SELECT /*+ ORDERED USE_MERGE(td hl1) */
DISTINCT td.flex_value project, td.description project_description FROM temp_detail td, temp_hl hl1
AND td.flex_value <= hl1.child_flex_value_high(+) AND hl1.hierarchy_level(+) = '1'
With 70 rows in TEMP_HL table the query executes in about 16 seconds. With 7,000 rows in TEMP_HL table the query takes about 4 minutes to execute. I've added an index on child_flex_value_low, child_flex_value_high and hierarchy_level to TEMP_HL, which cut the processing time to about 2 minutes. Parallelizing the processing using /*+ PARALLEL(td,4) USE_HASH(td hl1) */ resulted in about 1 minute run time. I've tried number of other options such as IOT, other bitmap and B-tree indexes, but none of them appear to improve on the 1 minute runtime.
The actual ETL SQL is much more complicated as we use up to 7 parent rollup levels. So a 400% increase (16 to 60 seconds) in the most granular statement compounds to substantial degradation in overall transform performance. I've also determined that the DISTINCT and outer joins have fairly negligible impact on the run time compared to the BETWEEN join.
The issue is the number of times TEMP_HL table has to be range scanned while looking for the FROM/TO child ranges. Assuming fairly uniform distribution 50% of the TEMP_HL table has to be scanned for each row (child) in TEMP_DETAIL table. I'd appreciate any ideas on how to optimized this type of join.
We are running 8.1.6 on NT.
Thanks
Petr Schmidt
Received on Sun Feb 18 2001 - 22:02:32 CST