Re: SQL going after two date ranges and outer join

From: Steve Mitchell <stevem_at_hdcsi.com>
Date: Mon, 15 Apr 2002 15:53:40 GMT
Message-ID: <8ICu8.3747$jB2.1840659545@newssvr21.news.prodigy.com>

Two reasons:

The data model was not designed with performance in mind. It is highly normalized.
These are typically reports running against an OLTP database rather than a data warehouse.

I'm still trying to find good sources of information for strategies to solve (er, evolve--there's
a lot of inertia here) both of these issues.

--steve

"Ashish Mittal" <mittalashish_at_yahoo.com> wrote in message news:NOCt8.13326$CH1.8804_at_sccrnsc02...
> It would be interesting to know what the queries are and why they require
30
> table joins.
>
> Ashish
> "Steve Mitchell" <stevem_at_hdcsi.com> wrote in message
> news:Ptht8.462$bq.396420129_at_newssvr21.news.prodigy.com...
> > Thanks for the tips on the CBO. Worth looking into for us. For what
it's
> > worth, many of our
> > report queries join up many tables--up to 30. From what I've read,
> joining
> > too many tables can
> > confuse both optimizers. I'd guess that RBO often runs faster here
simply
> > because it usually uses
> > nested loops--which also happens to usually be the right thing to do.
> >
> > --steve
> >
> > "Sybrand Bakker" <postbus_at_sybrandb.demon.nl> wrote in message
> > news:0ur8bu01rthqo5nsmo9blnrrp82gogii5o_at_4ax.com...
> > > On Wed, 10 Apr 2002 15:28:25 GMT, "Steve Mitchell" <stevem_at_hdcsi.com>
> > > wrote:
> > >
> > > >
> > > >Agreed that the correct driving table is being chosen. The problem
is
> > it's
> > > >not using the available
> > > >date index.
> > > >
> > > >where
> > > > (arh.shipdatetime between '01-JAN-02' and '31-JAN-02' or
> > > > arh.shipdatetime between '01-JAN-01' and '31-JAN-01') and
> > > > arh.artrxhdridx = ash.artrxhdridx (+)
> > > >
> > > >..and..
> > > >
> > > >where
> > > > arh.shipdatetime between '01-JAN-01' and '31-JAN-01' and
> > > > arh.artrxhdridx = ash.artrxhdridx (+)
> > > >
> > > >..generate two different plans. (the later using the index on
> > shipdatetime,
> > > >the former does not).
> > > >
> > > >This also confuses me. My AR_TRX_HEADER table need not be directly
> > involved
> > > >in the outer
> > > >join for the plan to change. Consider the following query without an
> > outer
> > > >join, and the
> > > >next query, which contains an outer join on from ar_trx_detail to
> > > >ar_trx_header_ship. In the
> > > >second query, the optimizer abandons use of the date index on the
> > > >shipdatetime column of
> > > >the ar_trx_header table and does a full table access instead.
> > > >
> > > >select
> > > > count(*)
> > > >from
> > > > ar_trx_header_ship ash,
> > > > ar_trx_detail ard,
> > > > ar_trx_header arh
> > > >where
> > > > (arh.shipdatetime between '01-JAN-02' and '31-JAN-02' or
> > > > arh.shipdatetime between '01-JAN-01' and '31-JAN-01') and
> > > > arh.artrxhdridx = ard.artrxhdridx and
> > > > ard.shiphdridx = ash.shiphdridx
> > > >
> > > >..plan is:
> > > >
> > > >SELECT STATEMENT Optimizer=RULE
> > > > SORT (AGGREGATE)
> > > > CONCATENATION
> > > > NESTED LOOPS
> > > > NESTED LOOPS
> > > > TABLE ACCESS (BY INDEX ROWID) OF AR_TRX_HEADER
> > > > INDEX (RANGE SCAN) OF XIE4AR_TRX_HEADER (NON-UNIQUE)
> > > > TABLE ACCESS (BY INDEX ROWID) OF AR_TRX_DETAIL
> > > > INDEX (RANGE SCAN) OF XIF6429AR_TRX_DETAIL (NON-UNIQUE)
> > > > INDEX (UNIQUE SCAN) OF XPKAR_TRX_HEADER_SHIP (UNIQUE)
> > > > NESTED LOOPS
> > > > NESTED LOOPS
> > > > TABLE ACCESS (BY INDEX ROWID) OF AR_TRX_HEADER
> > > > INDEX (RANGE SCAN) OF XIE4AR_TRX_HEADER (NON-UNIQUE)
> > > > TABLE ACCESS (BY INDEX ROWID) OF AR_TRX_DETAIL
> > > > INDEX (RANGE SCAN) OF XIF6429AR_TRX_DETAIL (NON-UNIQUE)
> > > > INDEX (UNIQUE SCAN) OF XPKAR_TRX_HEADER_SHIP (UNIQUE)
> > > >
> > > >..whereas the same query with an outer join produces:
> > > >
> > > >select
> > > > count(*)
> > > >from
> > > > ar_trx_header_ship ash,
> > > > ar_trx_detail ard,
> > > > ar_trx_header arh
> > > >where
> > > > (arh.shipdatetime between '01-JAN-02' and '31-JAN-02' or
> > > > arh.shipdatetime between '01-JAN-01' and '31-JAN-01') and
> > > > arh.artrxhdridx = ard.artrxhdridx and
> > > > ard.shiphdridx = ash.shiphdridx (+)
> > > >
> > > >
> > > >SELECT STATEMENT Optimizer=RULE
> > > > SORT (AGGREGATE)
> > > > NESTED LOOPS (OUTER)
> > > > NESTED LOOPS
> > > > TABLE ACCESS (FULL) OF AR_TRX_HEADER
> > > > TABLE ACCESS (BY INDEX ROWID) OF AR_TRX_DETAIL
> > > > INDEX (RANGE SCAN) OF XIF6429AR_TRX_DETAIL (NON-UNIQUE)
> > > > INDEX (UNIQUE SCAN) OF XPKAR_TRX_HEADER_SHIP (UNIQUE)
> > > >
> > > >
> > > >
> > > >..as for why we don't use the cost based optimizer, the answer is
that
> > I'm
> > > >finding that most of
> > > >our queries run slower with the cost based optimizer than without.
For
> > > >example, in this case,
> > > >the above query without the outer join results in a plan such as:
> > > >
> > > >SELECT STATEMENT Optimizer=HINT: CHOOSE (Cost=403 Card=1 Bytes=21)
> > > > SORT (AGGREGATE)
> > > > HASH JOIN (Cost=403 Card=97424 Bytes=2045904)
> > > > TABLE ACCESS (FULL) OF AR_TRX_HEADER (Cost=17 Card=2730
> > Bytes=30030)
> > > > NESTED LOOPS (Cost=174 Card=173151 Bytes=1731510)
> > > > TABLE ACCESS (FULL) OF AR_TRX_DETAIL (Cost=174 Card=633027
> > > >Bytes=3798162)
> > > > INDEX (UNIQUE SCAN) OF XPKAR_TRX_HEADER_SHIP (UNIQUE)
> > > >
> > > >
> > > >Which is another question I have: why is the cost based optimizer
> making
> > > >such bad decisions? All of
> > > >the tables involved are analyzed. There are indexes on all columns
> > > >involved.
> > > >
> > > >I'm trying to learn more about how the (respective) optimizers make
> their
> > > >decisions about these queries.
> > > >
> > > >Thanks in advance.
> > > >
> > > >--steve
> > > >
> > > >
> > > >
> > >
> > > The issue with the outer join is most likely being caused your
> > > parentheses
> > > Try changing them in
> > > (arh.shipdatetime between '01-JAN-02' and '31-JAN-02') or
> > > ( arh.shipdatetime between '01-JAN-01' and '31-JAN-01') and
> > >
> > > I really can't explain why it does work with one range, but I have
> > > seen the RBO gone wrong in similar cases.
> > >
> > > CBO using wrong plans/indexes:
> > > some basic reasons
> > > - not having adequate recent statistics.
> > > - not having proper histograms on indexed columns, so the selectivity
> > > of any column will be incorrectly determined
> > > - not using optimizer_index_cost_adj (from the top of my head)
> > > - having parallel query enabled, which will often result in full table
> > > scans being cheaper
> > > - an inappropiate db_file_multi_block_read_count
> > > some common solutions
> > > - use optimizer_goal = first_rows (favors using indexes)
> > > - make sure your from list has the driving table *leftmost* (using
> > > RBO, this should be *right*most)
> > > - use hints
> > > - in 8i, use the stored outline concept to capture plans you consider
> > > to be correct.
> > > IMO, there is no reason anymore not to us the CBO
> > >
> > > Oracle has a note on Metalink detailing common CBO misconceptions
> > > Jonathan Lewis has a paragraph in Practical Oracle 8i demonstrating
> > > how you can have trace dump the 'stupid' decisions of CBO.
> > >
> > > Hth a bit
> > >
> > >
> > > Sybrand Bakker, Senior Oracle DBA
> > >
> > > To reply remove -verwijderdit from my e-mail address
> >
> >
>
>
Received on Mon Apr 15 2002 - 10:53:40 CDT