Re: Performance issue on query doing smart scan

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Fri, 29 Jan 2021 10:51:38 +0000
Message-ID: <CAGtsp8ndex_94zMB9F5sk3RYzLijY0xYUQMn73uaiHAcMX7qtg_at_mail.gmail.com>



 *Actually the partition column I_DT is on TAB2 which is part of another with clause query but full scanning that table is not the problem here as we see from the sql monitor. It's the scanning of TAB1 which is partitioned on PART_DT is causing the issue. And we do have an explicit filter on this column for partition pruning, but anyway our requirement is to scan across all the ~180 partitions only.*

Given that you have a table TAB1, which you've included in a CTE that you've aliased TAB1, which then gets an alias of "a" in the main query; with an equally appalling naming convention involving TAB2, it's not entirely surprising that I managed to cross over the two names when working through your problem. Do you really have such bad coding standards?

The point remains, however: you said that you scanned all the partitions, that the partitioning column was part_dt, and that the SQL you sent us originally has the predicate "AND a.part_dt BETWEEN :1 AND :2". BETWEEN is a very bad option for a predicate on a partition key. A partition contains data that is strictly less than its upper boundary and greater than or equal to the upper bound of the previous partition, so ideally a range-based predicate should be of the form:

   part_dt >= {date1} and part_dt < {date2}

If you do this then Oracle may be able to discard the predicate (i.e. not waste CPU testing it) if it can deduce that every row in every relevant partition is going to satisfy it. This is particularly interesting for EXADATA with compressed tables, since it MAY mean (and I can't check it) that Oracle won't have to decompress the column to check the value, and this MIGHT have a noticeable benefit for some of the compression options in terms of CPU.
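Applied to the query in the thread, the rewrite is just this (bind variables as in the original; it assumes :2 can be supplied as the exclusive upper boundary rather than the last value inside the range):

```sql
-- Original form:
--   AND a.part_dt BETWEEN :1 AND :2
-- Half-open form, matching how range partition boundaries are defined:
AND a.part_dt >= :1
AND a.part_dt <  :2
```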

Index compression and table compression have nothing to do with each other. Also the "much longer running" of the query when forced through
(AANUM, TXID) isn't about the large number of distinct values of TXID, it's
about the relatively small number of distinct values of AANUM or, to be more accurate, the large number of rows for each value of AANUM that are now being acquired by single block access.

Your thoughts on compression do make sense as a general strategy for reducing storage space and increasing the probability of caching index leaf blocks. Since you have 173 million distinct values for AANUM in a table with 56 billion rows, on average each value of AANUM has about 330 related rows, and if this table were not partitioned you would automatically say that this is a good index for compression. BUT this is a local index on a table with 180 partitions, so each partition MIGHT hold only one or two index entries for each value of AANUM, in which case you probably wouldn't want to compress it. On the other hand, perhaps the AANUM values correlate with time, so maybe any one value of AANUM appears in only about 10 partitions; in that case each partition would average 33 rows per AANUM and the index would be worth compressing.
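If you want a measured answer rather than guesswork, Oracle can report the optimal compression prefix per index partition. A sketch - the index and partition names here are hypothetical placeholders, and note that ANALYZE ... VALIDATE STRUCTURE takes a lock on the object, so run it in a quiet window:

```sql
-- VALIDATE STRUCTURE populates the session-local INDEX_STATS view;
-- OPT_CMPR_COUNT is the suggested number of prefix columns to compress,
-- OPT_CMPR_PCTSAVE the estimated percentage space saving.
ANALYZE INDEX tab1_aanum_ix PARTITION (p_202101)
    VALIDATE STRUCTURE;

SELECT name, partition_name, opt_cmpr_count, opt_cmpr_pctsave
FROM   index_stats;
```

Repeat for a few representative partitions - the answer may differ from partition to partition, which is exactly the point above.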

Regarding index columns:

  • WHERE a.aanum = stage.anum AND a.did = stage.did AND a.d_ind = 'Y' AND a.o_id LIKE CASE WHEN stage.al = 'XXXX' THEN stage.o_id ELSE a.o_id END AND a.part_dt BETWEEN :1 AND :2
  • WHERE TAB1.anum(+) = stage.anum AND TAB1.did(+) = stage.did AND TAB2.anum(+) = stage.anum AND TAB2.did(+) = stage.did AND TAB1.pnm(+) = stage.pnm AND TAB2.pnm(+) = stage.pnm

I made two mistakes with my advice on (anum, did).

First, I forgot that the final where clause was against CTEs, so I missed the need to have d_ind and o_id in the index if you wanted to avoid visiting the table unnecessarily; secondly I missed the TAB1.pnm predicate that was in the middle of your TAB2 predicates in the final where clause - you'd need to have this in the index as well to avoid visiting the table. Again there's a detail to check in your coding standards - the three TAB1 predicates should have been together to avoid the risk of that mistake happening. (Just to add a positive note, I approve of the "next_table = current_table" arrangement of the predicates, some people would have written "stage.anum = TAB1.anum(+)")
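For reference, here is the regrouped version of the second where clause - exactly the same predicates, but with the TAB1 terms kept together so the pnm line can't get lost:

```sql
WHERE TAB1.anum(+) = stage.anum
  AND TAB1.did(+)  = stage.did
  AND TAB1.pnm(+)  = stage.pnm
  AND TAB2.anum(+) = stage.anum
  AND TAB2.did(+)  = stage.did
  AND TAB2.pnm(+)  = stage.pnm
```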

Two more thoughts on indexing - a predicate like d_ind = 'Y' suggests the column is a yes/no column.
a) If almost all the data has d_ind = 'Y' then there may be no significant benefit in having that column in the index. If only a tiny fraction of the data has d_ind = 'Y' then, if you can modify the query, you might be able to create a very small function-based index that would make this query much more efficient while adding very little to the workload.

b) The list of relevant columns for the "perfect" index has gone up to 5 - in no particular order: aanum, did, d_ind, o_id, pnm. If you know your data you may be able to decide that a combination of 4 or 3 of these columns is "good enough" to reduce the workload dramatically at query time while not adding too great a maintenance overhead.
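As a sketch of point (a) - assuming only a tiny fraction of rows have d_ind = 'Y', and with a made-up index name (you would also have to rewrite the query to use the same expressions before the optimizer could use the index):

```sql
-- Rows where d_ind <> 'Y' produce all-NULL keys and simply don't
-- appear in the index, so the index stays very small.
CREATE INDEX tab1_dind_fbi ON tab1 (
    CASE WHEN d_ind = 'Y' THEN aanum END,
    CASE WHEN d_ind = 'Y' THEN did   END
) LOCAL;

-- Matching predicate form needed in the query:
--   WHERE CASE WHEN a.d_ind = 'Y' THEN a.aanum END = stage.anum
--   AND   CASE WHEN a.d_ind = 'Y' THEN a.did   END = stage.did
```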

Finally a generic warning:
*a.o_id LIKE CASE WHEN stage.al = 'XXXX' THEN stage.o_id ELSE a.o_id END*

This is a variant of a far commoner construct which is erroneously written as the model for "give me everything unless the user requests specific rows", viz:
*where colX = nvl(:bind_variable, colX)*
(See: https://jonathanlewis.wordpress.com/2007/01/09/conditional-sql/ )

The "LIKE" is sufficiently different that it's possible the predicate is doing exactly what it's supposed to do, but it's worth mentioning that if a.o_id is NULL then the predicate evaluates to NULL rather than true - so the row is rejected (even when stage.al = 'XXXX') - and this may not be the intention.
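You can see the NULL trap in isolation - this runs against DUAL, no real tables involved:

```sql
-- NULL LIKE anything evaluates to NULL, which a WHERE clause
-- treats as not-true, so the row is silently rejected.
SELECT COUNT(*) AS matched
FROM   dual
WHERE  NULL LIKE NULL;
-- matched = 0
```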

Regards
Jonathan Lewis

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Jan 29 2021 - 11:51:38 CET
