Re: Unique index access path seems very slow

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Wed, 25 Jan 2023 19:27:49 +0000
Message-ID: <CAGtsp8mrsyzobD47RzBW+uPc0cFdT-tMAP7hS7z0BQYXp70PQw_at_mail.gmail.com>



I think yudhi s and Andrew Sayer have answered your questions, but just in case:

1)
The only thing I can think of that would make anyone think that the original raw is open to inconsistency when compared with storing the value as a hex string is some confusion about "little endian" vs. "big endian", where the order of bytes in (e.g.) a 4-byte number gets swapped around. This ought to be irrelevant with a basic byte array.

The reason why querying a raw gives you the same display as querying the rawtohex() is that SQL*Plus (and Toad) implicitly convert the REPRESENTATION to hex for display purposes. See what you get when you try to push "unlucky" byte values at the terminal, for example on a Linux terminal screen:

    select 'A' || chr(9) || 'B' from dual;

The byte value 9, when sent to a terminal, is the TAB character; other values do nastier things to the display.
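
A quick way to see that the stored value really is a 64-byte array, and that the 128 hex characters are only the client's display conversion, is to dump() it - a minimal sketch using the value from your test below:

    select dump(standard_hash('225057', 'SHA512')) from dual;
    -- Typ=23 Len=64: 87,170,244,182,...
    -- internal type 23 is RAW and the length is 64 bytes - the same value
    -- that the client renders as the 128 hex digits 57AAF4B6...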

2)
If the development team thinks that a HEX string consists of digits 1-9 and characters A - Z then it's not surprising that you have so much trouble getting anywhere with optimising the design. HEX is short for hexadecimal, which means "base 16" and represents numbers using 16 "digits", 0 - 9 and A - F; so each byte value (0 - 255, or in binary 00000000 - 11111111) can be represented as two digits 00 - FF.
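
To make the arithmetic concrete - a trivial sketch:

    select to_char(0,   'FM0X') as b_min,   -- 00
           to_char(171, 'FM0X') as b_mid,   -- AB
           to_char(255, 'FM0X') as b_max    -- FF
    from dual;
    -- every byte maps to exactly two hex digits, so a 64-byte SHA-512
    -- value needs 128 characters when stored as a hex string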

The only comment I can make about the rest of the paragraph and the one that follows it is that there's no point in commenting until you can give an accurate description of what you've got, what the development team think they're going to do, how they think it will work, and why they think it will help.

Regards
Jonathan Lewis

On Tue, 24 Jan 2023 at 18:40, Pap <oracle.developer35_at_gmail.com> wrote:

> Thank you so much, Jonathan, for the in-depth details.
>
> I have yet to go through all the suggested options in detail. However, I
> wanted to share a few quick things:
>
> 1) On the RAW vs HEX value: when we checked with the dev team, the reason
> given for keeping the value in the hex format was:
> "*we Use RAWTOHEX() as output so that it will give consistent result
> irrespective of client being used. This will convert the output from RAW to
> VARCHAR2 so it works with all clients (SQL, Perl, etc.) *".
>
> Now, we are still trying to understand how the RAW data type output would
> differ or vary based on the client type. I tried running the sample SHA512
> query on the "sql plus" client and on "toad", and both returned the same
> values.
>
> Also, another point: you mentioned the RAW output of the SHA512 function
> would be 64 bytes, but when I executed the two queries below, I noticed
> that with and without "RAWTOHEX" the SHA512 gives exactly the same
> 128-byte output. Doesn't that mean there will be no difference in the size
> of that column/index even if we store the RAW value of the SHA512?
>
> select (STANDARD_HASH (TO_CHAR (TRIM ( '225057')),'SHA512')) from dual ;
> 57AAF4B657557BAD68054B38C40149E2D8BD1EEB46A346DD6AC248228F55
> 330B0832E8A1997855782811F0FA4C363FAB36597824157C3E10EAFFB0F07F61B666
>
> select (RAWTOHEX(STANDARD_HASH (TO_CHAR (TRIM ( '225057')),'SHA512')))
> from dual;
> 57AAF4B657557BAD68054B38C40149E2D8BD1EEB46A346DD6AC248228F55
> 330B0832E8A1997855782811F0FA4C363FAB36597824157C3E10EAFFB0F07F61B666
>
> 2) Another option the team is suggesting is to list-partition the table
> tab_encrypt by substr(COL1_SHA_512,-1). The last digit of the
> column will be either a letter 'A' to 'Z' or a number '1' to '9', so there
> will be a maximum of 26 letters + 9 digits = 35 list partitions. We would
> then derive the last digit of the column COL1_SHA_512 from the inline view
> by using the substr function on the column of tran_tab and use that as a
> filter in the outer query, so that partition pruning will happen for table
> tran_tab, and even in the case of an indexed path it will go for the
> unique index scan with a smaller index segment. Also, just to note, the
> transaction table tran_tab is ~25TB in size holding ~210 daily range
> partitions; it is heavily used and any design change on it needs to be
> evaluated for impact on all application queries.
>
> And yes, the tran_tab.pd_cd is actually going to filter out ~400 million
> out of a billion+ rows in the table tran_tab. And this test, which we are
> doing for ~1 hour's worth of date range, is to cover the case where we
> have some backlog to catch up on during system downtime etc. It's using
> the standard 8K block size and it doesn't have HCC for the data which we
> read in this query; only historical partitions are HCC-compressed for
> tran_tab.
>
>
> On Tue, 24 Jan, 2023, 8:24 pm Jonathan Lewis, <jlewisoracle_at_gmail.com>
> wrote:
>
>>
>> It doesn't matter what "the other systems" do; the "RAW" and "HEX" are
>> representations of the same value. You ought to halve the size of the index
>> and reduce the size of the table by a significant percentage. If the
>> downstream systems want to see a hex representation and don't want to do
>> the conversion for themselves then create a virtual column that represents
>> the raw as a hex string.
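>>
>> A minimal sketch of that idea (the table and column names here are
>> assumptions taken from this thread, not the real DDL; it assumes the
>> stored column becomes a raw(64)):
>>
>>     alter table encrypt_tab add (
>>       col1_sha_512_hex varchar2(128)
>>         generated always as (rawtohex(col1_sha_512)) virtual
>>     );
>>     -- downstream systems select col1_sha_512_hex and see the familiar
>>     -- 128-character string; the table and index store only 64 bytes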
>>
>> The I/O rate doesn't appear to be a problem, but it's a little hard to
>> determine without a wait event histogram. The nested loop plan I'm looking
>> at says you did 815K (463K + 351K) requests for a single block inside an
>> interval of 1,589 (1,591 - 2) seconds, which is 513 I/O requests per second
>> (and there was other activity going on at the same time). If you want a
>> better picture of the actual I/O waits you'll probably have to enable SQL
>> tracing for a while then examine the trace file. (I'm assuming, by the way,
>> that the database is using the standard 8KB block size - and I'm also
>> assuming that neither table uses HCC.)
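>>
>> One way to get that trace - a sketch; the sid/serial values are example
>> numbers you'd read from v$session for the session of interest:
>>
>>     begin
>>       dbms_monitor.session_trace_enable(
>>         session_id => 1234,      -- v$session.sid (example value)
>>         serial_num => 56789,     -- v$session.serial# (example value)
>>         waits      => true,      -- capture wait events (10046 level 8)
>>         binds      => false
>>       );
>>     end;
>>     /
>>     -- ... let the workload run for a while, then:
>>     begin
>>       dbms_monitor.session_trace_disable(
>>         session_id => 1234, serial_num => 56789
>>       );
>>     end;
>>     /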
>>
>> Your "similar query" was probably not similar enough to tell you anything
>> useful, but we can't tell because all yo u showed us a an execution plan
>> (and that showed your 2nd table was partitioned and you were using a local
>> index to access it, so not similar at all).
>>
>> Your big index is running a little less efficiently than I might expect
>> (roughly 5,000 bytes used per block rather than a typical 5,600 / 70%), but
>> that may be an indication that you rebuilt it some time in the not too
>> distant past and have a large number of blocks that have gone through 50/50
>> splits and are still moving towards a stable state. That's not your big
>> issue, though.
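>>
>> If you want to check that leaf-block packing for yourself, one option is
>> validate structure (note: it blocks DML against the table while it runs,
>> so use a quiet period or a test copy; the index name is a placeholder):
>>
>>     analyze index big_sha512_idx validate structure;
>>
>>     select lf_rows, lf_blks, pct_used,
>>            round(used_space / lf_blks) as approx_bytes_per_leaf_block
>>     from   index_stats;   -- one row, visible only to the analyzing session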
>>
>> Basically, as Andy pointed out, your index is big and the standard hash
>> means you are inserting and querying blocks in a completely random pattern
>> across the entire range of the index. Do some arithmetic - you're after 3M
>> rows (from tran_tab), which means 1 row in 1,000 from encrypt_tab. Your
>> index entries are about 140 bytes (128 plus a little overhead), and you're
>> getting about 36 entries per leaf block. That means "on average" you're
>> looking for one row out of every 28 index leaf blocks (and it's a little
>> worse for the table) - the probability of being able to take advantage of
>> any previously cached blocks is pretty low - so if you want to estimate the
>> workload if your query uses a nested loop, it's likely to be in the ballpark
>> of the driving tablescan time plus 2 disk reads (one leaf block, one table
>> block) per row supplied by tran_tab. (There's a note about thinking at
>> scale here: https://jonathanlewis.wordpress.com/2007/03/18/thinking-big/
>> )
>>
>> An important question then is: what's the significance of
>> tran_tab.pd_cd (which has a redundant outer join symbol in your example)?
>> Does it mean that of the final 400M rows you're actually only going to
>> move a small percentage, or is it a flag with a small number of values and
>> each value directs data to a different system, or is it just a way of
>> making a high volume query a lower volume query, or what .....?
>>
>> Also: why are you testing with a query for an hour's range? If you want to
>> run every 5 minutes, should this be a 5-minute range (which would give an
>> average of 1.4M rows)? Is this a "worst case" test, or is there some other
>> significance?
>>
>>
>> Fundamental and obvious strategy: store raw, not hex
>>
>> Suggestion to be tested: if you hash partition encrypt_tab on the sha512
>> column, and rebuild tran_tab as a composite partitioned table where the
>> second dimension is also hash partitioned on the (virtual) sha512 column
>> with the same number of (sub)partitions, then Oracle should be able to do a
>> partial partition-wise join between the two, and a series of small(-ish)
>> hash joins may be much more efficient than one huge hash join. (If the path
>> is a nested loop then a local index MIGHT see better caching on a partial
>> partition-wise join.)
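>>
>> A rough sketch of the shape (the names, column lists, and partition
>> counts below are illustrative assumptions only, not the real DDL):
>>
>>     create table encrypt_tab_new (
>>       col1_sha_512  raw(64) not null,
>>       padding       varchar2(40)          -- stand-in for the real columns
>>     )
>>     partition by hash (col1_sha_512) partitions 128;
>>
>>     create table tran_tab_new (
>>       pt_date       date not null,
>>       col1_sha_512  raw(64),
>>       padding       varchar2(40)          -- stand-in for the real columns
>>     )
>>     partition by range (pt_date) interval (numtodsinterval(1,'DAY'))
>>     subpartition by hash (col1_sha_512) subpartitions 128
>>     ( partition p0 values less than (date '2023-01-01') );
>>     -- the matching hash (sub)partition counts are what allow the optimizer
>>     -- to run the join as a series of small joins between partition pairs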
>>
>> Secondary suggestion (probably worth only a small amount of CPU time):
>> It looks like you have a date-only column for the partition key column, so
>> I'm guessing the table is range partitioned (perhaps with an interval
>> clause of one day). With this setup the predicate pt_date = {constant} has
>> to be applied to every row you visit because there may be rows (from
>> Oracle's perspective) which have a non-midnight time component. If you
>> change the table to list-partitioned with exactly one date(-only) value
>> defining each partition then Oracle could recognise that this predicate
>> selects exactly ALL the rows in the partition and not need to test each
>> row. I think your version of Oracle even allows automatic list
>> partitioning, so that a new list partition would be created automatically
>> when a new date appeared.
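>>
>> A sketch of that shape (12.2 onwards; the names are illustrative):
>>
>>     create table tran_tab_list (
>>       pt_date  date not null,
>>       padding  varchar2(40)               -- stand-in for the real columns
>>     )
>>     partition by list (pt_date) automatic
>>     ( partition p_20230101 values (date '2023-01-01') );
>>     -- each partition holds exactly one date-only value, so
>>     -- "pt_date = {constant}" identifies the whole partition, and AUTOMATIC
>>     -- creates a new partition when a new date arrives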
>>
>> Curiosity suggestion: You have about 3 billion rows in encrypt_tab, and 2
>> billion of them survive the Bloom filter. Of course that might mean
>> anything between 1.5 billion and 2.5 billion. However, the tablescan
>> reports 135 seconds (samples) of CPU, and that MIGHT be the cost of
>> applying the Bloom filter to every single row. It's possible (though I
>> think a little unlikely) that if you disable the Bloom filtering the CPU
>> utilisation will drop (the outline shows you the Bloom filter hint -
>> px_join_filter - add this to the query but change it to
>> NO_px_join_filter). You may find, anyway, that if you add a cardinality (or
>> OPT_ESTIMATE) hint to the query to tell it that tran_tab will supply 3M
>> rows the Bloom filter may simply not appear anyway. I'd have to mess about
>> a bit to get it right (and the hint_report option to
>> dbms_xplan.display_cursor() might help) but it would probably be something
>> like: cardinality(@sel$f5bb74e1 tran_tab@sel$2 3000000)
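>>
>> For what it's worth, a sketch of where those hints would sit (the query
>> block and alias names are copied from the text above - verify them against
>> your own plan's outline before trusting them):
>>
>>     --   /*+ no_px_join_filter(@sel$f5bb74e1 tran_tab@sel$2) */
>>     --   /*+ cardinality(@sel$f5bb74e1 tran_tab@sel$2 3000000) */
>>
>>     -- and, after running the query, the hint report (19c):
>>     select * from table(
>>       dbms_xplan.display_cursor(format => 'TYPICAL +HINT_REPORT')
>>     );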
>>
>>
>> Regards
>> Jonathan Lewis
>>
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Jan 25 2023 - 20:27:49 CET
