Re: Unique index access path seems very slow

From: Pap <oracle.developer35_at_gmail.com>
Date: Tue, 24 Jan 2023 10:48:50 +0530
Message-ID: <CAEjw_fhr2unENrwAVAbJ-JuinaVUUdHOKjYH3bHcLtOrvUAFdA_at_mail.gmail.com>



Thank you so much, Andy and Mladen.

 Just to check whether CPU spent on HASH function evaluation is really the bottleneck here, I executed only the inline-view part of the query to measure the overhead of the hash computation. As I see, it finishes in ~40 seconds for ~12 million rows, which is a lot better, I think. So it is really the outer join that is the bottleneck here. And the outer join should not have to apply the HASH function again, because those hash values are now readily available from the inline view over TRAN_TAB, and the other row source, TAB_ENCRYPT, already holds the hashed value in a column itself. So why is the final read of table TAB_ENCRYPT and its outer join with the inline-view result so slow here?
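For reference, RAWTOHEX(STANDARD_HASH(expr, 'SHA512')) should be equivalent to a plain SHA-512 hex digest of the trimmed string, which is cheap per row. A minimal Python sketch of that per-row work (the function name and sample input are illustrative, only the expression it mimics comes from the query):

```python
import hashlib

def col1_sha_512(ln4: str) -> str:
    """Mimic RAWTOHEX(STANDARD_HASH(TO_CHAR(TRIM(ln4)), 'SHA512')).
    Oracle's RAWTOHEX renders the digest as uppercase hex."""
    return hashlib.sha512(ln4.strip().encode()).hexdigest().upper()

h = col1_sha_512("  some confidential value  ")
print(len(h))  # SHA-512 -> 64 bytes -> 128 hex characters
```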

https://gist.github.com/oracle9999/d4ffa07a4f499e1114feba55a4417aa9

 Also, I realized that in the indexed execution path I posted, I had missed the dt_cr filter in the inline-view query. So I reran the indexed path and updated the details, but again I saw no difference, even though the query ran at a different time: ~378K rows in ~30 minutes. That is a lot less than even the FTS path. So there appears to be something wrong with the way the index data for column COL1_SHA_512 is stored or distributed in table TAB_ENCRYPT. I don't believe it can be underlying IO slowness, as we have other applications and queries running without any issue on this database. Anyway, I will also try to fetch data using another index on a normal column and will see whether ~100 IOPS is really the speed (which I believe is not the normal speed of IO on this box anyway).
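As a back-of-envelope check (the row count and elapsed time are from the run above; the one-physical-read-per-row assumption is mine, justified by the clustering factor being roughly equal to the row count), the observed index-path throughput implies a per-read latency in the milliseconds:

```python
# Observed: ~378K rows joined in ~30 minutes via the index access path.
rows = 378_000
elapsed_s = 30 * 60

rows_per_s = rows / elapsed_s  # ~210 rows/s

# Assumption: with CLUSTERING_FACTOR ~ NUM_ROWS, nearly every probed row
# costs at least one physical single-block read that misses the cache.
ms_per_read = 1000 / rows_per_s
print(f"{rows_per_s:.0f} rows/s -> ~{ms_per_read:.1f} ms per single-block read")
```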

 https://gist.github.com/oracle9999/9792a70f748b71c365397c5458d3e9e8

Regarding the question "*is the related data inserted at the same time? That way you can do a hash join of the two recently inserted data sets*": the transaction table TRAN_TAB receives data from another process 24/7 throughout the day, but the process that matches the encrypted column value and passes it to the other system is different altogether and has dependencies on other processes/workflows.

Regards
Pap

On Tue, Jan 24, 2023 at 3:22 AM Mladen Gogala <gogala.mladen_at_gmail.com> wrote:

> On 1/23/23 13:46, Pap wrote:
>
> This database is Oracle 19c on Exadata. Below are the query and the SQL
> monitor output for both the full-scan path and the index access path. I
> also captured the non-zero stats from gv$sesstat for a ~10-minute
> execution of each run. The index access path has processed only ~82K rows
> in 30 minutes and has still not finished. We want to make this query
> finish in ~2-3 minutes.
>
> SELECT TRAN_TAB.COL1_SHA_512, TAB_ENCRYPT.TAB_ENCRYPT_COL2.....
>   FROM (SELECT RAWTOHEX (STANDARD_HASH (TO_CHAR (TRIM (TRAN_TAB.LN4)),
>                                         'SHA512')) AS COL1_SHA_512
>           FROM TRAN_TAB
>          WHERE TRAN_TAB.PD_CD(+) = 'XXX'
>            AND TRAN_TAB.PART_DT = TO_DATE (:B1, 'DD-MON-YYYY HH24:MI:SS')
>            AND TRAN_TAB.dt_cr BETWEEN TO_DATE (:B2, 'DD-MON-YYYY HH24:MI:SS')
>                                   AND TO_DATE (:B3, 'DD-MON-YYYY HH24:MI:SS')
>        ) TRAN_TAB, TAB_ENCRYPT
>  WHERE TRAN_TAB.COL1_SHA_512 = TAB_ENCRYPT.COL1_SHA_512(+)
>
> ****** "Full scan" path sql monitor + nonzero results from gv$sesstat from
> initial ~10minutes run*******
>
> https://gist.github.com/oracle9999/b37160097fa8e3929fb66af61ebbf9ed
>
> ***** "Index access" path sql monitor + nonzero results from gv$sesstat
> from initial ~10minutes run*******
>
> https://gist.github.com/oracle9999/9792a70f748b71c365397c5458d3e9e8
>
> The requirement is something as below...
>
> We have two tables, TRAN_TAB and TAB_ENCRYPT. TAB_ENCRYPT is an
> insert-only table and holds the encrypted value (COL1_SHA_512) of a
> confidential column/attribute. Each row of the transaction table (TRAN_TAB)
> has to be checked/joined against all the encrypted column values of table
> TAB_ENCRYPT using the above query, and the results, along with additional
> columns from table TAB_ENCRYPT, have to be sent to another downstream
> system. This query is supposed to run multiple times a day, so that the
> results reach the downstream system as close to real time as possible,
> based on the records in the transaction table TRAN_TAB. Currently the plan
> is to run it once every ~5 minutes. The table TAB_ENCRYPT holds ~3 billion
> rows, and the transaction table TRAN_TAB will have approx ~400 million
> rows per day/part_dt to be joined on the encrypted column COL1_SHA_512.
>
> As shown in the SQL monitor, the original query was taking ~30 minutes to
> join with ~3 million rows in the transaction table, and the majority of
> the resources seem to be spent full-scanning the table TAB_ENCRYPT. So we
> tried creating a unique index on column "COL1_SHA_512" thinking it would
> be fast, but it appears to degrade the performance of this query even
> further. We want to understand why it's happening this way. Is it because
> the column holds ~128-byte values, so the index becomes too big to fit in
> cache? And then how can we fix this query?
>
> The column "COL1_SHA_512" holds the SHA512 encrypted value of another
> column; a sample value looks like the one below. The size of the index on
> this column is ~700GB, and the statistics of the index are below. All of
> its values are unique, but I think the index becomes this huge because of
> the 128-byte value length. The size of the table tran_tab is ~880GB.
>
>
> "99983FB4F3BEAA516BE40B85706338B0D7E5D14D1B2A3C945D5F74947A3E8DA8E714D1761D008715CD46B6C6CDE8B99690F2AE04D97D9KHYU67GBF4589HNFRSDR"
>
>
> *Column 'COL1_SHA_512' statistics:*
>
> NUM_DISTINCT  NULLABLE  DENSITY                     NUM_NULLS  AVG_COL_LEN
> 2987284438    N         0.000000000334752187397831  0          129
>
>
> *Unique index (on column COL1_SHA_512) statistics:*
>
> INDEX_NAME      BLEVEL  PCT_FREE  LEAF_BLOCKS  CLUSTERING_FACTOR  NUM_ROWS
> TAB_ENCRYPT_UK  3       10        83389146     3008764650         3008764650
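The ~700GB index size quoted above is consistent with the leaf-block count, assuming the default 8KB database block size (the block size is my assumption; the row and block counts are from the statistics):

```python
LEAF_BLOCKS = 83_389_146
BLOCK_SIZE = 8192          # assuming the default 8KB database block size
NUM_ROWS = 3_008_764_650

index_bytes = LEAF_BLOCKS * BLOCK_SIZE
print(f"~{index_bytes / 1e9:.0f} GB of leaf blocks")        # ~683 GB
print(f"~{index_bytes / NUM_ROWS:.0f} bytes per index entry")
```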
>
> Regards
> Pap
>
>
>
> Pap, your index plan is attempting to read 20 million rows from the outer
> table and join the result to the inner table using nested loops. Yes, the
> Bloom filter will eliminate some rows from the outer table, but there will
> still be a lot to read: 80295 read requests and 627MB from the table
> itself, plus 152K reads totalling 1GB from the index. Depending on the
> model of your Exadata, cell single block read can be relatively slow. You
> should take a look at your AWR report. If your average cell single block
> read time is in milliseconds, that would explain your problem. Exadata
> was made to speed up full table scans; it doesn't do much for single
> block reads.
>
> However, 80295 rows out of 20M is a relatively small portion of the data.
> Is there a possibility of using an index range scan on the outer table?
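Mladen's point can be quantified roughly. The read counts below are from the SQL monitor he cites; the two latency values are illustrative, not measured:

```python
# From the SQL monitor: single-block reads for the nested-loop index path.
table_reads = 80_295    # read requests against the table (627MB)
index_reads = 152_000   # reads against the index (~1GB)
total_reads = table_reads + index_reads

# If cell single block read averages 1ms vs 5ms, pure read time is:
for latency_ms in (1, 5):
    total_s = total_reads * latency_ms / 1000
    print(f"{latency_ms} ms/read -> ~{total_s / 60:.0f} min of read time")
```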
>
> --
> Mladen Gogala
> Database Consultant
> Tel: (347) 321-1217
> https://dbwhisperer.wordpress.com
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Jan 24 2023 - 06:18:50 CET
