Re: How to minimise sorting time in a query

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Mon, 7 Nov 2022 11:13:01 +0000
Message-ID: <CAGtsp8kzVUQ5tXAnTdVGfDbmDYB8cJAfmrXfpkiR3EGxqWA=DQ_at_mail.gmail.com>



 >> And also when you mentioned "*and you have to worry about the impact on data loading. If you need other indexes on the table you then have to worry about the effects on those indexes (for both loading and querying).*" I was not able to fully understand. Is this something as below..

Without knowing anything about your table, column and index definitions it's not possible (and could be counter-productive) to make specific suggestions, hence the need for generic warnings that leave it up to you to apply the extra information you have. The thoughts you've had about the effect of the costs and benefits of setting the table up as an IOT are essentially correct - it's all driven by the fact that

  1. when you load a heap table the table data goes into the next few allocated blocks, densely packed and Oracle has some scope for optimising the index maintenance,

compared to

b) when you load a heap table the data tends to end up scattered over a large number of ("leaf") blocks of the IOT, and then the primary key and block address of each row has to be used in the secondary index

Another factor in the querying by secondary index is that Oracle uses the stored block address as a "guess" of where the row will be in the IOT, and if the guess is wrong it then uses the stored primary key to find the row. When the data is constantly growing the guesses often become wrong fairly frequently. (See
https://jonathanlewis.wordpress.com/2019/12/16/iot-bug/#footnote - which also describes a bug in secondary indexes that I haven't checked for the latest versions of Oracle, though it does appear in 19.3).

Since you want to partition by create_date the globally partitioned index on (c1,c2,c3) is probably the best bet. You've said that (c1,c2) is unique (and the primary key) - so for the heap table solution you could create the globally partitioned index on (c1, c2, c3) as a non-unique index and then associate the primary key with it then drop the old unique index.

You've shown a plan where Oracle did a partition-wise index fast full scan - that was just a costing thing, if you hint the query to use an index but NOT an index_ffs() then it should be able to read the index in order.

As Andrew Sayer points out, though, dragging 2 billion rows across the network 10 times per day is an extreme thing to do to satisfy your ETL tool. Insisting on having them sorted is even worse. An incremental mechanism might be a lot better if the tool will accept it, or if you have to drag all the data across then using a tablescan and letting the ETL tool do the sorting might be the best bet. In passing, I think we've noted in the past that C1 is a character representation of a fixed width number - so yet another optimisation detail (for network traffic) might be to pass a to_number(c1) to the ETL tool (which, who knows, might be converting the incoming character to numeric anyway).

Regards
Jonathan Lewis

On Sun, 6 Nov 2022 at 12:57, Lok P <loknath.73_at_gmail.com> wrote:

> Thank you so much Jonathan and Mladen.
>
> Currently the table is a yearly range partitioned on DT_CREATE column and
> it has data for last ~8 years so total ~8 partitions. Also the table has a
> primary key in column (c1,c2,DT_CREATE). And we have another b-tree index
> present on column C3. All are local index only. And the combination of
> (c1,c2) is unique only.
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Nov 07 2022 - 12:13:01 CET

Original text of this message