Re: How to minimise sorting time in a query

From: Lok P <loknath.73_at_gmail.com>
Date: Mon, 7 Nov 2022 22:14:54 +0530
Message-ID: <CAKna9VYJgp26QwP2rfgLt6=25yaf62NGhQV3SdUV2jn+hWZtBw_at_mail.gmail.com>



Thank you Jonathan, Andy, Lothar.

"*Since you want to partition by create_date the globally partitioned index on (c1,c2,c3) is probably the best bet. You've said that (c1,c2) is unique (and the primary key) - so for the heap table solution you could create the globally partitioned index on (c1, c2, c3) as a non-unique index and then associate the primary key with it then drop the old unique index."*

Am I misinterpreting the above? as because the global partitioned index will need to be prefixed , so in that case for the index to be partitioned on create_date, it will ask for create_date to be the leading column something as in below plan and in that case the sorting will happen across all index partitions data.

And though the combination of (C1,C2) is unique , the current PK is on (C1,c2 ,date_create), so adding yet another column c3 to it won't satisfy the PK uniqueness as it was doing before.

create index idx on tab(date_create,c1,c2,c3) global PARTITION BY RANGE (date_create)......;

Execution Plan



Plan hash value: 2815965656

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |

| 0 | SELECT STATEMENT | | 5666 | 199K| 14 (8)| 00:00:01 |
 | |
| 1 | SORT ORDER BY | | 5666 | 199K| 14 (8)| 00:00:01 | |
   |
| 2 | PARTITION RANGE ALL | | 5666 | 199K| 13 (0)| 00:00:01 |
1 | 5 |
| 3 | INDEX STORAGE FAST FULL SCAN| IDX12 | 5666 | 199K| 13 (0)|
00:00:01 | 1 | 5 |

In regards to ETL logic, it picks the data from this database and cache it and try to use this as lookup while reading data from other database tables and filter it through. The ETL tool has some restrictions due to which it cant persist the data so as to use the same sorted cached data throughout the day across multiple workflows. And thinking of joining the data across the databases using DB links, they are not allowed in this environment here due to some security issues. and yes, C1 is a character representation of number like data which is why posing a challenge to range partition through it in equal intervals. We tested earlier fetching all data unsorted and sorting it in ETL was taking a lot more time as compared to it does on the DB side.

On Mon, Nov 7, 2022 at 4:43 PM Jonathan Lewis <jlewisoracle_at_gmail.com> wrote:

> >> And also when you mentioned "*and you have to worry about the impact
> on data loading. If you need other indexes on the table you then have to
> worry about the effects on those indexes (for both loading and querying).*"
> I was not able to fully understand. Is this something as below..
>
> Without knowing anything about your table, column and index definitions
> it's not possible (and could be counter-productive) to make specific
> suggestions, hence the need for generic warnings that leave it up to you to
> apply the extra information you have. The thoughts you've had about the
> effect of the costs and benefits of setting the table up as an IOT are
> essentially correct - it's all driven by the fact that
>
> a) when you load a heap table the table data goes into the next few
> allocated blocks, densely packed and Oracle has some scope for optimising
> the index maintenance,
>
> compared to
>
> b) when you load a heap table the data tends to end up scattered over a
> large number of ("leaf") blocks of the IOT, and then the primary key and
> block address of each row has to be used in the secondary index
>
> Another factor in the querying by secondary index is that Oracle uses the
> stored block address as a "guess" of where the row will be in the IOT, and
> if the guess is wrong it then uses the stored primary key to find the row.
> When the data is constantly growing the guesses often become wrong fairly
> frequently. (See
> https://jonathanlewis.wordpress.com/2019/12/16/iot-bug/#footnote - which
> also describes a bug in secondary indexes that I haven't checked for the
> latest versions of Oracle, though it does appear in 19.3).
>
>
> Since you want to partition by create_date the globally partitioned index
> on (c1,c2,c3) is probably the best bet. You've said that (c1,c2) is unique
> (and the primary key) - so for the heap table solution you could create the
> globally partitioned index on (c1, c2, c3) as a non-unique index and then
> associate the primary key with it then drop the old unique index.
>
> You've shown a plan where Oracle did a partition-wise index fast full scan
> - that was just a costing thing, if you hint the query to use an index but
> NOT an index_ffs() then it should be able to read the index in order.
>
> As Andrew Sayer points out, though, dragging 2 billion rows across the
> network 10 times per day is an extreme thing to do to satisfy your ETL
> tool. Insisting on having them sorted is even worse. An incremental
> mechanism might be a lot better if the tool will accept it, or if you have
> to drag all the data across then using a tablescan and letting the ETL tool
> do the sorting might be the best bet.   In passing, I think we've  noted in
> the past that C1 is a character representation of a fixed width number - so
> yet another optimisation detail (for network traffic) might be to pass a
> to_number(c1) to the ETL tool (which, who knows, might be converting the
> incoming character to numeric anyway).
>
> Regards
> Jonathan Lewis
>
>
>
>
>
>
>
>
>
> On Sun, 6 Nov 2022 at 12:57, Lok P <loknath.73_at_gmail.com> wrote:
>
>> Thank you so much Jonathan and Mladen.
>>
>> Currently the table is a yearly range partitioned on DT_CREATE column and
>> it has data for last ~8 years so total ~8 partitions. Also the table has a
>> primary key in column (c1,c2,DT_CREATE). And we have another b-tree index
>> present on column C3. All are local index only. And the combination of
>> (c1,c2) is unique only.
>>
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Nov 07 2022 - 17:44:54 CET

Original text of this message