Re: Re: How to minimise sorting time in a query

From: <l.flatz_at_bluewin.ch>
Date: Mon, 7 Nov 2022 08:48:21 +0100 (CET)
Message-ID: <708843465.5906.1667807301482_at_bluewin.ch>





Quote "Oracle now know the max and min value of each index partition, so the sorting it will have to perform just on the partition level and then just club them all without a need of bigger sorting area for sorting whole data. Please , correct me if my understanding is wrong."You are correct. Oracle understands that a range partition is sorted. Thanks
Lothar
----Ursprüngliche Nachricht----
Von : loknath.73_at_gmail.com
Datum : 06/11/2022 - 13:56 (MN)
An : jlewisoracle_at_gmail.com, gogala.mladen_at_gmail.com Cc : oracle-l_at_freelists.org
Betreff : Re: How to minimise sorting time in a query  Thank you so much Jonathan and Mladen.    

Currently the table is a yearly range partitioned on DT_CREATE column and it has data for last ~8 years so total ~8 partitions. Also the table has a primary key in column (c1,c2,DT_CREATE). And we have another b-tree index present on column C3. All are local index only. And the combination of (c1,c2) is unique only.    

As I tested on a sample table on dev , the sorting operation is fully eliminated by the optimizer only in the case of the IOT i.e. index organized table with Pk on (c1,c2). Below is the sample plan I tried to fetch from dev. I hope you are suggesting a non-partitioned index organized table here.    

In the plan below, index PN_PK is the primary key on (c1,c2) and the table is a non-partitioned index organized table/IOT. And I see the table access is fully eliminated here even if the column c3 is not part of the primary key index. So it's because it's now IOT, so no extra table access is needed even if that column is not part of the PK index.     

  Execution Plan



Plan hash value: 27212239

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |


| 0 | SELECT STATEMENT | | 1 | 40 | 2 (0)| 00:00:01 |
| 1 | INDEX FULL SCAN | PN_PK | 1 | 40 | 2 (0)| 00:00:01 |

     

And also when you mentioned "
  and you have to worry about the impact on data loading. If you need other indexes on the table you then have to worry about the effects on those indexes (for both loading and querying)." I was not able to fully understand. Is this something as below..      

With regards to the other b-tree index on column C3 which we already have on prod , that will have higher clustering factor now, as because the data in this IOT table is now strictly ordered based on the other column (c1,c2). And that will cause higher IO while querying data through the C3 column index access too because of higher clustering factor. Not able to figure out how data load now has to do more work on this C3 index. However, with regards to the table "data load", We are inserting ~3-4million new rows every day to this table. The data will now need to be ordered/sorted by column (c1,c2) while getting inserted into the IOT , so here oracle has to do more work., Is this the correct explanation or negative consequences of the IOT approach , you wanted us to be aware about?      

Never thought of the design change which Jonathan suggested by making the index globally partitioned by the leading column. Though i am not able to run the query with data but as i tested with a sample table the plan looks like below, which means as Oracle now know the max and min value of each index partition, so the sorting it will have to perform just on the partition level and then just club them all without a need of bigger sorting area for sorting whole data. Please , correct me if my understanding is wrong.      

Table is the yearly range partitioned on column dt_create as it is in current production. Index IDX12 on (c1,c2,c3) with global range partitioned on column c1.       

Execution Plan



Plan hash value: 2524364129

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |


| 0 | SELECT STATEMENT | | 5666 | 199K| 13 (8)| 00:00:01 | | |
| 1 | PARTITION RANGE ALL | | 5666 | 199K| 13 (8)| 00:00:01 | 1 | 6 |
| 2 | SORT ORDER BY | | 5666 | 199K| 13 (8)| 00:00:01 | | |
| 3 | INDEX STORAGE FAST FULL SCAN| IDX12 | 5666 | 199K| 12 (0)| 00:00:01 | 1 | 6 |

          

  On Sun, Nov 6, 2022 at 2:51 PM Jonathan Lewis <   jlewisoracle_at_gmail.com> wrote:                                   

     You table is range partitioned - what's the partitioning column?          

     Do the table have a primary key?                          

     You want to sort by c1, c2 - if you're not range partitioned by c1 you will HAVE to do a sort to get the data in order.          

     As Mladen suggests, the best option for your requirement would be to turn the table into an IOT, but that's only possible if the table has a primary key, and then you need the primary key to start with (c1,c2) if you want to avoid a sort, and you have to worry about the impact on data loading. If you need other indexes on the table you then have to worry about the effects on those indexes (for both loading and querying).
     
    
    
     
    
    
     Your second best option might be to create an index on c1, c2, c3, globally partitioned by range on c1, then hint the query to do a full scan on the index - and not allow parallelism.  This requires a MAXVALUE partition on C1, so you may need to put in place a mechanism to keep the MAXVALUE partition empty and split it on a regular basis so that you don't end up with one enormous partition some time in the future.
     
    
    
     
    
    
     Regards
    
    
     Jonathan Lewis
    
    
     
    
   
   
   
    
     On Sat, 5 Nov 2022 at 20:18, Lok P <
     loknath.73_at_gmail.com> wrote:
     
    
    
     
      Hello Listers, In a 19C  Oracle database. One requirement is to just pick all the records sorted by two columns from a lookup table. The table holds ~1billion rows. And this data needs to be cached daily 8-10 times as it's getting used by an ETL workflow for lookup and it's queried in sorted fashion from the database by that ETL workflow. The issue is that the majority of time is spent while sorting those ~1billion rows each time. Below is the sample sqlmonitor of the sql query. We wanted to understand what best we can do to minimize the response time of this query so that ETL workflow time can be reduced? 
      
      
It's already using ~1GB of memory and ~30GB of temp. The rows are expected to grow 2billion+. So setting manual memory management and sort_area_size as 2GB may not help in the long run. Tried running in parallel(4) and parallel(8) but then no relief in the amount of temp spill, rather the response time seems to be increasing further. Is there any other possible way to reduce the execution time of this query?
      
      
The table has three columns with below data structure/data types.
      
      
C1   VARCHAR2 (34 Byte)
      
C2 VARCHAR2 (1 Byte)
      
C3 VARCHAR2 (34 Byte)
      
CREATE_BY      VARCHAR2 (50 Byte)
      
DT_CREATE   DATE
      
      
       
      
      
       
SELECT C3,c1,c2 FROM TAB1 ORDER BY c1,c2;
      
     
    
   
  
 


--
http://www.freelists.org/webpage/oracle-l
Received on Mon Nov 07 2022 - 08:48:21 CET

Original text of this message