RE: How to minimise sorting time in a query

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Sun, 6 Nov 2022 08:43:36 -0500
Message-ID: <17ce01d8f1e5$c58f0940$50ad1bc0$_at_rsiz.com>



I was typing this before your last response with the partitioning on create date information arrived on the list.  

This obviously won’t work with your partitioning.  

mwf  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Mark W. Farnham Sent: Sunday, November 06, 2022 8:34 AM
To: jlewisoracle_at_gmail.com; loknath.73_at_gmail.com Cc: 'Oracle L'
Subject: RE: How to minimise sorting time in a query  

AND, tell us more about the ETL.  

IF the ETL can be arranged to reference certain partitions (ideally, for example, one c1 at a time if c1 one is your range partition), AND you can serialize the ETL executions so that you can reference the correct range partition as TAB1 via synonym,  

SELECT C3,c1,c2 FROM TAB1 ORDER BY c1,c2  

Then tab1 could be a synonym altered to resolve to one of n views  

Select C3,’literally c1 for this particular range’ as c1, c2 from <partition_name of this literal c1 for tab1> order by c2  

Then you could have a local index on c2, c3 on each partition c1 which would not need to reference the table (or partition) at all, since it has all the data it needs in sorted order from that index.  

This, of course, creates a concurrency gate, but they can be overlapped, since once the sql is resolved and executing the synonym latch can be broken without harm (thankyou, Oracle VLDB circa 1992 and Ken Jacobs, RIP, Derry, and Andy for seeing the value of allowing synonyms to evaporate once they have been parsed.)  

So you could  

alter synonym, start etl for c1 value 1 (not waiting for completion)

alter synonym, start etl for c1 value 2 (not waiting for completion)…  

alter synonym, start etl for c1 value n  

But you cannot start another series of etl jobs until value n for c1 has started.  

So that might not be a horrible concurrency gate if you can manage it.  

That’s a lot of dancing, but it does not require you change the internal code for the ETL, just manage its submission, and it completely eliminates the sort.  

This, of course, may or may not be manageable in your particular scenario.  

Please also notice I started with “AND” (not but).  

mwf  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Jonathan Lewis Sent: Sunday, November 06, 2022 4:21 AM
To: loknath.73_at_gmail.com
Cc: Oracle L
Subject: Re: How to minimise sorting time in a query    

You table is range partitioned - what's the partitioning column?

Do the table have a primary key?  

You want to sort by c1, c2 - if you're not range partitioned by c1 you will HAVE to do a sort to get the data in order.

As Mladen suggests, the best option for your requirement would be to turn the table into an IOT, but that's only possible if the table has a primary key, and then you need the primary key to start with (c1,c2) if you want to avoid a sort, and you have to worry about the impact on data loading. If you need other indexes on the table you then have to worry about the effects on those indexes (for both loading and querying).  

Your second best option might be to create an index on c1, c2, c3, globally partitioned by range on c1, then hint the query to do a full scan on the index - and not allow parallelism. This requires a MAXVALUE partition on C1, so you may need to put in place a mechanism to keep the MAXVALUE partition empty and split it on a regular basis so that you don't end up with one enormous partition some time in the future.  

Regards

Jonathan Lewis    

On Sat, 5 Nov 2022 at 20:18, Lok P <loknath.73_at_gmail.com> wrote:

Hello Listers, In a 19C Oracle database. One requirement is to just pick all the records sorted by two columns from a lookup table. The table holds ~1billion rows. And this data needs to be cached daily 8-10 times as it's getting used by an ETL workflow for lookup and it's queried in sorted fashion from the database by that ETL workflow. The issue is that the majority of time is spent while sorting those ~1billion rows each time. Below is the sample sqlmonitor of the sql query. We wanted to understand what best we can do to minimize the response time of this query so that ETL workflow time can be reduced?

It's already using ~1GB of memory and ~30GB of temp. The rows are expected to grow 2billion+. So setting manual memory management and sort_area_size as 2GB may not help in the long run. Tried running in parallel(4) and parallel(8) but then no relief in the amount of temp spill, rather the response time seems to be increasing further. Is there any other possible way to reduce the execution time of this query?

The table has three columns with below data structure/data types.

C1 VARCHAR2 (34 Byte)
C2 VARCHAR2 (1 Byte)
C3 VARCHAR2 (34 Byte)
CREATE_BY VARCHAR2 (50 Byte)
DT_CREATE DATE   SELECT C3,c1,c2 FROM TAB1 ORDER BY c1,c2;

--
http://www.freelists.org/webpage/oracle-l
Received on Sun Nov 06 2022 - 14:43:36 CET

Original text of this message