Re: How to minimise sorting time in a query

From: Andy Sayer <andysayer_at_gmail.com>
Date: Sun, 6 Nov 2022 10:55:14 -0800
Message-ID: <CACj1VR7EpVpUFKQynYLjGkMAEoikjETRYLdvr85oXs_PkmUVqw_at_mail.gmail.com>



Can we take a step back? Grabbing a billion rows up to 10 times a day that must be sorted seems like something I’d want to avoid at all costs. What’s the point?

Without knowing details, you may need something to capture changes to apply to your billion row cache. Something like Goldengate could be a good fit, but we’d need to know more about what the real requirements are.

Thanks,
Andy

On Sun, Nov 6, 2022 at 10:47 AM, Lok P <loknath.73_at_gmail.com> wrote:

> Yes and also the column C1 is having data which has hundreds of millions
> of distinct values in it. Its value also looks something like
> "4007896240517865" and also for many it holds all characters. So I really
> didn't see an easy way to define the range partition which will divide the
> data equally across all partitions. The max_value and min_value defined in
> dba_tab_columns are as below.
>
> 32333038363430303030303032353636 -- 36373931393634343030343432383039
>
>
>
> On Sun, Nov 6, 2022 at 9:25 PM Mark W. Farnham <mwf_at_rsiz.com> wrote:
>
>> Unless there is some relationship between created date and c1 such that
>> c1 only rises over creation time, in which case you could use a local index
>> on c1,c2,c3 and do the creation date partitions in order in the ETL…
>>
>>
>>
>> *From:* oracle-l-bounce_at_freelists.org [mailto:
>> oracle-l-bounce_at_freelists.org] *On Behalf Of *Mark W. Farnham
>> *Sent:* Sunday, November 06, 2022 8:44 AM
>> *To:* mwf_at_rsiz.com; jlewisoracle_at_gmail.com; loknath.73_at_gmail.com
>> *Cc:* 'Oracle L'
>> *Subject:* RE: How to minimise sorting time in a query
>>
>>
>>
>> I was typing this before your last response with the partitioning on
>> create date information arrived on the list.
>>
>>
>>
>> This obviously won’t work with your partitioning.
>>
>>
>>
>> mwf
>>
>>
>>
>> *From:* oracle-l-bounce_at_freelists.org [
>> mailto:oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org>] *On
>> Behalf Of *Mark W. Farnham
>> *Sent:* Sunday, November 06, 2022 8:34 AM
>> *To:* jlewisoracle_at_gmail.com; loknath.73_at_gmail.com
>> *Cc:* 'Oracle L'
>> *Subject:* RE: How to minimise sorting time in a query
>>
>>
>>
>> AND, tell us more about the ETL.
>>
>>
>>
>> IF the ETL can be arranged to reference certain partitions (ideally, for
>> example, one c1 at a time if c1 one is your range partition), AND you can
>> serialize the ETL executions so that you can reference the correct range
>> partition as TAB1 via synonym,
>>
>>
>>
>> SELECT C3,c1,c2 FROM TAB1 ORDER BY c1,c2
>>
>>
>>
>> Then tab1 could be a synonym altered to resolve to one of n views
>>
>>
>>
>> Select C3,’literally c1 for this particular range’ as c1, c2 from
>> <partition_name of this literal c1 for tab1> order by c2
>>
>>
>>
>> Then you could have a local index on c2, c3 on each partition c1 which
>> would not need to reference the table (or partition) at all, since it has
>> all the data it needs in sorted order from that index.
>>
>>
>>
>> This, of course, creates a concurrency gate, but they can be overlapped,
>> since once the sql is resolved and executing the synonym latch can be
>> broken without harm (thankyou, Oracle VLDB circa 1992 and Ken Jacobs, RIP,
>> Derry, and Andy for seeing the value of allowing synonyms to evaporate once
>> they have been parsed.)
>>
>>
>>
>> So you could
>>
>>
>>
>> alter synonym, start etl for c1 value 1 (not waiting for completion)
>>
>> alter synonym, start etl for c1 value 2 (not waiting for completion)…
>>
>>
>>
>> alter synonym, start etl for c1 value n
>>
>>
>>
>> But you cannot start another series of etl jobs until value n for c1 has
>> started.
>>
>>
>>
>> So that might not be a horrible concurrency gate if you can manage it.
>>
>>
>>
>> That’s a lot of dancing, but it does not require you change the internal
>> code for the ETL, just manage its submission, and it completely eliminates
>> the sort.
>>
>>
>>
>> This, of course, may or may not be manageable in your particular scenario.
>>
>>
>>
>> Please also notice I started with “AND” (not but).
>>
>>
>>
>> mwf
>>
>>
>>
>> *From:* oracle-l-bounce_at_freelists.org [
>> mailto:oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org>] *On
>> Behalf Of *Jonathan Lewis
>> *Sent:* Sunday, November 06, 2022 4:21 AM
>> *To:* loknath.73_at_gmail.com
>> *Cc:* Oracle L
>> *Subject:* Re: How to minimise sorting time in a query
>>
>>
>>
>>
>>
>> You table is range partitioned - what's the partitioning column?
>>
>> Do the table have a primary key?
>>
>>
>>
>> You want to sort by c1, c2 - if you're not range partitioned by c1 you
>> will HAVE to do a sort to get the data in order.
>>
>> As Mladen suggests, the best option for your requirement would be to turn
>> the table into an IOT, but that's only possible if the table has a primary
>> key, and then you need the primary key to start with (c1,c2) if you want to
>> avoid a sort, and you have to worry about the impact on data loading. If
>> you need other indexes on the table you then have to worry about the
>> effects on those indexes (for both loading and querying).
>>
>>
>>
>> Your second best option might be to create an index on c1, c2, c3,
>> globally partitioned by range on c1, then hint the query to do a full scan
>> on the index - and not allow parallelism. This requires a MAXVALUE
>> partition on C1, so you may need to put in place a mechanism to keep the
>> MAXVALUE partition empty and split it on a regular basis so that you don't
>> end up with one enormous partition some time in the future.
>>
>>
>>
>> Regards
>>
>> Jonathan Lewis
>>
>>
>>
>>
>>
>> On Sat, 5 Nov 2022 at 20:18, Lok P <loknath.73_at_gmail.com> wrote:
>>
>> Hello Listers, In a 19C Oracle database. One requirement is to just pick
>> all the records sorted by two columns from a lookup table. The table holds
>> ~1billion rows. And this data needs to be cached daily 8-10 times as it's
>> getting used by an ETL workflow for lookup and it's queried in sorted
>> fashion from the database by that ETL workflow. The issue is that the
>> majority of time is spent while sorting those ~1billion rows each time.
>> Below is the sample sqlmonitor of the sql query. We wanted to understand
>> what best we can do to minimize the response time of this query so that ETL
>> workflow time can be reduced?
>>
>> It's already using ~1GB of memory and ~30GB of temp. The rows are
>> expected to grow 2billion+. So setting manual memory management and
>> sort_area_size as 2GB may not help in the long run. Tried running in
>> parallel(4) and parallel(8) but then no relief in the amount of temp spill,
>> rather the response time seems to be increasing further. Is there any other
>> possible way to reduce the execution time of this query?
>>
>> The table has three columns with below data structure/data types.
>>
>> C1 VARCHAR2 (34 Byte)
>> C2 VARCHAR2 (1 Byte)
>> C3 VARCHAR2 (34 Byte)
>> CREATE_BY VARCHAR2 (50 Byte)
>> DT_CREATE DATE
>>
>>
>>
>>
>> SELECT C3,c1,c2 FROM TAB1 ORDER BY c1,c2;
>>
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Sun Nov 06 2022 - 19:55:14 CET

Original text of this message