Re: Slow create table statement

From: Ram Raman <veeeraman_at_gmail.com>
Date: Mon, 13 Jul 2020 21:00:43 -0500
Message-ID: <CAHSa0M0mYaNmwqw4-t__Kr8RCzLAgDef5OEP7f9t0StreHgY7w_at_mail.gmail.com>



Thanks to everyone who responded. Jonathan, if I remove the use_hash hint, it falls back to doing nested loops on the big table, slowing things down. I added the hint back and the performance was in the 15-20 min range.
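
(The hinted statement itself is in the Google Drive document linked below rather than in this mail; the following is only a sketch of where the hint sits, assuming a plain equijoin on PERSON_SK - if PERSON_SK is not unique in the driving table, an EXISTS/semi-join form would be needed instead to avoid duplicate rows.)

create table REQ_27333_CLN_EVNT NOLOGGING as
select /*+ use_hash(ce) parallel(ce, 8) */ ce.*
from DO2.REQ_27333_DX_ENCNTR i,
     FIX.S_CLN_EVNT ce
where ce.PERSON_SK = i.PERSON_SK
/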

Thanks Lothar and Sayan as well. I will check them out.

On Sat, Jul 11, 2020 at 3:47 AM Sayan Malakshinov <xt.and.r_at_gmail.com> wrote:

> It's hard to advise without knowing the real physical distribution, but
> with old disks I'd also check one more approach:
>
> create table REQ_27333_CLN_EVNT NOLOGGING as
> with inc as (
>   select /*+ no_merge */ *
>   from (
>     select /*+ parallel(ce, 8) no_eliminate_oby */
>            rowid as rid
>     from FIX.S_CLN_EVNT ce
>     where exists (select 1 from DO2.REQ_27333_DX_ENCNTR i
>                   where ce.PERSON_SK = i.PERSON_SK)
>     order by rid
>   )
>   union all select null from dual where 1=0  -- fix for parallel; may not be required now
> )
> select /*+ PARALLEL(i, 8) */ ce.*
> from FIX.S_CLN_EVNT ce, inc i
> where ce.rowid = i.rid
> /
> If all the required 14 million rows (out of 1.7 billion) are located in a
> small fraction of the table blocks, there is a good chance that sorted
> access by rowid will be faster.
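>
> A quick way to check that before running the full CTAS might be to count
> the distinct blocks the matching rows sit in (a sketch, reusing the EXISTS
> predicate above):
>
> select /*+ parallel(ce, 8) */
>        count(*) matching_rows,
>        count(distinct dbms_rowid.rowid_block_number(ce.rowid)) distinct_blocks
> from FIX.S_CLN_EVNT ce
> where exists (select 1 from DO2.REQ_27333_DX_ENCNTR i
>               where ce.PERSON_SK = i.PERSON_SK)
> /
>
> If distinct_blocks is a small fraction of the ~40M blocks in the table,
> the sorted-rowid access should pay off.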
>
> On Sat, Jul 11, 2020 at 11:09 AM Lothar Flatz <l.flatz_at_bluewin.ch> wrote:
>
>> On the previous (nested loop) estimate I would check the high and low
>> values for the join key PERSON_SK.
>> Is this a growing value? What is the meaning behind the table
>> REQ_27333_DX_ENCNTR?
>> I wonder whether the time difference between the stats gatherings makes a
>> difference to the join estimate.
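>>
>> Something like this would show the current low/high values and the gather
>> time (a sketch - it assumes PERSON_SK is a NUMBER, since low_value and
>> high_value are stored as RAW):
>>
>> select owner, table_name,
>>        utl_raw.cast_to_number(low_value)  low_val,
>>        utl_raw.cast_to_number(high_value) high_val,
>>        num_distinct, last_analyzed
>> from dba_tab_col_statistics
>> where column_name = 'PERSON_SK'
>> and owner in ('FIX', 'DO2')
>> /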
>>
>> Regards
>>
>> Lothar
>>
>> Am 11.07.2020 um 09:24 schrieb Jonathan Lewis:
>>
>>
>> Ram,
>>
>> I see the plan did a broadcast - which was necessary for the best
>> performance - but I was a little surprised to see that it didn't do a
>> partition iterator for the parallelism on the big table, given that the
>> degree was 8 and the number of partitions was 32 (viz: a multiple of 8).
>> However, that probably wouldn't have made very much difference. The
>> minimal improvement at the higher degree was probably because DOP 32
>> pushed you into queueing at the disk; it's counter-intuitive, but you
>> might find that 16 (for example) is better than 32.
>>
>> It looks as if someone, or something, may have gathered some stats since
>> the nested loop run. The cardinality estimate for the hash join is now 15M
>> when the estimate for the nested loop was 2.8M: if the stats hadn't changed
>> the cardinality estimate wouldn't have changed; and a factor of 5 on the NL
>> cost might have been enough to push it into a hash join automatically.
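>>
>> To confirm whether stats were gathered between the two runs, something
>> like this (a sketch - dba_tab_stats_history only covers the retention
>> window, 31 days by default) would show the gather times:
>>
>> select table_name, stats_update_time
>> from dba_tab_stats_history
>> where owner = 'DO2'
>> and table_name = 'REQ_27333_DX_ENCNTR'
>> order by stats_update_time
>> /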
>>
>> Regards
>> Jonathan Lewis
>>
>>
>>
>> On Sat, Jul 11, 2020 at 2:14 AM Ram Raman <veeeraman_at_gmail.com> wrote:
>>
>>> Jonathan, I hinted it to use a HASH join and, as you said, it was a lot
>>> faster - it returns in 15 mins overall. I tried increasing the parallelism
>>> up to 32, but I only gained 2 mins with the higher degrees.
>>> Here is the same document with the new statement towards the end:
>>> https://drive.google.com/file/d/1hIUHbVfBxsvJz5zC3g5ZZGRg99MPz6p7/view?usp=sharing
>>>
>>>
>>> Now I need to find out why it was not using the HASH join in the first
>>> place, without the hint, and also see if I can make it even faster.
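>>>
>>> One way to see why might be a 10053 (optimizer) trace of a hard parse of
>>> the unhinted statement (a sketch - the query text here is only a stand-in
>>> for the real statement, which is in the linked document):
>>>
>>> alter session set events '10053 trace name context forever, level 1';
>>> explain plan for
>>>   select ce.*
>>>   from FIX.S_CLN_EVNT ce
>>>   where exists (select 1 from DO2.REQ_27333_DX_ENCNTR i
>>>                 where ce.PERSON_SK = i.PERSON_SK);
>>> alter session set events '10053 trace name context off';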
>>>
>>> On Fri, Jul 10, 2020 at 4:57 PM Ram Raman <veeeraman_at_gmail.com> wrote:
>>>
>>>> Thanks Jonathan. There is no compression, but many columns don't have
>>>> values in them. I am a bit puzzled by the naming convention, but that is a
>>>> different topic.
>>>>
>>>> What you said makes good sense. It doesn't make sense to access them by
>>>> nested loops. Here are the big table stats:
>>>>
>>>>
>>>> NUM_ROWS       BLOCKS      LAST_ANAL  AVG_ROW_LEN  CHAIN_CNT
>>>> -------------  ----------  ---------  -----------  ---------
>>>> 1,738,075,641  40,228,648  24-JUN-20          371          0
>>>>
>>>> Only one block size for the whole DB:
>>>>
>>>> SQL> sho parameter db_block_size
>>>>
>>>> NAME           TYPE     VALUE
>>>> -------------  -------  -----
>>>> db_block_size  integer  8192
>>>>
>>>> Partitions stats for the big table:
>>>>
>>>> PARTITION SUBPARTIT PARTITION_COUNT STATUS
>>>> --------- --------- --------------- --------
>>>> HASH NONE 32 VALID
>>>>
>>>> I did some calculations myself. If the query has to retrieve 14M rows
>>>> from the big table, then given the average row length of 371 bytes (about
>>>> 22 rows per 8K block) it has to visit roughly 634K blocks. It does not
>>>> make sense to do a nested loop join for that. Let me try a HASH join hint.
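>>>>
>>>> Sanity-checking that arithmetic (ignoring block overhead and PCTFREE):
>>>>
>>>> select trunc(8192/371) rows_per_block,
>>>>        round(14000000 / (8192/371)) blocks_to_visit
>>>> from dual;
>>>>
>>>> That returns 22 and ~634,000, so the 634K figure holds.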
>>>>
>>>> Thanks
>>>> Ram.
>>>>
>>>>
>>>>
>>
>
> --
> Best regards,
> Sayan Malakshinov
> Oracle performance tuning engineer
> Oracle ACE Associate
> http://orasql.org
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Jul 14 2020 - 04:00:43 CEST