Re: Slow create table statement

From: Ram Raman <veeeraman_at_gmail.com>
Date: Tue, 14 Jul 2020 12:33:29 -0500
Message-ID: <CAHSa0M3CXbetMW62w1Gw+xxh7Hr7KRYL8kOG9a_DBwZBPpVA-A_at_mail.gmail.com>



Just a question: would spreading the 2B rows across 64 partitions make a difference compared to keeping them in 32 partitions?

On Mon, Jul 13, 2020 at 9:00 PM Ram Raman <veeeraman_at_gmail.com> wrote:

> Thanks to everyone who responded. Jonathan, if I remove the use_hash hint,
> it falls back to nested loops on the big table, slowing things down. I
> added the hint back and the runtime was in the 15-20 minute range.
>
> Thanks to Lothar and Sayan as well. I will check out your suggestions.
>
> On Sat, Jul 11, 2020 at 3:47 AM Sayan Malakshinov <xt.and.r_at_gmail.com>
> wrote:
>
>> It's hard to advise without knowing the real physical distribution, but
>> with old disks I'd also check one more approach:
>>
>> create table REQ_27333_CLN_EVNT NOLOGGING as
>> with inc as (
>>     select /*+ no_merge */ *
>>     from (
>>         -- collect the rowids of the matching rows, sorted so that the
>>         -- big table is then revisited in physical order
>>         select /*+ parallel(ce, 8) no_eliminate_oby */
>>             rowid as rid
>>         from FIX.S_CLN_EVNT ce
>>         where exists (select 1 from DO2.REQ_27333_DX_ENCNTR i
>>                       where ce.PERSON_SK = i.PERSON_SK)
>>         order by rid
>>     )
>>     -- workaround to keep the statement parallel; may not be required any more
>>     union all select null from dual where 1=0
>> )
>> select /*+ PARALLEL(i, 8) */ ce.*
>> from FIX.S_CLN_EVNT ce, inc i
>> where ce.rowid = i.rid
>> /
>> If all 14M required rows (out of 1.7B) are located in a small fraction of
>> the table's blocks, there is a good chance that sorted access by rowid
>> will be faster.
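>>
>> To estimate whether that is the case, counting the distinct blocks that
>> hold the matching rows gives a quick measure of clustering (a sketch
>> using the standard dbms_rowid package; it scans the big table once):
>>
>> -- note: block numbers repeat across datafiles, so for an exact count
>> -- you would combine this with the relative file number as well
>> select count(distinct dbms_rowid.rowid_block_number(ce.rowid)) as blks
>> from   FIX.S_CLN_EVNT ce
>> where  exists (select 1 from DO2.REQ_27333_DX_ENCNTR i
>>                where  i.PERSON_SK = ce.PERSON_SK);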
>>
>> On Sat, Jul 11, 2020 at 11:09 AM Lothar Flatz <l.flatz_at_bluewin.ch> wrote:
>>
>>> For the previous (nested loop) estimate I would check the high and low
>>> values for the join key PERSON_SK.
>>> Is this a growing value? What is the meaning behind table
>>> REQ_27333_DX_ENCNTR?
>>> I wonder whether the difference in stats-gathering times makes a
>>> difference to the join estimate.
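>>>
>>> For example, something along these lines would show the recorded range
>>> (a sketch; it assumes PERSON_SK is numeric, so the raw-to-number
>>> conversion applies, and that you can read the DBA views):
>>>
>>> select column_name,
>>>        utl_raw.cast_to_number(low_value)  as low_val,
>>>        utl_raw.cast_to_number(high_value) as high_val,
>>>        last_analyzed
>>> from   dba_tab_col_statistics
>>> where  owner = 'FIX'
>>> and    table_name = 'S_CLN_EVNT'
>>> and    column_name = 'PERSON_SK';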
>>>
>>> Regards
>>>
>>> Lothar
>>>
>>> On 11.07.2020 at 09:24, Jonathan Lewis wrote:
>>>
>>>
>>> Ram,
>>>
>>> I see the plan did a broadcast - which was necessary for the best
>>> performance - but I was a little surprised to see that it didn't do a
>>> partition iterator for the parallelism on the big table, given that the
>>> degree was 8 and the number of partitions was 32 (viz: a multiple of 8).
>>> However, that probably wouldn't have made very much difference. The
>>> minimal improvement at the higher degree was probably because DOP 32
>>> pushed you into queueing at the disc; it's counter-intuitive, but you
>>> might find that 16 (for example) is better than 32.
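>>>
>>> As a rough sketch of that experiment (the join shape here is assumed
>>> from Sayan's post, not taken from your actual statement):
>>>
>>> create table REQ_27333_CLN_EVNT nologging parallel 16 as
>>> select /*+ use_hash(ce) parallel(ce, 16) */ ce.*
>>> from   FIX.S_CLN_EVNT ce
>>> where  exists (select 1
>>>                from   DO2.REQ_27333_DX_ENCNTR i
>>>                where  i.PERSON_SK = ce.PERSON_SK);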
>>>
>>> It looks as if someone, or something, may have gathered some stats since
>>> the nested loop run. The cardinality estimate for the hash join is now 15M
>>> when the estimate for the nested loop was 2.8M: if the stats hadn't changed
>>> the cardinality estimate wouldn't change; and a factor of 5 on the NL cost
>>> might have been enough to push it into a hash join automatically.
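>>>
>>> If you want to check that, dba_tab_stats_history records recent stats
>>> changes (a sketch, assuming you have access to the DBA views):
>>>
>>> select owner, table_name, partition_name, stats_update_time
>>> from   dba_tab_stats_history
>>> where  (owner, table_name) in (('FIX', 'S_CLN_EVNT'),
>>>                                ('DO2', 'REQ_27333_DX_ENCNTR'))
>>> order by stats_update_time;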
>>>
>>> Regards
>>> Jonathan Lewis
>>>
>>>
>>>
>>> On Sat, Jul 11, 2020 at 2:14 AM Ram Raman <veeeraman_at_gmail.com> wrote:
>>>
>>>> Jonathan, I hinted it to use a HASH join and, as you said, it was a lot
>>>> faster - it returns in 15 minutes overall. I tried increasing the
>>>> parallelism up to 32, but I only gained 2 minutes with the higher degrees.
>>>> Here is the same document with the new statement towards the end:
>>>> https://drive.google.com/file/d/1hIUHbVfBxsvJz5zC3g5ZZGRg99MPz6p7/view?usp=sharing
>>>>
>>>>
>>>> Now I need to find out why it was not using the HASH join in the first
>>>> place without the hint, and also see if I can make it even faster.
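>>>>
>>>> To dig into that I will probably start by comparing the hinted and
>>>> unhinted plans, along these lines (a generic sketch, not the exact
>>>> production statement):
>>>>
>>>> explain plan for
>>>> select ce.*
>>>> from   FIX.S_CLN_EVNT ce
>>>> where  exists (select 1
>>>>                from   DO2.REQ_27333_DX_ENCNTR i
>>>>                where  i.PERSON_SK = ce.PERSON_SK);
>>>>
>>>> select * from table(dbms_xplan.display(format => 'ALL'));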
>>>>
>>>> On Fri, Jul 10, 2020 at 4:57 PM Ram Raman <veeeraman_at_gmail.com> wrote:
>>>>
>>>>> Thanks, Jonathan. There is no compression, but many columns don't have
>>>>> values in them. I am a bit puzzled by the naming convention, but that is
>>>>> a different topic.
>>>>>
>>>>> What you said makes good sense. It doesn't make sense to access the rows
>>>>> by nested loops. Here are the big table stats:
>>>>>
>>>>>
>>>>>       NUM_ROWS       BLOCKS LAST_ANAL AVG_ROW_LEN  CHAIN_CNT
>>>>> -------------- ------------ --------- ----------- ----------
>>>>>  1,738,075,641   40,228,648 24-JUN-20         371          0
>>>>>
>>>>> Only one block size for the whole DB:
>>>>>
>>>>> SQL> sho parameter db_block_size
>>>>>
>>>>> NAME                                 TYPE        VALUE
>>>>> ------------------------------------ ----------- ------------------------------
>>>>> db_block_size                        integer     8192
>>>>>
>>>>> Partitions stats for the big table:
>>>>>
>>>>> PARTITIONING_TYPE SUBPARTITIONING_TYPE PARTITION_COUNT STATUS
>>>>> ----------------- -------------------- --------------- --------
>>>>> HASH              NONE                              32 VALID
>>>>>
>>>>> I did some calculations myself. If the query has to retrieve 14M rows
>>>>> from the big table, then given the average row length of 371 bytes (about
>>>>> 22 rows per 8K block), it has to visit around 634K blocks at a minimum.
>>>>> It does not make sense to do a nested loop join for that. Let me try a
>>>>> HASH join hint.
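>>>>>
>>>>> As a sanity check on that arithmetic (this is the floor, assuming the
>>>>> matching rows are perfectly packed; a nested loop could easily touch
>>>>> many more blocks than this):
>>>>>
>>>>> -- 14M rows * 371 bytes per row / 8192 bytes per block ~= 634K blocks
>>>>> select round(14000000 * 371 / 8192) as min_blocks_to_visit from dual;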
>>>>>
>>>>> Thanks
>>>>> Ram.
>>>>>
>>>>>
>>>>>
>>>
>>
>> --
>> Best regards,
>> Sayan Malakshinov
>> Oracle performance tuning engineer
>> Oracle ACE Associate
>> http://orasql.org
>>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Jul 14 2020 - 19:33:29 CEST
