Re: parallel_force_local and parallel query

From: Cee Pee <carlospena999_at_gmail.com>
Date: Wed, 26 Feb 2020 10:14:47 -0600
Message-ID: <CAPTPB11RGzG06V9VDkaeA+uk2Y0HNsRtFfT0RdGj_Z36i47njw_at_mail.gmail.com>



Thanks Ghassan for the detailed explanation. I would think that if we partition the application such that all of a table's operations come from one node, we may not have to worry about in-memory having to reconstruct tables on a node when PFL is set to true. Even with the in-memory option, I would still want to set parallel force local to true. With PFL false, I would not want a sudden burst of high-volume traffic across the interconnect from DML that queries huge volumes of data over multiple nodes, potentially risking node evictions due to the traffic. I would rather lose the time spent reconstructing the data for in-memory operations. Thank you Jonathan too for your explanation.
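
For anyone following along, the parameter being discussed can be tried at session level before committing to a system-wide change; a minimal sketch (scope and values as I understand them, so verify against your version's documentation):

```sql
-- Keep the query coordinator and all PX slaves on the node that
-- started the query (no cross-instance slave allocation)
ALTER SYSTEM SET parallel_force_local = TRUE SCOPE = BOTH;

-- Or experiment per session first
ALTER SESSION SET parallel_force_local = TRUE;
```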

Is anyone aware of any white paper using RAC for DW? Thanks. Web searching got me this
<https://www.oracle.com/technetwork/database/clustering/overview/bp-rac-dw-130699.pdf>, this
<https://docs.oracle.com/database/121/RACAD/GUID-CD5179E8-5251-4E6D-B98D-9278C3BE6E0D.htm> and this
<https://www.rittmanmead.com/blog/2005/05/is-rac-suitable-for-data-warehousing/>. I read Tom Kyte's comment on the last one. Too bad the 'dizwell' link does not work.

On Wed, Feb 26, 2020 at 3:43 AM Ghassan Salem <salem.ghassan_at_gmail.com> wrote:

> CP,
> In Exadata, lots of things are offloaded to storage cells, from
> projections to predicates to joins (bitmaps for hash joins), to some
> aggregations (depending on the version), ..., but this does not change how
> the parallel operations are done, and what I said about partition-wise
> joins remains true.
>
> As for in-memory, if you're using RAC, you must NOT use
> parallel_force_local=true. In RAC, you generally (unless you're on exadata
> and decided to duplicate the inmemory area) have say some partitions' data
> in the in-memory area of node 1, some other partitions in node 2, ... so if
> you force local parallel query, it will not be able to use the in-memory of
> the other nodes, and it will have to read from disk.
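
A quick way to see how the in-memory area is spread across instances is to query GV$IM_SEGMENTS; the DUPLICATE ALL clause is the engineered-systems option for keeping a full copy on every node. A sketch (table name hypothetical):

```sql
-- Which instance has populated which segment in its in-memory area
SELECT inst_id, segment_name, inmemory_size, populate_status
FROM   gv$im_segments
ORDER  BY inst_id, segment_name;

-- Engineered systems (e.g. Exadata) only: duplicate the segment
-- into every node's in-memory area instead of distributing it
ALTER TABLE sales INMEMORY DUPLICATE ALL;
```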
>
> Regards
>
> On Tue, Feb 25, 2020 at 9:02 PM Cee Pee <carlospena999_at_gmail.com> wrote:
>
>> Thanks Ghassan. I thought they moved the compute power closer to the data
>> in Exadata; i.e., a lot of the processing happens in the storage itself and
>> the aggregates are sent from storage to the DB server for further
>> processing. If that is the case, there may not be much transfer happening
>> via cache fusion in Exadata - a guess on my part.
>>
>> Can you explain why I should not set parallel_force_local to true when
>> using the in-memory option?
>>
>>
>> On Tue, Feb 25, 2020 at 3:23 AM Ghassan Salem <salem.ghassan_at_gmail.com>
>> wrote:
>>
>>> CP,
>>> There are a lot of Oracle clients using RAC for DWH. Oracle has
>>> thousands of Exadata machines out there, they all run RAC and a big % run
>>> DWH databases.
>>> As Jonathan said, you should know how RAC works in order to profit from
>>> it.
>>> Back to parallel_force_local=True/False, it depends on what you have
>>> underneath, and the structure of your tables. For example, if you have a
>>> partition-wise hash join, Oracle is usually smart enough to read and join
>>> the partitions on the same node. If you don't have partition-wise join, you
>>> may end up in the situation Jonathan explained: reading on one node, and
>>> sending part of the data to the other.
>>>
>>> So, depending on your hardware/schema-design, you may be better off with
>>> setting it to false or to true.
>>> One last note: if you use the in-memory option, you should not set it to
>>> true, and you should enable autoDOP.
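
Enabling AutoDOP, as suggested above, is a matter of one parameter; a hedged sketch (behaviour varies by release, so check your version's reference):

```sql
-- AUTO lets the optimizer choose the degree of parallelism and
-- enables parallel statement queuing; with parallel_force_local
-- left at FALSE, PX slaves can then use every node's IM area
ALTER SYSTEM SET parallel_degree_policy = AUTO SCOPE = BOTH;
```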
>>>
>>> Regards
>>>
>>> On Tue, Feb 25, 2020 at 12:33 AM Cee Pee <carlospena999_at_gmail.com>
>>> wrote:
>>>
>>>> Jonathan, that is exactly my concern about using RAC for a
>>>> data warehouse. Do people who run DW on RAC (are there any?) use services
>>>> for *application* partitioning along with this parameter to control the
>>>> data transfer madness over the interconnect? But with DML, unless a good
>>>> application partitioning strategy is used, parallel_force_local may not be
>>>> able to do much, if one node has to request uncommitted data from a
>>>> different node.
>>>>
>>>> I would think in that case the application has to be configured such
>>>> that all the DDL/DML/selects for a table happen in one node only.
>>>>
>>>>
>>>> On Mon, Feb 24, 2020 at 5:16 PM Jonathan Lewis <
>>>> jonathan_at_jlcomp.demon.co.uk> wrote:
>>>>
>>>>> The point of parallel_force_local is that the query coordinator and
>>>>> all its PX slave processes operate on the same node, so that any PQ messages
>>>>> are inter-process on the node and no PQ messages (for the query) have to
>>>>> travel across the interconnect.
>>>>>
>>>>> If you think about a parallel hash join using hash/hash distribution,
>>>>> for example: this will use two slave sets. The first slave set scans the
>>>>> first table and distributes the interesting rows to the second slave set,
>>>>> then the first slave set scans the second table and distributes its rows
>>>>> in the same way to the second slave set. If you don't set
>>>>> parallel_force_local to true then you could (in theory, though Oracle
>>>>> always makes some attempt to avoid it) end up with the first slave set on
>>>>> one node and the second slave set on another node, and all the data flying
>>>>> across the interconnect before any joining gets done - then maybe the
>>>>> entire result set could go flying back across the interconnect because the
>>>>> query coordinator is on a third node. A single bad parallel query could
>>>>> flood the interconnect and basically kill the system - to the extent that
>>>>> nodes could get ejected because they couldn't get their heartbeat messages
>>>>> through fast enough.
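
That hash/hash distribution shows up in the execution plan as PX SEND HASH operations; a sketch of how to look for it (table names and hints hypothetical, not from the thread):

```sql
EXPLAIN PLAN FOR
SELECT /*+ parallel(8) use_hash(t1 t2) pq_distribute(t2 hash hash) */
       t1.col1, t2.col2
FROM   t1 JOIN t2 ON t1.id = t2.id;

SELECT * FROM TABLE(dbms_xplan.display);
-- "PX SEND HASH" rows (PQ Distrib = HASH) mark the inter-slave-set
-- traffic described above - traffic that crosses the interconnect
-- whenever the two slave sets land on different nodes.
```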
>>>>>
>>>>> Regards
>>>>> Jonathan Lewis
>>>>>
>>>>>
>>>>> ________________________________________
>>>>> From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org>
>>>>> on behalf of Cee Pee <carlospena999_at_gmail.com>
>>>>> Sent: 24 February 2020 20:38
>>>>> To: Oracle-L Freelists
>>>>> Subject: parallel_force_local and parallel query
>>>>>
>>>>>
>>>>> Hopefully not a dumb question. By setting parallel force local to TRUE,
>>>>> all the queries for a particular table and other objects happen in the
>>>>> local instance only. What happens if a different query is issued from a
>>>>> different node for the same table or object? Does the data get transferred
>>>>> over the interconnect if the data is still cached in the first node? Assume
>>>>> SELECTs only on both nodes.
>>>>>
>>>>> Thanks
>>>>> CP
>>>>> --
>>>>> http://www.freelists.org/webpage/oracle-l
>>>>>
>>>>>
>>>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Feb 26 2020 - 17:14:47 CET
