Re: best way to invoke parallel in DW loads?
Date: Mon, 14 Sep 2009 22:50:28 +0200
> My question lies with the best practices for invoking parallel.
My experience (not necessary the best practice) is to test on a given HW and data some basic use cases in the first step. This can be a load (sql loader or insert append), join a big fact table with dimension tables, merge, updates etc dependent on your application. In each test case you check the serial execution and explicitly hinted parallel executions; comparing the response time and resource consumption. This should give the first impression of the potential of PX.
To set the parallel_max_servers not only the number of CPU is relevant, but also the throughput of the disk I/O system. It can be set to a multiple of the CPU count; the exact coefficient is determined from the share of the slaves working in memory (and consuming more CPU, such as hash joins) and the slaves reading data from disk.
parallel_adaptive_multi_user could be useful to limit the overloading of the system. There is one caveat there; a user starting an ad hoc report in parallel at the time of the start of the load process can downgrade the load process to a serial execution and cause the degradation the load time.
> setting the degree of each and every segment
The particular problem of this approach is that the degree is globally valid. No difference is done between the creation of the object and the usage. A care should be taken if a table is maintained serially (e.g. with nested loop update) but it should be queried with parallel hash joins. Similar problem may cause high degrees in indexes "forgotten" there after a rebuild in parallel.
My approach is to determine for the performance critical parts of the application the parallel degree explicit using hints. The degree is set based on the results of the step one.
HTH Jaromir D.B. NemecReceived on Mon Sep 14 2009 - 15:50:28 CDT