version 7.1.3 parallel loading observations.

From: Dave Mausner <dmausner_at_brauntech.com>
Date: Wed, 26 Oct 1994 01:17:51 GMT
Message-ID: <1994Oct26.011751.8761_at_nntpxfer.psi.com>


Here are some observations about parallel loading, the new features of 7.1.3:

when sqlloading tables using direct=yes parallel=yes options, the loader ignores the extents already assigned to the table. it creates temporary segments in the same tablespace as the table to be loaded. it allocates these segments according to the storage parameters of the table, using the initial extent size first, then the next extent size for additional extents. after the temporaries are loaded, the unused space at the end of the last temporary extent is left in the free chain, then the extent is logically attached to the table.

when parallel loads occur, each load session behaves as outlined above; thus each session creates a new temporary segment which is filled and then attached to the table.

if any load session dies, its temporary segment may not be released in a reasonable amount of time. therefore, re-running the job will also fail because the tablespace has reduced capacity. you must stop and restart the rdbms to clear out the temp segment(s).

because the extents already in the table are ignored, you will waste much tablespace if initial = next = <large value>. oracle support says, create the table with initial = <small value, like 2 blocks' worth>. unfortunately, because the temporaries are also allocated the same way, you will wind up with a loaded table in many extents, contrary to what you have always been taught as good practice.

perhaps it needs to be emphasized that with parallel/direct loading you cannot create and load a table in just one integral extent.

as to the trimming of the temporary segments: this just means that the unused blocks after the loaded region are available for insertion. but they will not be accessible for parallel/direct loading. in fact, if your application depends on this new loading feature, you will inevitably waste a high percentage of your tablespace.

the problem is that you may not be able to measure exactly how many blocks of table a particular flat data file will become, after loading. if you could, then you would create the table with storage parameters that exactly add up to the desired size; no wasted tablespace. but you cannot do this most of the time because the flat files generally differ somewhat each time you load them, as in an app which takes data periodically from another source.

it appears that you will certainly save a lot of time by using parallel/direct loading; what you save is roughly proportional to how many cpu's you can activate with loaders. you can save more if you can precisely stripe your tablespace files so that each cpu drives a different controller. these days almost every system uses RAID 5 or at least massively striped file systems, so the benefits of log, table, rollback, index separation are less clear.

the disadvantages, namely extent chaining and loss of part of tablespace area, may be acceptable in some situations. the chaining won't be much of a problem when the app performs huge full scans across large extents anyway (the overhead is just a very small fraction of what's going on); but if your app updates the loaded data you may become alarmed by all the extra disk head movement.

many of these observations are not well-explained in the 7.1 utilities guide or server addendum.

--
Dave Mausner, Senior Consultant, Braun Technology Group, Chicago.
Received on Wed Oct 26 1994 - 02:17:51 CET

Original text of this message