Huge tempspace requirement - doesn't match explain plan
Date: Thu, 21 Aug 2008 23:51:27 +0200
I'm having terrible trouble with a complex query that is consuming huge amounts of temp space. Sometimes.
This query is large with several nested queries and uses parallelism. One upshot of this is that the explain plans, already complex, are very challenging to read with the addition of the PX send and receive events - they're over a hundred lines long with deep nesting. In text form from awrsqrpt they are truncated halfway thru the tmpspc column which doesn't help since temp space is a specific concern.
The instance has 320 GB (!) of temp space assigned and this query is using it all, but the tmpspc report in the html form of the explain plan generated by awrsqrpt seems to show a requirement of less than 10 GB. I'm monitoring the tmp space usage with a little script which polls v$tempspace & v$sort_usage and I can see each of 64 parallel servers using 8GB each. the query runs for about 2 hours before failing with temp space exhaustion. Reducing parallelism doesn't seem to have any effect on the eventual temp requirement.
On a test instance on another but fairly similar box, the temp space usage is peaking at around 18 GB - this is with very similar driving table volumes, identical parallelism settings etc. Run completes in under 2 hours. Explain plans are different but it is difficult to track down how and why, given their sheer size and complexity. Also, different 'successful' runs with this smaller temp requirement on the test box are themselves using quite different explain plans.
This is 10.2.0.4, full table stats (i.e. estimate=>null, cascade=>true) have been gathered for all tables involved in the joins and BVP has been disabled with _OPTIM_PEEK_USER_BINDS=false to avoid histograms confusing the picture for the moment.
So, I suppose my questions are as follows for anyone willing to read this far... I know I haven't posted much to go on - I'm pretty sure I'm unable to post actual details for reasons of IPR etc.
- is there a workaround for awrsqrpt truncating horizontal output? I'm guessing it would just be to edit the script but haven't dared to yet... The HTML version is an acceptable workaround but I thought it might be nice to be able to use diff on text versions of plans. I'd also really like to be able to see the parent_id of a rowset - at depths like this, the indentation is close to useless for figuring out siblings and parents.
- Why would the explain plans estimate for temspace requirement be so wildly at odds with what is actually being consumed?
- *Very broadly speaking*, I'm assuming that this massive use of temp space is caused by too many hash joins of full table scans - is trying to force more nested loops a reasonable strategy to reduce temp requirement? I'd be inclined to use optimizer_index_cost_adj and _cache parameters for this since I want to avoid hints if possible. Is this a good way to encourage NLs?
- Are there any tools/techniques for organising/viewing/comparing large explain plans?
Cheers for any insights,