Re: Optimizing SQL with Full Table Scan

From: Reinhard Oleyniczak <reinhard.oleyniczak_at_isr.de>
Date: 23 Nov 2001 02:57:15 -0800
Message-ID: <fb6947ef.0111230257.56610e8f_at_posting.google.com>


Hi Tom,
your query is slow because it contains the worst form of a subquery, a so called 'correlated subquery' at the bottom of the statement. It is a 'correlated subquery' because at least one value from the outer query appears in the subquery, in this case 'PEV_A.PEV_PMI_ID' and 'PEV_A.PEV_DATE_ENTER'. This forces the database engine to compute the entire subquery for each and every row of the outer query.

Usually correlated subqueries can be avoided, either by transforming them into non-correlated subqueries or, even better, by transforming the statement so that you don't need a subquery at all. If Aggregate Functions like MAX() are involved, you will need a 'GROUP BY' and 'HAVING'-clause for that (see below).

But in your special case there is a far better opportunity: You can skip the entire subquery, its completely useless (except for wasting time). It says: 'From all things that are smaller than x, give me the biggest one and then lets see if it is smaller than or equal to x'. That's always true!

By the way: there is another useless line, one condition is noted twice:
> AND ( PEV_A.PEV_DATE_ENTER >= PEV_B.PEV_DATE_EXIT )
> AND ( PEV_A.PEV_DATE_ENTER - PEV_B.PEV_DATE_EXIT = 0 )
The first one (>=) does no harm, but is useless too and you should remove it.

Here is an example of how the transformed statement without a subquery would look like:

UPDATE 	PEV_INDIV_VISIT 				
 SET 	PEV_VISIT_30_YN = 'Y', PEV_VISIT_DAYS = 0				
 WHERE 	PEV_ID IN (				
 	SELECT 			
 		PEV_A.PEV_ID		
 	FROM			
 		PEV_INDIV_VISIT PEV_A ,		
 		PEV_INDIV_VISIT PEV_B ,		
* 		PEV_INDIV_VISIT PEV_C ,
 		PCL_INDIV_CLASS PCL_INDIV_CLASS_A ,		
 		PCL_INDIV_CLASS PCL_INDIV_CLASS_B		
 	WHERE			
 		( PEV_A.PEV_PMI_ID = PEV_B.PEV_PMI_ID )		
      	AND	( PEV_A.PEV_DATE_ENTER - PEV_B.PEV_DATE_EXIT = 0 )		
      	AND	( PEV_A.PEV_PCL_CODE = PCL_INDIV_CLASS_A.PCL_CODE )		
      	AND	( PEV_B.PEV_PCL_CODE = PCL_INDIV_CLASS_B.PCL_CODE )		
      	AND	( PCL_INDIV_CLASS_A.PCL_EPISODE_TYPE = 'INP' )		
      	AND	( PCL_INDIV_CLASS_B.PCL_EPISODE_TYPE = 'INP')     		
*      	AND	( PEV_C.PEV_PMI_ID = PEV_A.PEV_PMI_ID )
*       AND     ( PEV_A.PEV_DATE_ENTER > PEV_C.PEV_DATE_EXIT )
*  	GROUP BY PEV_A.PEV_ID -- group by all columns from the select-list
*  	HAVING   PEV_A.PEV_DATE_ENTER >= MAX (PEV_C.PEV_DATE_EXIT);

The 'WHERE'-clause applies filters before grouping, the 'HAVING'-clause applies filters after grouping, so you can use Aggregate Functions in a 'HAVING'-clause only. This is just for an transformation example. The lines marked with an '*' are still useless.

Best regards
Olli

tsu_at_landacorp.com (Tom Urbanowicz) wrote in message news:<6d63e629.0111201518.4b6f4028_at_posting.google.com>...

> The following UPDATE statement affects one of our largest tables with
> a few million records. As it stands, this SQL can take 8 hours to
> execute. Can anyone recommend how to optimize this? Is there a better
> approach with PL/SQL?. . .
> 
> Much thanks-
> Tom
> 
> The statement is:
> --------------------
> UPDATE 	PEV_INDIV_VISIT 				
> SET 	PEV_VISIT_30_YN = 'Y', PEV_VISIT_DAYS = 0				
> WHERE 	PEV_ID IN (				
> 	SELECT 			
> 		DISTINCT PEV_A.PEV_ID		
> 	FROM			
> 		PEV_INDIV_VISIT PEV_A ,		
> 		PEV_INDIV_VISIT PEV_B ,		
> 		PCL_INDIV_CLASS PCL_INDIV_CLASS_A ,		
> 		PCL_INDIV_CLASS PCL_INDIV_CLASS_B		
> 	WHERE			
> 		( PEV_A.PEV_PMI_ID = PEV_B.PEV_PMI_ID )		

> AND ( PEV_A.PEV_DATE_ENTER >= PEV_B.PEV_DATE_EXIT )
> AND ( PEV_A.PEV_DATE_ENTER - PEV_B.PEV_DATE_EXIT = 0 )
> AND ( PEV_A.PEV_PCL_CODE = PCL_INDIV_CLASS_A.PCL_CODE )
> AND ( PEV_B.PEV_PCL_CODE = PCL_INDIV_CLASS_B.PCL_CODE )
> AND ( PCL_INDIV_CLASS_A.PCL_EPISODE_TYPE = 'INP' )
> AND ( PCL_INDIV_CLASS_B.PCL_EPISODE_TYPE = 'INP')
> AND ( PEV_A.PEV_DATE_ENTER >= (SELECT MAX (PEV_C.PEV_DATE_EXIT)
> FROM PEV_INDIV_VISIT PEV_C > WHERE PEV_C.PEV_PMI_ID = PEV_A.PEV_PMI_ID > AND PEV_C.PEV_DATE_EXIT < PEV_A.PEV_DATE_ENTER > ) > ) > ) > ; > > Explain Plan is: > -------------------- > 1 UPDATE STATEMENT optimizer=RULE > 2 UPDATE PEV_INDIV_VISIT > 3 NESTED LOOPS > 4 VIEW VW_NSO_1 > 5 SORT UNIQUE > 6 FILTER > 7 NESTED LOOPS > 8 NESTED LOOPS > 9 NESTED LOOPS > 10 TABLE ACCESS FULL PEV_INDIV_VISIT > 10 TABLE ACCESS BY INDEX ROWID PCL_INDIV_CLASS > 11 INDEX UNIQUE SCAN PCL_PRIMARY_KEY > 9 TABLE ACCESS BY INDEX ROWID PEV_INDIV_VISIT > 10 INDEX RANGE SCAN PEV_INDEX_01 > 8 TABLE ACCESS BY INDEX ROWID PCL_INDIV_CLASS > 9 INDEX UNIQUE SCAN PCL_PRIMARY_KEY > 7 SORT AGGREGATE > 8 TABLE ACCESS BY INDEX ROWID PEV_INDIV_VISIT > 9 INDEX RANGE SCAN PEV_INDEX_01 > 4 INDEX UNIQUE SCAN PEV_PRIMARY_KEY
Received on Fri Nov 23 2001 - 11:57:15 CET

Original text of this message