Re: Optimizing SQL with Full Table Scan

From: Reinhard Oleyniczak <reinhard.oleyniczak_at_isr.de>
Date: 23 Nov 2001 02:57:15 -0800
Message-ID: <fb6947ef.0111230257.56610e8f_at_posting.google.com>

Hi Tom,
your query is slow because it contains the worst form of a subquery, a so called 'correlated subquery' at the bottom of the statement. It is a 'correlated subquery' because at least one value from the outer query appears in the subquery, in this case 'PEV_A.PEV_PMI_ID' and 'PEV_A.PEV_DATE_ENTER'. This forces the database engine to compute the entire subquery for each and every row of the outer query.

Usually correlated subqueries can be avoided, either by transforming them into non-correlated subqueries or, even better, by transforming the statement so that you don't need a subquery at all. If Aggregate Functions like MAX() are involved, you will need a 'GROUP BY' and 'HAVING'-clause for that (see below).

But in your special case there is a far better opportunity: You can skip the entire subquery, its completely useless (except for wasting time). It says: 'From all things that are smaller than x, give me the biggest one and then lets see if it is smaller than or equal to x'. That's always true!

By the way: there is another useless line, one condition is noted twice:
> AND ( PEV_A.PEV_DATE_ENTER >= PEV_B.PEV_DATE_EXIT )
> AND ( PEV_A.PEV_DATE_ENTER - PEV_B.PEV_DATE_EXIT = 0 )
The first one (>=) does no harm, but is useless too and you should remove it.

Here is an example of how the transformed statement without a subquery would look like:

UPDATE 	PEV_INDIV_VISIT 				
 SET 	PEV_VISIT_30_YN = 'Y', PEV_VISIT_DAYS = 0				
 WHERE 	PEV_ID IN (				
 	SELECT 			
 		PEV_A.PEV_ID		
 	FROM			
 		PEV_INDIV_VISIT PEV_A ,		
 		PEV_INDIV_VISIT PEV_B ,		
* 		PEV_INDIV_VISIT PEV_C ,
 		PCL_INDIV_CLASS PCL_INDIV_CLASS_A ,		
 		PCL_INDIV_CLASS PCL_INDIV_CLASS_B		
 	WHERE			
 		( PEV_A.PEV_PMI_ID = PEV_B.PEV_PMI_ID )		
      	AND	( PEV_A.PEV_DATE_ENTER - PEV_B.PEV_DATE_EXIT = 0 )		
      	AND	( PEV_A.PEV_PCL_CODE = PCL_INDIV_CLASS_A.PCL_CODE )		
      	AND	( PEV_B.PEV_PCL_CODE = PCL_INDIV_CLASS_B.PCL_CODE )		
      	AND	( PCL_INDIV_CLASS_A.PCL_EPISODE_TYPE = 'INP' )		
      	AND	( PCL_INDIV_CLASS_B.PCL_EPISODE_TYPE = 'INP')     		
*      	AND	( PEV_C.PEV_PMI_ID = PEV_A.PEV_PMI_ID )
*       AND     ( PEV_A.PEV_DATE_ENTER > PEV_C.PEV_DATE_EXIT )
*  	GROUP BY PEV_A.PEV_ID -- group by all columns from the select-list
*  	HAVING   PEV_A.PEV_DATE_ENTER >= MAX (PEV_C.PEV_DATE_EXIT);

The 'WHERE'-clause applies filters before grouping, the 'HAVING'-clause applies filters after grouping, so you can use Aggregate Functions in a 'HAVING'-clause only. This is just for an transformation example. The lines marked with an '*' are still useless.

Best regards
Olli

tsu_at_landacorp.com (Tom Urbanowicz) wrote in message news:<6d63e629.0111201518.4b6f4028_at_posting.google.com>...

> The following UPDATE statement affects one of our largest tables with
> a few million records. As it stands, this SQL can take 8 hours to
> execute. Can anyone recommend how to optimize this? Is there a better
> approach with PL/SQL?. . .
> 
> Much thanks-
> Tom
> 
> The statement is:
> --------------------
> UPDATE 	PEV_INDIV_VISIT 				
> SET 	PEV_VISIT_30_YN = 'Y', PEV_VISIT_DAYS = 0				
> WHERE 	PEV_ID IN (				
> 	SELECT 			
> 		DISTINCT PEV_A.PEV_ID		
> 	FROM			
> 		PEV_INDIV_VISIT PEV_A ,		
> 		PEV_INDIV_VISIT PEV_B ,		
> 		PCL_INDIV_CLASS PCL_INDIV_CLASS_A ,		
> 		PCL_INDIV_CLASS PCL_INDIV_CLASS_B		
> 	WHERE			
> 		( PEV_A.PEV_PMI_ID = PEV_B.PEV_PMI_ID )		


>      	AND	( PEV_A.PEV_DATE_ENTER >= PEV_B.PEV_DATE_EXIT )		


>      	AND	( PEV_A.PEV_DATE_ENTER - PEV_B.PEV_DATE_EXIT = 0 )		

>      	AND	( PEV_A.PEV_PCL_CODE = PCL_INDIV_CLASS_A.PCL_CODE )		

>      	AND	( PEV_B.PEV_PCL_CODE = PCL_INDIV_CLASS_B.PCL_CODE )		

>      	AND	( PCL_INDIV_CLASS_A.PCL_EPISODE_TYPE = 'INP' )		

>      	AND	( PCL_INDIV_CLASS_B.PCL_EPISODE_TYPE = 'INP')     		

>      	AND	( PEV_A.PEV_DATE_ENTER >= (SELECT MAX (PEV_C.PEV_DATE_EXIT)

> 					   FROM   PEV_INDIV_VISIT PEV_C
> 					   WHERE  PEV_C.PEV_PMI_ID = PEV_A.PEV_PMI_ID
> 	     				   AND    PEV_C.PEV_DATE_EXIT < PEV_A.PEV_DATE_ENTER 
> 					   ) 
> 		)
> 	)
> ;				
> 
> Explain Plan is:
> --------------------
> 1  UPDATE STATEMENT    optimizer=RULE
> 2    UPDATE    PEV_INDIV_VISIT
> 3      NESTED LOOPS    
> 4        VIEW    VW_NSO_1
> 5          SORT  UNIQUE  
> 6            FILTER    
> 7              NESTED LOOPS    
> 8                NESTED LOOPS    
> 9                  NESTED LOOPS    
> 10                    TABLE ACCESS  FULL  PEV_INDIV_VISIT
> 10                    TABLE ACCESS  BY INDEX ROWID  PCL_INDIV_CLASS
> 11                      INDEX  UNIQUE SCAN  PCL_PRIMARY_KEY
> 9                  TABLE ACCESS  BY INDEX ROWID  PEV_INDIV_VISIT
> 10                    INDEX  RANGE SCAN  PEV_INDEX_01
> 8                TABLE ACCESS  BY INDEX ROWID  PCL_INDIV_CLASS
> 9                  INDEX  UNIQUE SCAN  PCL_PRIMARY_KEY
> 7              SORT  AGGREGATE  
> 8                TABLE ACCESS  BY INDEX ROWID  PEV_INDIV_VISIT
> 9                  INDEX  RANGE SCAN  PEV_INDEX_01
> 4        INDEX  UNIQUE SCAN  PEV_PRIMARY_KEY

Received on Fri Nov 23 2001 - 11:57:15 CET

This message: [ Message body ]
Next message: dias: "Re: first n rows..."
Previous message: Helgi Þór Ágústsson: "Re: Popular affiliates query problem"
In reply to: Tom Urbanowicz: "Optimizing SQL with Full Table Scan"
In reply to Tom Urbanowicz: "Optimizing SQL with Full Table Scan"
Next in thread: dias: "Re: first n rows..."

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message