Re: Tina London's Article re: Code Development Rules

From: JEFF CHAMBLEE <chamblej_at_author.gsfc.nasa.gov>
Date: 11 Oct 1994 09:51 -0500
Message-ID: <11OCT199409512981_at_author.gsfc.nasa.gov>


In article <CxA2D8.n42_at_unocal.com>, dblpraf_at_rogue.unocal.com (Bob Fleisig) writes...

>If anyone still has a copy of Tina London's article regarding 
>good code development rules, would you please repost the article.
> 
>Thanks in advance.
> 
>|----------------------------------------- NO SPECIAL SIG LINE -----------------------------------------------|

Reprinted SANS permission.

Path: skates.gsfc.nasa.gov!kong.gsfc.nasa.gov!cs.umd.edu!haven.umd.edu!darwin.sura.net!sgiblab!munnari.oz.au!goanna!escargot!otto!dtb From: dtb_at_otto.bf.rmit.oz.au (David Bath) Newsgroups: comp.databases.oracle
Subject: Oracle SQL Guidlines / Optimizer notes Summary: Guidelines for efficient SQL with Oracle by Tina London Message-ID: <dtb.736743126_at_otto>
Date: 7 May 93 02:52:06 GMT
Reply-To: dtb_at_otto.bf.rmit.oz.au
Organization: Royal Melbourne Institute of Technology Lines: 3117
NNTP-Posting-Host: otto.bf.rmit.oz.au

I received the following from Tina London in the UK. I think its worth posting to the group. While I format things slightly differently, her reasons are valid and worth browsing. Her notes on the V6 optimiser are great. I will be incorporating an abridged version of some of this in the non-DBA FAQ 0.4. Her current address is tinalondon_at_artemis.demon.co.uk

Thanks Tina !!!!!!!!

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 1  AUTHOR : Tina London
 DATE : August 7, 1992

                  GUIDELINES AND GOOD PRACTICE GUIDE

                          FOR DEVELOPING SQL



                         Author : Tina London

                         Email  : tlondon_at_cix.compulink.co.uk

                         Date : August 7, 1992





















 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 1  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 2  AUTHOR : Tina London
 DATE : August 7, 1992

          1.0 Introduction

          2.0  SQL Layout
          2.1       General layout
          2.2       SQL Keywords
          2.3       Constants and variables.
          2.4       Outer Joins
          2.5       Table aliases.
          2.6       Ordering of where clauses.

          3.0  Database selects.
          3.1       Ordering of the from clause.
          3.2       Unintentionally disabling indexes.
          3.3       Intentionally disabling indexes.
          3.4       Investigate Rewriting the Query
          3.5       Use of the Exists operator
          3.6       Don't perform unnecessary joins.
          3.7       Resource intensive operations
          3.8       Use realistic test data.
          3.9       Use of != operator
          3.10      Use Oracle's trace facility.
          3.11      Management of oracle cursors.
          3.12      The 10,15,20 percent rule

          4.0  Insert statements.

          5.0  Database updates.

          6.0  Optimising oracle queries.
          6.1       Table access.
          6.2       Indexes and Null.
          6.3       Indexes and 'NOT=' predicates.
          6.4       Group by and predicate clauses.
          6.5       Multiple index queries.
          6.6       When indexes cannot be merged
          6.7       Suppression of indexes for performance
          6.8       Concatenated indexes
          6.9       Or optimisation
          6.10      Non correlated sub queries
          6.11      Correlated sub queries


          Appendix A     Query Paths Ranked in Order of Speed

          Appendix B     Tables used in the examples.

          Appendix C     Correlated updates and PL/SQL.

          Appendix D    Guidelines for outer join syntax          




 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 2  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 3  AUTHOR : Tina London
 DATE : August 7, 1992

          Appendix E The ten commandments for fast queries.

          Appendix F Oracle trace facility.

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 3  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 4  AUTHOR : Tina London
 DATE : August 7, 1992

          1.0 Introduction

               The purpose of this guide is to present  standards 
          and guideline for the wms development team, which  will 
          ensure  that quality SQL is produced. The rules on  SQL 
          layout are important, because, they improve the clarity 
          of the statements.

               Developers  should  realise  that,  as  in   third 
          generation  programming, the first method of writing  a 
          program  is not necessarily the best, the same is  true 
          of fourth generation SQL queries.

               Adherence  to  the SQL layout rules  needs  to  be 
          enforced  now, as this will allow the project  to  take 
          advantage of Oracle V7 SQL caching facilities.  Version 
          7  of Oracle has an SQL cache, which will  hold  parsed 
          queries,  Oracle determines whether a query is  in  the 
          cache by doing a case insensitive comparison

               All  code reviews, which require database  access, 
          will  have available, as part of the document set,  sql 
          trace  output formatted with tkprof. This  output  will 
          be formally reviewed, by the database team, as part  of 
          the review process.


          2.0 SQL Layout

          2.1 General layout

               SQL  statements  should be laid out so as  to  aid 
          readability  and maintenance. Each statement should  be 
          in  a separate sql file. 

               1)     Statements  should  be  indented  so   that 
               individual lines are neatly aligned. 

               2)   Each expression in the SELECT list should  be 
               on a different line. 

               3)    Every table in the FROM list should be on  a 
               fresh line.

          2.2 SQL Keywords

               1)    Oracle keywords should be entered  in  upper 
               case

               2)  The following keywords should be placed  on  a 
               new line :



 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 4  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 5  AUTHOR : Tina London
 DATE : August 7, 1992

                    SELECT
                    INTO
                    FROM
                    WHERE
                    AND/OR
                    GROUP BY
                    HAVING
                    CONNECT BY
                    FOR UPDATE OF
                    ORDER BY

               3) All keywords within the same select should line 
               up  with  the  previous  keywords.  Nested  select 
               statements  should be indented and lined  up  with 
               the second word on the previous line, e.g. :

                    SELECT    sal,
                              Job,
                              ename,
                              dept
                    FROM      emp
                    WHERE     sal > any

(SELECT sal
FROM emp WHERE deptno = 30) ORDER BY sal; 4) Lines which don't start with a listed keyword should be lined up with the second word on the previous line, e.g. : SELECT ename, dept FROM emp 2.3 Constants and variables. 1) Constants, program variables etc. should be on the right hand side of a WHERE or HAVING clause. SELECT ename FROM emp WHERE empno = '1232' Or SELECT ename FROM emp WHERE empno = :1

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 5  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 6  AUTHOR : Tina London
 DATE : August 7, 1992

          2.4 Outer Joins.

                    Columns  requiring to be outer joined  should 
               appear on the right hand side of a WHERE or HAVING 
               clause.

                    SELECT    ename
                    FROM      emp e,
                              dept d
                    WHERE     e.empno = d.empno(+)    

          2.5  Table aliases.

                    Table  aliases should be used in all  queries 
               that have more than one table in the FROM  clause. 
               The use of table aliases speeds up the parse phase 
               of  an  oracle query, by reducing  the  number  of 
               recursive sql queries.

                    SELECT    count(*) 
                    FROM      oe o,
                              oe_link l,
                              oe_link_name n
                    WHERE     o.oe = l.oe
                    AND       l.name = n.name

                    Notice  the  table aliases  o,l,n  and  their 
               subsequent use in the where clause.

          2.6 Ordering of where clauses.

                    As  an  aid to understanding,  WHERE  clauses 
               should  be laid out with the join  clauses  first, 
               and the restriction clauses second.


          3.0 Database selects.

          3.1 Ordering of the FROM clause.

                    This rule is for when the oracle optimiser is 
               stuck for a good idea. The Oracle optimiser  works 
               in  the following manner. It looks at each of  the 
               WHERE  clauses and assigns the tables concerned  a 
               number based on the type of predicate e.g. field = 
               'const'  or field(+) = field. It then chooses  the 
               table with the lowest score as the driving  table. 
               But, and its a big one, if more than one table has 
               the  same  lowest score then it chooses  the  last 
               table in the FROM list to be the driving table.




 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 6  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 7  AUTHOR : Tina London
 DATE : August 7, 1992

                    See appendix A for a listing of the predicate 
               scoring system.

                     In  this  example there are indexes  on  the 
               tables as follows

                    Indexes :

                    unique on oe(id)
                    unique on oe_link(oe)
                    unique on oe_link_name(name)
                    non unique on oe(oe)


                    SELECT    count(*) 
                    FROM      oe_link l,
                              oe_link_name n,
                              oe o
                    WHERE     o.oe = l.oe
                    AND       l.name = n.name

                    Time 621.21 secs

                    SELECT    count(*) 
                    FROM      oe o,
                              oe_link l,
                              oe_link_name n
                    WHERE     o.oe = l.oe
                    AND       l.name = n.name


                    Time 197.05 secs

                     Notice the difference in query timings.  The 
               only difference in the two queries is the order of 
               the tables in the FROM clause. Therefore the table 
               returning the least number of rows should be  last 
               in the FROM list.

          3.2  Unintentional disabling of indexes.


                    This  is  a very easy  way  to  inadvertently 
               disable the use of the indexes.

                     On  table  oe,  which is  described  in  the 
               appendix. Column id has a datatype of number.

                    SELECT    id,
                              oe
                    FROM      oe
                    WHERE     to_char(id) = 1232



 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 7  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 8  AUTHOR : Tina London
 DATE : August 7, 1992

                    Time 97 secs.

                    whereas 

                    SELECT    id,
                              oe
                    FROM      oe
                    WHERE     id = 1232

                    Time .4 secs.

                    The reason for this is that oracle cannot use 
               an  index if the column is modified in  the  where 
               clause. It is up to the application programmer  to 
               ensure that, WHERE clause columns, aren't modified 
               in  any way.  

                    Note that the following query will also  stop 
               the use of indexes.

                    SELECT    id,
                              oe
                    FROM      oe
                    WHERE     id+1 = 1233

                     Dates  in  the where clause can  also  cause 
               problems.  To select all the records entered on  a 
               certain  day  the following three  ideas  come  to 
               mind.

                    SELECT    *  
                    FROM      oe
                    WHERE     trunc(timestamp) = '26-MAR-91'

                    SELECT    *
                    FROM      oe
                    WHERE     timestamp between '26-mar-91' 
                    AND       '27-mar-91'

                    SELECT    * 
                    FROM      oe 
                    WHERE     timestamp >= to_date('26-mar-91:00:00',
                              'dd-mon-yy:hh24:mi')
                    AND       timestamp < to_date('27-mar-91:00:00',
                              'dd-mon-yy:hh24:mi')


                     The first one when run takes 240 seconds  to 
               produce a result. 

                     The  second one only takes 1.05  seconds  to 



 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 8  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 9  AUTHOR : Tina London
 DATE : August 7, 1992

               run,  however,  it  has  a  featurette.   It  will 
               include  any records for midnight on the  27th  of 
               march. 

                     The  last one only takes .5 of a second  and 
               it   doesn't  retrieve  the  extra   records   for 
               midnight.

                    Examples along the lines of that shown  below 
               are  also considered to be  column  modifications. 
               Therefore  concatenating  the columns,  stops  the 
               indexes from being used.

                    example a) should be rewritten as in  example 
               b)

                    a)

                    SELECT    *
                    FROM      job
                    WHERE     db_id||job_no = 'AZ0100201'

                    b)

                    SELECT    *
                    FROM      job
                    WHERE     db_id = 'AZ'
                    AND       job_no  = '0100201'


                    Developers  need  to  be aware  of  the  type 
               conversions   that  Oracle  performs   implicitly. 
               Oracle may choose to convert either the column  or 
               the  constant.  If the column is chosen  then  the 
               indexes  cannot be used. The conversion chosen  is 
               defined in the following table.

                    Mixed               Common         Function
                    datatype            Unit           chosen
                    --------------      -----------    --------

                    Char with number    number         to_number
                    Char with rowid     rowid          to_rowid
                    Char with date      date           to_date

                    The   following   example   shows   how   the 
               conversion table given above works in  practice.



                    SELECT    deptno
                    FROM      dept 



 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 9  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 10  AUTHOR : Tina London
 DATE : August 7, 1992

                    WHERE     deptno = 1324

                    indexes

                    non unique index on deptno

                    The  select would not use the  index  because 
               the  column  deptno would be  to_numbered  by  the 
               kernel.

                    The following queries will use the indexes.

                    SELECT    deptno
                    FROM      dept 
                    WHERE     deptno = to_char(1324)

                    SELECT    deptno
                    FROM      dept 
                    WHERE     deptno = '1324'

          3.3  Intentionally disabling indexes.

                    When  it  has  been  decided,  that   certain 
               indexes   need   to   be   disabled,   for   query 
               optimisation, the following column modifiers  will 
               be used.

                    Datatype       Inhibit expression
                    --------       ------------------

                    Char           char||''
                    number         number+0
                    date           add_months(date,0)


                    The  use  of  nvl(column,0)  works  with  all 
               datatypes,  however,  it  could  be  confusing  in 
               queries   which  perform  a  lot   of   arithmetic 
               calculation. See example below.

                    SELECT    deptno
                    FROM      dept
                    WHERE     nvl(deptno,0) = '1234'     

          3.4  Investigate Rewriting the Query

                     SQL is a very expressive language and  there 
               are  normally several ways of performing the  same 
               query.  Developers  should  investigate  different 
               wordings of the same query, so as to identify  the 
               optimal query.




 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 10  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 11  AUTHOR : Tina London
 DATE : August 7, 1992

                    This  can  be demonstrated by  the  following 
               example, which is about finding all the people who 
               don't have jobs to do. There are 99 people in  the 
               system and 9900 jobs

                     There are indexes on name on both tables.      

                     The first attempt

                    SELECT    p.name
                    FROM      people p,
                              job j
                    WHERE     p.name = j.name(+)
                    AND       j.name is null

                    fred the 27                                                                     

                    1 record selected.

                    Time 51.40 secs.

                    Not a very good result this time. 

                    The second attempt

                    SELECT    name 
                    FROM      people
                    WHERE     name not in 

(SELECT name
FROM job) fred the 27 1 record selected. Time 6.11 secs A much better attempt, there is nearly an order of magnitude improvement. We might now be inclined to rest on our laurels, make a cup of tea and put our feet up, however, if we correlate the sub query to get. SELECT name FROM people p where not name in
(SELECT name
FROM job j WHERE p.name = j.name) fred the 27 1 record selected. Time 1.08 secs.

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 11  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 12  AUTHOR : Tina London
 DATE : August 7, 1992

          3.5 Use of the Exists operator

                    One operator which seems to be ignored is the 
               EXISTS  operator. This can be particularly  useful 
               in forms work for validating foreign keys. In  the 
               following  example we wish to know  whether  'fred 
               the 45' has any jobs. 

                    The first example is

                    SELECT    distinct 'x'
                    FROM      job
                    WHERE     name = 'fred the 45'

                    1 record selected.

                    Time  0.45 secs.

                    The  second example uses the exists  operator 
               and is almost twice as fast.

                    SELECT    'x'
                    FROM      dual 
                    WHERE     exists 

( SELECT 'x'
FROM job where name = 'fred the 45') 1 record selected. Time 0.26 secs. The reason this is faster is that with the exists operator the oracle kernel knows that once it has found one match it can stop. It therefore doesn't have to continue the FTS (a TLA (Three Letter Acronym) for Full Table Scan). Things start becoming interesting, now the actual value of the data item being searched on determines which query to use. Consider the people table with 10,000 entries. Enquiring about 'fred the 34' and 'fred the 9999' gives the following. SELECT distinct 'x' FROM job WHERE name ='fred the 34' Time 6.65 secs.

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 12  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 13  AUTHOR : Tina London
 DATE : August 7, 1992

                    SELECT    'x'
                    FROM      dual 
                    WHERE     exists 

(SELECT 'x'
FROM job WHERE name = 'fred the 34') Time 0.28 secs. SELECT 'x' FROM dual WHERE exists
(SELECT 'x'
FROM job WHERE name = 'fred the 9999') Time 8.28 secs. Ok I cheated somewhat the data goes into the table in name order. Thus 'fred the 1' goes in first data block and 'fred the 9999' goes in last one. Developers should be aware that, the efficiency of EXISTS and IN is dependant on the amount of data in each table. A query with IN in it drives from the subquery accessing the main query for each row returned, when, a query uses EXISTS it drives from the main query accessing the subquery for each row returned. So if the subquery returns few rows, but, the main query returns a lot of rows for each row from the subquery use the IN operator. 3.6 Don't perform unnecessary joins. In some parts of our system, developers are joining onto the sys_param table. This is bad practice and must be stopped. The sys_params table was designed to hold static system wide values. The correct use of this table, is for the application code to get the values it requires once at the start of the code, or in the case of regular update cycles, at the start of each such cycle. At the logical level, any runtime modification of sys_params, probably indicates

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 13  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 14  AUTHOR : Tina London
 DATE : August 7, 1992

               missing entities.

          3.7 Resource intensive operations

                    Queries  which  use DISTINCT,  UNION,  MINUS, 
               INTERSECT,  ORDER BY  and GROUP BY call  upon  the 
               kernel  to  perform resource intensive  sorts.   A 
               DISTINCT   requires  one  sort,  the   other   set 
               operators  require at least two sorts. Other  ways 
               of  writing  these queries should be  found.  Most 
               queries  that use the set  operators,  UNION,MINUS 
               and INTERSECT, can be rewritten in other ways.

          3.8  Use realistic test data.

                    Realistic  test data, which matches  both  in 
               volume  and values, that shown in the ER  diagrams 
               will be used.


                    The following is based on the example used in 
               3.4  about  rewording queries. There are  now  999 
               people  in the table and 9990 jobs. Thus  we  have 
               increased our people by a factor of ten.

                    Notice how the query performed with an  outer 
               join  is much quicker than that performed  by  the 
               subquery.  This  is  a complete  reversal  of  the 
               result in 3.4.

                    This demonstrates quite clearly that the  sql 
               developer  must  have  a  knowledge  of  both  the 
               structure  of the data and the typical  number  of 
               rows in each table.

                    SELECT    p.name
                    FROM      people p,
                              job j
                    WHERE     p.name = j.name(+)
                    AND       j.name is null

                    fred the 218                                                                    

                    Time 23.20 secs.

                    SELECT    name 
                    FROM      people
                    WHERE     not name in 

(SELECT name
FROM job) fred the 218

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 14  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 15  AUTHOR : Tina London
 DATE : August 7, 1992

                    Time 193.46 secs.

                    SELECT    name 
                    FROM      people p
                    WHERE     not name in 

(SELECT name
FROM job j WHERE p.name = j.name) fred the 218 Time 8.66 secs. 3.9 Use of != operator Unless it is absolutely necessary avoid using the != operator. The use of this operator disables the use of indexes, because the Oracle kernel assumes that the query will be retrieving most of the rows in the table. 3.10 Use Oracle's trace facility. For every query, even the simplest, check the output from oracle's trace facility. The first step in optimising queries is to eliminate FTS's, this can be done using trace. See APPENDIX F for details on the trace facility. 3.11 Management of oracle cursors. Where ever possible, oracle cursors should be declared at the start of the program. It is also good practice to tell the system the maximum number of cursors you want, at the start of the program. This can be done in DB by opening the cursors you require, in Pro*c set MAXCURSORS. There are two 'expensive' operations during the execution of a query, apart from returning the rows. These are associated with the 'parse' and 'bind' phase of query execution. What is meant by the parse phase. This is the time when the sql text that the programmer has written is translated into an internal representation, that the Oracle kernel can understand. The following activities happen during the parse phase. a) data dictionary look up and security

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 15  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 16  AUTHOR : Tina London
 DATE : August 7, 1992

               checking.  This activity causes the oracle  kernel 
               to produce internal sql queries, which are  called 
               recursive queries, to check such things as whether 
               this user is allowed access to this column etc.

                    b)  query  optimisation. Part  of  the  query 
               optimisation  which  deals  with  views  and   sub 
               queries is performed.

                    Performing  all  of  the  above  can  take  a 
               considerable  time. Therefore they should be  done 
               as infrequently as possible.         


                    The binding phase takes place after the parse 
               phase. It is at this stage that the user variables 
               are decided. For instance in the query select  'x' 
               from  person where name = :1 the host  variable  1 
               would have its value 'bound' into the query during 
               binding  by  having the address of  :1  associated 
               with the host variable.

                    Under  normal  circumstances,  wms   programs 
               should  perform the following steps, for  a  query 
               that is the be executed more than once.

                    Step 1    

                    During program start-up set up and parse  the 
               queries with calls to sql_set-up.

                    Step 2

                    Execute the query with a call to  sql_execute 
               supplying the necessary bind variables.

                    Step 3

                    Fetch and process the returned rows by  using 
               sql_fetch.

                    To execute the query again, return to step  2 
               and   supply  new  bind  variables.  For   further 
               information    see    the    document     entitled 
               DB_user.doc.

          3.12      The 10,15,20 percent rule

                    Only  use  indexes for retrieving  data  from 
               tables,  where your want less than 15  percent  of 
               the rows. FTS's are quicker if you want more  than 
               15 percent of a tables data.



 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 16  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 17  AUTHOR : Tina London
 DATE : August 7, 1992

                    The rule is called the 10,15,20 percent rule, 
               because  it depends on which oracle  document  you 
               read, as to whether its 10,15 or 20 percent.


          4.0 Insert statements.

          4.1  Use of * in insert statements

                    Although  the use of the shorthand  character 
               '*'  is  allowed  by  the  oracle  kernel,  it  is 
               considered  to be bad programming practice. It  is 
               not allowed in  the WMS project.

                    The reason for it being bad practice, is that 
               if  a table is modified during an upgrade  to  the 
               database, subsequent inserts could fail out in the 
               field.


          5.0  Update Statements.

          5.1  Correlated updates.


                    One  of  the slowest commands in SQL  is  the 
               UPDATE. This is largely due to the fact that  most 
               correlated  updates  require a  full  table  scan.  
               This  results  in very slow performance  when  the 
               table is extremely large.

                    The following update statement is typical  of 
               correlated updates:

               Update Target_Table
               Set Target_Field = (Select Source_Information
                                   From Source_Table
                                   Where Source_Table.Key =
                                   Target_Table.Key)
               Where exists (Select 'x'
                             From Source_Table
                             Where Source_Table.Key = 
                             Target_Table.Key)

                    Performance  problems arise because there  is 
               no method of eliminating rows in the  Target_Table 
               based  on  information in  the  Source_Table.   If  
               other  conditions in the Update's Where clause  do 
               not   disqualify   most  of  the   rows   in   the 
               Target_Table, this update will require substantial 
               processing time.



 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 17  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 18  AUTHOR : Tina London
 DATE : August 7, 1992

                    The following PL/SQL code effectively uses an 
               index  on  the  Key  field  to  access  only   the 
               appropriate records in the Target_Table:

                       Declare
                         Cursor Source is
                         Select *
                         From Source_Table;
                         Begin
                         For Row in Source Loop
                         Update Target_Table
                         Set Target_Field = Row.Source_Information
                         Where Key = Row.Key;
                         End Loop;
                         Exception
                         When OTHERS Then
                         Null;
                       End;

                    This PL/SQL script loops through each of  the 
               records  in  the  Source_Table  and  updates   the 
               appropriate  row  in  the  Target_Table,  if  any.  
               Essentially, this transfers the full table scan to 
               the  Source_Table  and  allows the  index  on  the 
               Target_Table to be used effectively.

                    Running  a  typical correlated update  on  an 
               8,000 row table to update 9 records required  19.4 
               CPU  seconds.   The same update using  the  PL/SQL 
               script  executed  in  1.12 CPU seconds  --  a  94% 
               performance improvement.

                    The   PL/SQL  script  will   outperform   the 
               correlated  update  whenever the  Source_Table  is 
               smaller  than  the Target_Table.  The  larger  the 
               Target_Table  compared  to the  Source_Table,  the 
               more  substantial the performance gain.   With  an 
               effective  index, the size of the Target_Table  is 
               no longer a factor in the time required to  update 
               the  table;  the number of records  being  updated 
               determines the performance.


                    Replacing   the  EXISTS  subquery   with   IN 
               subquery will give same improvement in most cases.  

          6.0  Optimising oracle queries.


          6.1  Table access.




 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 18  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 19  AUTHOR : Tina London
 DATE : August 7, 1992

                    There  are three ways that Oracle can find  a 
               row in a table, these are :

                    1)  Scan  each  row  in  the  table  checking 
               whether  to  select the row for  output.  This  is 
               commonly  known as a full table scan (FTS), it  is 
               also the least efficient method of locating a row.

                    2) By using an index to locate the row.

                    3) By using the rowid to directly access  the 
               row on disc. This is the most efficient method  of 
               accessing  a  row. Do not use this  method  across 
               transactions  as  it is possible for  a  rowid  to 
               change, use the primary key of the table in  these 
               circumstances.

                    Unfortunately  we  do not normally  know  the 
               rowid,  so the fastest method of access is by  the 
               use of indexes.

                    The general rules are as follows:

                    Indexes   can  be  used  in   the   following 
               circumstances.      


                    1)    If the indexed column is  mentioned  in 
               the where clause.

                    2)    If the indexed column is not  modified 
               by a function or, an arithmetic operation, or both 
               of them. The following exception is applicable,  a 
               MIN(column), MIN(column+constant) or  MAX(column), 
               MAX(column+constant).


                    Indexes  cannot  be  used  in  the following 
               circumstances.

                    1) there is no 'where' clause.

                    2) the indexed column is modified in any way.

                    3) the search is for 'null' or 'not null'

          6.2  Indexes and Null.

                    If  a record has no value (i.e. NULL) in  the 
               column specified as the index key, then there will 
               not  be  an index entry for the record.  For  this 
               reason  the following query will not use an  index 



 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 19  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 20  AUTHOR : Tina London
 DATE : August 7, 1992

               on column COMM

                    SELECT    *
                    FROM      emp     
                    WHERE     comm is NULL

                    Oracle  assumes that the majority of  records 
               in  a  table  will  contain  values  for   indexed 
               columns,  because of this a FTS will  be  executed 
               for the following query.

                    SELECT    *
                    FROM      emp
                    WHERE     comm is not NULL

                    A  FTS may not be appropriate if  the  column 
               COMM is sparsely populated. There is an example of 
               how to alter the query to make use of the index.

                    SELECT    *
                    FROM      emp
                    WHERE     comm > -0.01

          6.3  Indexes and 'NOT=' predicates.

                    When  a  predicate  contains  a  "not  equal" 
               condition,  Oracle  will not  consider  an  index, 
               however,   Oracle  will  interpret   other   'not' 
               predicates so as to use any indexes available e.g.

                    'not sal > 50' => 'sal <= 50'
                    'not sal <= 50' => 'sal > 50'

          6.4  Group by and predicate clauses.


                    The  performance of group by queries  can  be 
               improved by eliminating unwanted rows early in the 
               selection  process.  The  following  two   queries 
               return  the  same  data, however,  the  second  is 
               potentially quicker, since rows will be eliminated 
               before the set operators are applied.

                    SELECT    job,
                              avg(sal)
                    FROM      emp
                    GROUP BY  job
                    HAVING    job = 'president'
                    OR        job = 'manager'


                    SELECT    job,



 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 20  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 21  AUTHOR : Tina London
 DATE : August 7, 1992

                              avg(sal)
                    FROM      emp
                    WHERE     job = 'president'
                    OR        job = 'manager'
                    GROUP BY  job

          6.5  Multiple index queries.

                    When  a  query  has  two  or  more   equality 
               predicate  clauses, multiple indexes may be  used. 
               Oracle  will  merge  the  indexes  at  run   time, 
               returning  rows  that  are in  both  indexes,  for 
               example.


                    SELECT    ename
                    FROM      emp
                    WHERE     deptno=20
                    AND       job='manager'

                    Will merge the following indexes.

                    non unique index on job
                    non unique index on deptno

          6.6  When indexes cannot be merged

                    Where a query can use both range and equality 
               predicates,  as in the following  example,  oracle 
               cannot merge the indexes. The index on job will be 
               used to locate the 'manager' rows, these will then 
               be checked to ensure deptno is greater than 10.

                    SELECT    *
                    FROM      emp
                    WHERE     job='manager'
                    AND       deptno > 10

                    indexes:

                    non unique index on job
                    non unique index on deptno

                    When there is no clear preference as to which 
               index to use, see the following query, Oracle will 
               use  only one index, because it is inefficient  to 
               merge them. Note that a scan of the second  table, 
               will have to be done for each row returned in  the 
               first table.

                    Since  both  indexes are non  unique,  Oracle 
               will  choose the index which, it encounters  first 



 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 21  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 22  AUTHOR : Tina London
 DATE : August 7, 1992

               in   the   DC_INDEXES  cache.  In   practice,   as 
               developers  cannot  see the cache, the  choice  is 
               arbitrary.

                    SELECT    ename
                    FROM      emp
                    WHERE     sal > 1
                    AND       empno > 1

                    indexes :

                    non unique index on empno
                    non unique index on sal

          6.7  Suppression of indexes for performance

                    When  there  is a choice  between  merging  a 
               unique  index  with a non unique one,  the  kernel 
               will  always  use  the  unique  index  and   avoid 
               performing the merge. The reasoning behind this is 
               that  the unique index will return only  one  row. 
               See the following example.

                    SELECT    ename
                    FROM      emp
                    WHERE     sal = 3000
                    AND       empno = 7902

                    indexes :

                    unique index on empno
                    non unique index on sal

                    Only  the  empno  index will be  used,  if  a 
               record  is found then the row will be  checked  to 
               see if the sal = 3000.

                    Oracle  will only use 5 indexes in  a  single 
               sql statement. After 5 indexes have been merged,
               the kernel will check those rows returned for  the 
               values in the remaining predicates.

                    SELECT    *
                    FROM      emp
                    WHERE     empno =7844
                    AND       job = 'salesman'
                    AND       deptno = 30
                    AND       sal = 1500
                    AND       comm = 0
                    AND       ename = 'turner'





 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 22  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 23  AUTHOR : Tina London
 DATE : August 7, 1992

                    indexes :

                    non unique index on empno
                    non unique index on job
                    non unique index on deptno     
                    non unique index on sal
                    non unique index on comm
                    non unique index on ename

                    Because  all  the predicates  score  equally, 
               according  to APPENDIX A, only five of  the  above 
               indexes will be used. Rows that are returned  will 
               be checked by the kernel to see if the last  value 
               is correct.

          6.8  Concatenated indexes

                    Concatenated   indexes   are   indexes   that 
               reference more than one column. The index will  be 
               available  for  scoring, provided that  the  first 
               part of the index is present and usable.

                    In   the   following   examples   assume    a 
               concatenated index on job and deptno.

                    SELECT    *
                    FROM      emp
                    WHERE     job = 'president'
                    AND       deptno = 10

                    The full index will be used.

                    SELECT    *
                    FROM      emp
                    WHERE     deptno = 10
                    AND       job = 'president'

                    The full index will be used.

                    SELECT    *
                    FROM      emp
                    WHERE     deptno = 10

                    The index will not be used, the leading  part 
               of the index is missing from the statement.

                    SELECT    *
                    FROM      emp
                    WHERE     job = 'analyst'

                    The  leading part of the index will be  used, 
               developers  should check the selectivity  of  this 



 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 23  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 24  AUTHOR : Tina London
 DATE : August 7, 1992

               part  of the index, it may not be  selective,  and 
               needs to be suppressed.

                    SELECT    *
                    FROM      emp
                    WHERE     job != 'clerk'
                    AND       deptno = 10

                    The  index  cannot  be used  because  the  != 
               prevents the use of the leading part of the index.

          6.9  Or optimisation

                    Unlike the 'AND' operator which requires that 
               a  row passes both predicates, the  'OR'  operator 
               requires  a row to pass either of the  predicates. 
               See the following example

                    SELECT    ename,
                              sal,
                              job
                    FROM      emp
                    WHERE     sal = 3000
                    OR        job = 'clerk'

                    index :

                    non unique on job

                    If the job index is used to identify the rows, 
               then it would only return those employees who  are 
               'clerks',  it  would not  return  those  employees 
               where the sal is 3000 and job is not 'clerk'.

                    A  single concatenated index on job,  sal  or 
               sal,job cannot be used because records with values 
               for the second key, must be found regardless of the 
               value of the first key.

                    If we have an index on sal,job, then with the 
               index on job the kernel can optimise the query.  A 
               single column index on sal could also be used.

                    If  there are indexes available for both  the 
               predicates  that are or'd, then the query will  be 
               processed conceptually, as the union of two select 
               statements. This is shown in the example below.

                    SELECT    ename,
                              sal,
                              job
                    FROM      emp



 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 24  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 25  AUTHOR : Tina London
 DATE : August 7, 1992

                    WHERE     sal = 3000
                    OR        job = 'clerk'

                    indexes :

                    non unique on job
                    non unique on sal

                    Becomes :


                    SELECT    ename,
                              sal,
                              job
                    FROM      emp
                    WHERE     job = 'clerk'

                    UNION

                    SELECT    ename,
                              sal,
                              job
                    FROM      emp
                    WHERE     sal = 3000
                    AND       job != 'clerk'


                    Notice  that  the kernel has  used  the  last 
               predicate   in  the  or  clause  for  the   single 
               condition   query,  and  it  has  used  the   same 
               predicate  to  form  the  "!="  condition  in  the  
               double condition query. Notice that if we  rewrite 
               the query as

                    SELECT    ename,
                              sal,
                              job
                    FROM      emp
                    WHERE     job = 'clerk'
                    OR        sal = 3000

                    indexes :

                    non unique on job
                    non unique on sal

                    Becomes :


                    SELECT    ename,
                              sal,
                              job



 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 25  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 26  AUTHOR : Tina London
 DATE : August 7, 1992

                    FROM      emp
                    WHERE     sal = 3000

                    UNION

                    SELECT    ename,
                              sal,
                              job
                    FROM      emp
                    WHERE     job = 'clerk'
                    AND       sal != 3000

                    From  this,  we can see that it  is  best  to 
               place  the  predicate  associated  with  the  most 
               selective index first in the where clause, and the 
               least selective last. This minimises the number of 
               checks for '!='.

                    These  'or' optimisations cannot  take  place 
               when  the sql query contains a connect by,  or  an 
               outer join.

          6.10 Non correlated sub queries

                    There are two cases to consider here, firstly 
               queries  which use the IN operator, and  secondly, 
               queries  which  use the NOT IN operator.  We  will 
               deal  with  the IN operator  first.

                    The  following rules apply to optimising  the 
               query.

                    1)   The  main and subqueries  are  optimised 
               separately.

                    2)  The same rules for optimisation apply  to 
               the main and subqueries, for example, in the query 
               below  there  is no optimisation of  the  subquery 
               because there is no where clause.


                    3)   The driving table is the table from  the 
               subquery. In the example below this will be job.

                    4)   The subquery is transformed into a  join 
               by  the following method. The rows  frturned  from 
               the subquery are sorted and duplicates removed.  A 
               full  table scan of these sorted rows is used  for 
               access,  finally the  table in the main  query  is 
               joined to the sorted rows from the subquery  using 
               column in the main query's where clause.




 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 26  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 27  AUTHOR : Tina London
 DATE : August 7, 1992

                    SELECT    distinct name 
                    FROM      men
                    WHERE     pin in 

(SELECT pin
FROM job) indexes : gives the following explain plan. SORT(UNIQUE) MERGE JOIN TABLE ACCESS (FULL) OF 'MEN' SORT(JOIN) TABLE ACCESS (FULL) OF 'JOB' Execute time 4759 Note that the SORT(UNIQUE) is caused by the distinct. If we now put a dummy where clause on the sub query we get SELECT distinct name FROM men WHERE pin in
(SELECT pin
FROM job ) indexes : unique on job(jobno) non unique on job(pin) gives the following explain plan SORT(UNIQUE) NESTED LOOPS TABLE ACCESS (FULL) OF 'MEN' INDEX(RANGE SCAN) OF 'JOB_1' (NON_UNIQUE) Note that an index has been used on job. This is the one case where an index can be used even though there is no where clause on the subquery. The sql is executed conceptually as, SELECT distinct pin FROM job ORDER BY pin, these sorted rows are then joined with the rows from men using the index job(pin).

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 27  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 28  AUTHOR : Tina London
 DATE : August 7, 1992

                    The  use  of  NOT  IN  is  not   recommended, 
               developers should consider rewording queries using 
               this  construct  to use outer joins,  however,  it 
               should  be  noted that outer joins are  an  oracle 
               extension, and are not available on other RDBMS's. 
               See below for example.

                    SELECT    *
                    FROM      dept
                    WHERE     deptno not in

(SELECT deptno
FROM emp) This can be rewritten as the following outer join. SELECT d.* FROM dept d,emp e WHERE d.deptno = e.deptno(+) AND e.rowid is NULL This relies on the fact that each row has a unique rowid, and that, rowid is never NULL. Note also that any NOT IN is equivelent to a NOT EXISTS. Therefoe any NOT EXISTS can be transformed into an outer join. 6.11 Correlated sub queries All Correlated subqueries follow the same execution path. The main and the subqueries are separately optimised. The driving table in selected from the main query. For each row returned in the main query, the subquery is executed. Indexes can be used for the where clauses on both the main and subquery.

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 28  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 29  AUTHOR : Tina London
 DATE : August 7, 1992

                                   APPENDIX A



          Query Paths Ranked in Order of Speed

          The lower the rank, the faster the path.
                      Rank     Path
                      ======   ==================================

                       1       ROWID = constant
                       2       Unique indexed column = constant
                       3       entire unique concatenated index = 
                               constant
                       4       entire cluster key = corresponding 
                               cluster  key in another  table  in 
                               the same cluster
                       5       entire cluster key = constant
                       6       entire   non-unique   concatenated 
                               index = constant
                       7       non-unique  single  column   index 
                               merge
                       8       most leading concatenated index  = 
                               constant
                       9       indexed  column BETWEEN low  value 
                               AND   high   value,   or   indexed                     
                               column LIKE 'C%' (bounded range)
                      10       sort/merge (joins only)
                      11       MAX  or  MIN  of  single   indexed 
                               column
                      12       ORDER BY entire index
                      13       full table scans
                      14       unindexed  column =  constant,  or 
                               column  IS  NULL, or  column  LIKE 
                               '%C%' (full table scan)


















 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 29  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 30  AUTHOR : Tina London
 DATE : August 7, 1992

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 30  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 31  AUTHOR : Tina London
 DATE : August 7, 1992

                                APPENDIX B



                         Table Structure for demos

          table oe (id number(6,0), 
                    oe number(1,0),
                    timestamp(date))

          The field id holds the unique reference number starting 
          at  1 the field oe holds a 0 if the id is even and a  1 
          if it is odd

          unique index on id
          nonunique index on oe

          there are 100,000 rows in oe

          table oe_link (oe number(1),name char(10)

          with 2 records 

          table oe_link_name (name char(10),age number(3))

          With 1 record 

          people (name, sex)

          job(job_id,name)























 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 31  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 32  AUTHOR : Tina London
 DATE : August 7, 1992

                                APPENDIX C



          Update Performance                                RDBMS
          John R. Pack                                    VAX/VMS
          July 17, 1990                                 V6.0.30.4
          (Revised 19-Sep-90)

                          Using PL/SQL to Enhance
                            Update Performance


               One of the slowest commands in SQL is the  UPDATE.  
          Most often, when a client complains about  performance, 
          a  large, correlated update (or,worse, an  uncorrelated 
          update) is at the root of the problem.Often, even after 
          creating  the optimal indexes and fine tuning  the  SQL 
          statement itself, the update is still hogging  enormous 
          CPU resources and is still the bottleneck in the user's 
          production scheme.

               The Correlated Update

               This  is  largely  due  to  the  fact  that   most 
          correlated  updates  require a full table  scan.   This 
          results  in  very slow performance when  the  table  is 
          extremely large.

               The  following  update  statement  is  typical  of 
          correlated updates:

               Update Target_Table
               Set Target_Field = (Select Source_Information
                                   From Source_Table
                                   Where Source_Table.Key =
                                   Target_Table.Key)
               Where exists (Select 'x'
                             From Source_Table
                             Where Source_Table.Key = 
                             Target_Table.Key)

               Performance  problems  arise because there  is  no 
          method of eliminating rows in the Target_Table based on 
          information in the Source_Table.  If  other  conditions 
          in the Update's Where clause do not disqualify most  of 
          the rows in the Target_Table, this update will  require 
          substantial processing time.




 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 32  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 33  AUTHOR : Tina London
 DATE : August 7, 1992

               In addition, this type of query suffers because an 
          index would not help the Target_Table access time.   In 
          fact,   the   only  index  which  will    improve   the 
          performance of the preceding update is an index on  the 
          Source_Table Key field.  If the Source_Table is  large, 
          this is very useful.

               Nevertheless,  the  best method  of  updating  the 
          table would be able to access just the correct rows  in 
          the Target_Table using an effective index.  This method 
          is  now  available  using PL/SQL and an  index  on  the 
          Target_Table Key field.

               The PL/SQL Update



               The  following  PL/SQL code  effectively  uses  an 
          index  on the Key field to access only the  appropriate 
          records in the Target_Table:

                       Declare
                         Cursor Source is
                         Select *
                         From Source_Table;
                         Begin
                         For Row in Source Loop
                         Update Target_Table
                         Set Target_Field = Row.Source_Information
                         Where Key = Row.Key;
                         End Loop;
                         Exception
                         When OTHERS Then
                         Null;
                       End;

               This  PL/SQL  script  loops through  each  of  the 
          records in the Source_Table and updates the appropriate 
          row  in  the Target_Table, if any.   Essentially,  this 
          transfers  the full table scan to the Source_Table  and 
          allows  the  index  on  the  Target_Table  to  be  used 
          effectively.

               Performance Gain

          Running  a  typical correlated update on an  8,000  row 
          table  to update 9 records required 19.4  CPU  seconds.  
          The  same  update using the PL/SQL script  executed  in 
          1.12 CPU seconds -- a 94% performance improvement.

               The  PL/SQL script will outperform the  correlated 
          update  whenever the Source_Table is smaller  than  the 



 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 33  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 34  AUTHOR : Tina London
 DATE : August 7, 1992

          Target_Table.  The larger the Target_Table compared  to 
          the Source_Table, the more substantial the  performance 
          gain.   With  an  effective  index,  the  size  of  the 
          Target_Table is no longer a factor in the time required 
          to  update  the  table; the  number  of  records  being 
          updated determines the performance.


               Replacing  the  EXISTS subquery with  IN  subquery 
          will give same improvement in most cases.  













































 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 34  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 35  AUTHOR : Tina London
 DATE : August 7, 1992

                                APPENDIX D



                GUIDELINES FOR USING THE OUTER JOIN SYNTAX




           Guidelines for Using the Outer Join Syntax       RDBMS
           RDBMS Support                                     V6.0
           October 1990


           1  INTRODUCTION

                        The purpose of this technical note is  to  
          provide   some       guidelines  on  how  to   use  the 
          outer join facility provided by       ORACLE.

           1.1  Outer Join Semantics - Definitions

                        The following terms, used to describe the  
          operation  on       outer joins, are defined :-

            'outer-join column' - a column reference followed  by 
                                  the symbol (+),  e.g.  EMPNO(+) 
                                  and  DEPT.DEPTNO(+)  are  outer 
                                  join columns

            'simple predicate'  - a logical expression containing 
                                  no  AND's,  OR's,  or  NOT's  ( 
                                  usually a simple relation  such 
                                  as A = B )

        'outer join predicate'  - a  simple predicate  containing 
                                  one or more outer join columns.



          2  OUTER JOIN SYNTAX - RULES

                         An  outer  join  predicate   may    only  
          contain  outer  join columns  from one table ( in other 
          words,  all outer join columns in a single  outer  join 
          predicate  must  belong  to  the   same   table).  This  
          means,   for  example, that the following statement  is 
          illegal :-

        EMP.EMPNO(+) = DEPT.DEPTNO(+)  -  outer join columns from
                                          two tables



 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 35  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 36  AUTHOR : Tina London
 DATE : August 7, 1992

                        Also,  if a column  in  a  predicate   is  
          an   outer  join       column,  then all  columns  from 
          the  same  table  must be outer join  columns  in  that 
          predicate.   This  means,  for   example,   that    the 
          following statement is illegal :-

         EMP.SAL + EMP.COMM(+) = SALGRADE.HIGH  - mixed columns from
                                                  one table

               In  a predicate, the  table  referenced   with   a  
          (+)   is directly 'outer  joined'  to all other  tables 
          in  the predicate. It is indirectly 'outer  joined'  to 
          any    tables   to   which   these  other  tables   are 
          themselves  'outer  joined'.  A predicate  may  not  be 
          directly  or indirectly 'outer joined' to itself.  This 
          means, for  example, that  the  following   combination 
          of predicates is illegal :-

              EMP.EMPNO(+)   = PERS.EMPNO
         AND  PERS.DEPTNO(+) = DEPT.DEPTNO
         AND  DEPT.JOB(+)    = EMP.JOB           - circular outer
                                                   join relationship



      3  OUTER JOIN EXECUTION

              For a given table, T, there may be both outer  join  and
      non-outer join predicates.  Execution occurs ( conceptually ) as
      follows :-

           1. The result of joining all tables mentioned in table T's
              outer join predicates is formed ( by recursive application
              of this algorithm ).

           2. For each row of the result, a set of composite rows is
              formed, each consisting of the original row in the
              result joined to a row in table T for which the composite
              row satisfies all of table T's outer join predicates.

           3. If a set of composite rows is the null set, a composite
              row is created consisting of the original row in the
              result joined to a row similar to those in table T, but
              with all values set to null.

           4. Rows that do not pass the non-outer join predicates
              are removed.

              This  may  be  summarised  as   follows.    Outer   join
      predicates  (  those  with  (+) after a column of table T ), are
      evaluated BEFORE table T is augmented with a null row.  The null



 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 36  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 37  AUTHOR : Tina London
 DATE : August 7, 1992

      row  is  added only if there are NO rows in table T that satisfy
      the  outer  join  predicates.   Non-outer  join  predicates  are
      evaluated AFTER table T is augmented with a null row (if needed)


      4  OUTER JOIN - RECOMMENDATIONS

              Certain types of  outer  joins  in  complicated  logical
      expressions  may  not  be  well  formed.  In general, outer join
      columns in predicates that are  branches  of  an  OR  should  be
      avoided.    Inconsistencies  between  the  branches of  the 
          OR can
      result in an ambiguous query, and this may not be detected.   It
      is  best  to  confine outer join columns to the top level of the
      'where' clause, or to nested AND's only.


      5  OUTER JOIN - ILLUSTRATIVE EXAMPLES

      5.1  Simple Outer Join

         SELECT ENAME, LOC
         FROM   DEPT, EMP
         WHERE  DEPT.DEPTNO = EMP.DEPTNO(+)


              The predicate is evaluated BEFORE null augmentation.  If
      there is a DEPT row for which there are no EMP rows, then a null
      EMP row is concatenated to the DEPT row.


      5.2  Outer Join With Simple Post-Join Predicates

         SELECT ENAME, LOC
         FROM   DEPT, EMP
         WHERE  DEPT.DEPTNO = EMP.DEPTNO(+)
         AND    EMP.DEPTNO IS NULL

              The second simple  predicate  is  evaluated   AFTER  
          null
      augmentation,  since  there  is no (+), removing rows which were
      not the result of null augmentation and hence leaving only  DEPT
      rows for which there was no corresponding EMP row.


      5.3  Outer Join With Additional Pre-Join Predicates

         SELECT ENAME, LOC
         FROM   DEPT, EMP
         WHERE  DEPT.DEPTNO = EMP.DEPTNO(+)
         AND    'CLERK'     = EMP.JOB(+)
         AND    EMP.DEPTNO IS NULL



 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 37  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 38  AUTHOR : Tina London
 DATE : August 7, 1992

              The predicate on EMP.JOB is evaluated at the  same  time
      as  the  one  on  EMP.DEPTNO  -  before null augmentation.  As a
      result, a null row is augmented to any DEPT row for which  there
      are no corresponding clerks's in the EMP table.  Therefore, this
      query displays departments containing no clerks.

              Note that it the  (+)  were  omitted  from  the  EMP.JOB
      predicate,  no  rows  would be returned.  In this case, both the
      EMP.JOB  and EMP.DEPTNO IS NULL predicates  are   evaluated  
          AFTER
      the  outer  join,  and  there  can be no rows for which both are
      true.










































 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 38  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 39  AUTHOR : Tina London
 DATE : August 7, 1992

                                APPENDIX E



                  The Ten commandments for fast Queries



          THOU SHALL ACCIDENTLY NOT DISABLE THE USE OF INDEXES  BY  
                MODIFYING A COLUMN WITHIN THE WHERE CLAUSE


            THOU  SHALL PUT THE TABLE THAT RETURNS THE  FEWEST  
                 ROWS LAST IN THE FROM LIST OF THE QUERY.


               THOU WILL INVESTIGATE REWORDING THY QUERY IN 
                             SOME  OTHER WAY.


           THOU SHALL USE THE EXISTS OPERATOR WHENEVER POSSIBLE.


           THOU  SHALL  NOT  ASK THE KERNEL FOR  MORE  THAN  THOU 
                                 WANTEST.


              THOU  SHALL NOT FORCE REPARSING OF A QUERY WHEN  
                       ONLY CHANGING BIND VARIABLES


               THOU  SHALL KNOW THE DATA VOLUMETRICS AND USE  
                           REALISTIC  TEST DATA.


                THOU SHALL USE TABLE ALIASES IN THY QUERIES


           THOU SHALL NOT USE THE != UNLESS ABSOLUTELY NECESSARY


               THOU SHALL USE THE TRACE FACILITY TO MONITOR 
                               YOUR QUERIES










 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 39  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 40  AUTHOR : Tina London
 DATE : August 7, 1992

                                APPENDIX F


                             EXPLAIN Facility

MOTIVATION     There is a need for users to be able to determine the steps the system takes in performing various operations on a database. The EXPLAIN facility provides users with a convenient mechanism for getting this information. The facility stores this information in a standard database table that can be manipulated with standard SQL statements.

SYNTAX     The following syntax is based on the syntax used by DB2 for their EXPLAIN facility:

        EXPLAIN PLAN [SET STATEMENT_ID [=] <string literal>]
        [INTO <table_name>]
        FOR <sql_statement>

where

        STATEMENT_ID - a unique optional identifier for the statement;

        INTO - allows user to save the results of the analysis in
          the specified table.  The table must conform to the format
          for the table used to store the analysis ( see TABLE FORMATS
          section for a description of the table format ).  If this 
          clause is not specified, the system will then attempt to store
          the information in a table named <user_id>.PLAN_TABLE .  If
          the explicit or implicit table does not exist the EXPLAIN
          command will fail.

        <sql statement> - an insert, delete, update, or query statement;


TABLE FORMATS Core Table Format

The core table used to represent the plan information consists of the following fields:

STATEMENT_ID - An identifier associated with the statement. If not set by

    the user, the identifier will be NULL. Note that a user may     identify a statement by the timestamp field.

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 40  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 41  AUTHOR : Tina London
 DATE : August 7, 1992

TIMESTAMP - The date and time when the statement was analysed.

REMARKS - Any comment the user wishes to associate with this step of

          the analysis.

OPERATION - the name of the operation being performed. The following table

    provides a listing of the operations described by the facility.

        Operation       Description
        ---------------------------------------------------------------
        And-Equal       A retrieval utilising intersection of
                          rowids from index searches
        Connect by      A retrieval that is based on a tree walk
        Concatenation   A retrieval from a group of tables.  It is
                        essentially a UNION ALL operation of the
                        sources.  Used for OR operations.
        Counting        A node that is used to count the number of 
                          rows returned from a table.  Used for queries
                          that use the ROWNUM meta-column.
        Filter          A restriction of the rows returned from a table
        First Row       A retrieval of only the first row
        For Update      A retrieval that is used for updating
        Index           A retrieval from an index
        Intersection    A retrieval of rows common to two tables
        Merge Join      A join utilising merge scans
        Minus           A retrieval of rows in Source 1 table but not in
                          Source 2 table
        Nested  Loops    A join utilising  nested  loops.   Each 
          value in the
                          first subnode is looked up in the second subnode.
                          This is often used when one table in a join is 
                          indexed and the other is not.
        Project         A retrieval of a subset of columns from a table
        Remote          A retrieval from a database other than the current
                          database
        Sequence        An operation involving a sequence table
        Sort            A retrieval of rows ordered on some column or group
                          of columns
        Table           A retrieval from a base table
        Union           A retrieval of unique rows from two tables
        View            A retrieval from a virtual table
        -------------------------------------------------------------------

Note that the operation shown when counting the number of rows returned by a query (i.e. select count(*)) is SORT. This is due to the way that COUNT is implemented internally. The table will not really be sorted.

OPTIONS - an option that modifies the operation, e.g.,

        OUTER option on join operations, rationale for sorting, type
        of index scan, type of filter, etc.  The following table



 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 41  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 42  AUTHOR : Tina London
 DATE : August 7, 1992

        provides a list of the options for the operations that have
        options.

        OPERATION    OPTIONS               DESCRIPTION
        ---------------------------------------------------------------------
        Index        UNIQUE KEY         Unique key lookup on index
                     RANGE              Index range scan
        Merge Join   OUTER              Join is an outer join
        Nested Loops OUTER              Join is an outer join
        Sort         DISTINCT           Sort is to produce distinct values
                     GROUP BY           Sort is for grouping operation
                     JOIN               Sort is for merge join
                     ORDER BY           Sort is for order by
        Table        BY ROWID           Table scan is by rowid
                     FULL               Sequential table scan
                     CLUSTER            Table scan by cluster key
        ---------------------------------------------------------------------

OBJECT_NODE - the name of the node that owns the database object.

OBJECT_OWNER - the name of the schema the owns the database object.

OBJECT_NAME - the name of the database object.

OBJECT_TYPE - a modifier that provides descriptive information about

    the database object, e.g., NON-UNIQUE for indexes, etc.

OBJECT_INSTANCE - a number corresponding to the ordinal position of the

    object as it appears in the original query. The numbering proceeds     from left to right, outer to inner with respect to the original     query text. Note that at this level, view expansion will result in     rather interesting object instance numbers. We will be addressing this     issue fully in future releases.

SEARCH_COLUMNS - the number of leading columns used when searching an

    index.

ID - a number assigned to this operation in the tree.

    Corresponds to a preorder traversal of the row source tree.

PARENT_ID - the number assigned to the previous operation that

    receives information from this operation. This field

          combined with the
    ID field allows users to do a treewalk of the specified     plan with the CONNECT BY statement.

POSITION - the position this database object occupies for the previous

    operation.

OTHER - other information that is specific to the row source that a user

    may find useful. For example, the select statement to a remote node, etc.

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 42  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 43  AUTHOR : Tina London
 DATE : August 7, 1992

Sample Table Definition

create table PLAN_TABLE (

        statement_id    char(30),
        timestamp       date,
        remarks         char(80),
        operation       char(30),
        options         char(30),
        object_node     char(30),
        object_owner    char(30),
        object_name     char(30),
        object_instance numeric,
        object_type     char(30),
        search_columns  numeric,
        id              numeric,
        parent_id       numeric,
        position        numeric,
        other           long);

An SQL script to create this table resides in file xplainpl.sql in the same directory containing the file catalog.sql. This table must reside in the current schema unless you use the optional INTO clause of the EXPLAIN command.

EXAMPLES     Suppose we issue the following statements:

        EXPLAIN PLAN 
        SET STATEMENT_ID = 'query1'
        INTO QUERY_PLANS
        FOR SELECT * FROM T1,T2,T3 WHERE T1.F1 = T2.F1 AND T2.F2 = T3.F2;

        SELECT OPERATION, OPTIONS, OBJECT_NAME, ID, 
          PARENT_ID, POSITION
          FROM QUERY_PLANS 
          WHERE STATEMENT_ID = 'query1'
          ORDER BY ID;

The following output would be created:

OPERATION OPTIONS OBJECT_NAME ID PARENT_ID POSITION


MERGE JOIN                              1
MERGE JOIN                              2              1           1
SORT          JOIN                      3              2           1
TABLE ACCESS  FULL    T1                4              3           1
SORT          JOIN                      5              2           2
TABLE ACCESS  FULL    T2                6              5           1
SORT          JOIN                      7              1           1



 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 43  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 44  AUTHOR : Tina London
 DATE : August 7, 1992

TABLE ACCESS  FULL    T3                8              7           1

8 RECORDS selected

   Suppose that an index is created on field F1 on table T1 and the following statements are issued:

        EXPLAIN PLAN
        SET STATEMENT_ID = 'query2'
        INTO QUERY_PLANS
        FOR SELECT * FROM T1 WHERE F1 > 1;


        SELECT OPERATION, OPTIONS, OBJECT_NAME, OBJECT_TYPE, SEARCH_COLUMNS, 
          ID, PREVIOUS_ID
          FROM QUERY_PLANS WHERE
          STATEMENT_ID='query2'
          ORDER BY ID;

The following output is produced:

OPERATION OPTIONS OBJECT_NAME OBJECT_TYPE SEARCH_COLUMNS ID POSITION


TABLE SCAN   BY ROWID T1                                       1
INDEX SCAN   RANGE    IT1        NON-UNIQUE     1              2           1

2 RECORDS selected

    With the same conditions as before, suppose we issue the following which demonstrates an index only retrieval:

        EXPLAIN PLAN
        SET STATEMENT_ID = 'query3'
        INTO QUERY_PLANS
        FOR SELECT F1 FROM T1 WHERE F1 > 1;

        SELECT OPERATION, OPTIONS, OBJECT_NAME, OBJECT_TYPE, SEARCH_COLUMNS,
          ID
          FROM QUERY_PLANS WHERE
          STATEMENT_ID='query3';

The following output is produced:

OPERATION OPTIONS OBJECT_NAME OBJECT_TYPE SEARCH_COLUMNS ID



INDEX SCAN RANGE IT1 NON-UNIQUE 1 1 RECORDS selected

    The next example illustrates the output if a grouping operation is

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 44  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 45  AUTHOR : Tina London
 DATE : August 7, 1992

specified in the statement:

        EXPLAIN PLAN
        SET STATEMENT_ID = 'query4'
        INTO QUERY_PLANS
        FOR SELECT AVG(F1),F2 FROM T1 GROUP BY F2;

        SELECT OPERATION, OPTIONS, OBJECT_NAME, ID,
          PARENT_ID
          FROM QUERY_PLANS WHERE STATEMENT_ID = 'query4'
          ORDER BY ID;

OPERATION     OPTIONS    OBJECT_NAME            ID          PARENT_ID
---------------------------------------------------------------------
SORT          GROUP BY                          1
TABLE SCAN    FULL      T1                      2             1

2 RECORDS selected

    The next example illustrates the ouptut if DISTINCT is specified in the statement:

        EXPLAIN PLAN
        SET STATEMENT_ID = 'query5'
        INTO QUERY_PLANS
        FOR SELECT DISTINCT F1 FROM T1;

        SELECT OPERATION, OPTIONS, OBJECT_NAME, ID,
          PREVIOUS_ID
          FROM QUERY_PLANS WHERE STATEMENT_ID = 'query5'
          ORDER BY ID;

OPERATION     OPTIONS    OBJECT_NAME           ID   PREVIOUS_ID
--------------------------------------------------------------
SORT          DISTINCT                         1
TABLE SCAN    FULL      T1                     2         1

2 RECORDS selected

    The next example illustrates the output if a subquery is specified in the statement:

        EXPLAIN PLAN
        SET STATEMENT_ID = 'query6'
        INTO QUERY_PLANS
        FOR SELECT * FROM T1 WHERE F1 < ( SELECT F2 FROM T2 WHERE F1=3);

        SELECT OPERATION, OPTIONS, OBJECT_NAME, ID,
          PARENT_ID, POSITION
          FROM QUERY_PLANS WHERE STATEMENT_ID = 'query6'
          ORDER BY ID;




 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 45  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 46  AUTHOR : Tina London
 DATE : August 7, 1992

OPERATION OPTIONS OBJECT_NAME ID PARENT_ID POSITION


FILTER          OUT                       1
TABLE SCAN      FULL  T1                  2            1          1
TABLE SCAN      FULL  T2                  3            1          2

3 RECORDS selected

The final example displays a complex query whose output is sent to the default plan table. ( It is assumed that this table has been created before issuing the statement.)

        EXPLAIN PLAN
        SET STATEMENT_ID = 'query7'
        FOR SELECT * FROM T1,T2 WHERE T1.F1 = T2.F1 UNION
            SELECT * FROM T2,T3 WHERE T2.F1 = T3.F1;

        SELECT OPERATION, OPTIONS, OBJECT_NAME, ID, PARENT_ID, POSITION
          FROM PLAN_TABLE WHERE STATEMENT_ID = 'query7'
          ORDER BY ID;

The following output is produced:

OPERATION OPTIONS OBJECT_NAME ID PARENT_ID POSITION


PROJECTION                             1
UNION                                  2             1              1
SORT        DISTINCT                   3             2              1
NEST LOOP                              4             3              1
TABLE SCAN  BY ROWID   T1              5             4              1
INDEX SCAN  RANGE      IT1             6             5              1
TABLE SCAN  FULL       T2              7             4              2
SORT        DISTINCT                   8             2              2
MERGE JOIN                             9             8              1
SORT        JOIN                       10            9              1
TABLE SCAN  FULL       T2              11            10             1
SORT        JOIN                       12            9              2
TABLE SCAN  FULL       T3              13            12             1

13 RECORDS selected

The following example is based on the previous query. It illustrates the use of the treewalking capability in Oracle's version of SQL.

        SELECT LPAD(' ',2*LEVEL)||OPERATION,OPTIONS, OBJECT_NAME
        FROM PLAN_TABLE WHERE STATEMENT_ID='query7'
        CONNECT BY PRIOR ID = PARENT_ID
           and STATEMENT_ID = 'query7'
        START WITH ID = 1
           and STATEMENT_ID = 'query7'
        ORDER BY ID;




 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 46  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 47  AUTHOR : Tina London
 DATE : August 7, 1992

LPAD(' ',2*LEVEL)||OPERATION


                 OPTIONS   OBJECT_NAME 
-------------------------------------
PROJECTION                        

  UNION                           

    SORT
                 DISTINCT
      NEST LOOP                   

        TABLE SCAN
                 BY ROWID      T1 
            INDEX SCAN
                 RANGE         IT1 
        TABLE SCAN
                 FULL          T2    
    SORT
                 DISTINCT
      MERGE JOIN                     

        SORT   
                 JOIN
          TABLE SCAN  
                 FULL           T2  
        SORT
                 JOIN
          TABLE SCAN 
                  FULL          T3  

13 RECORDS selected

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 47  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 48  AUTHOR : Tina London
 DATE : August 7, 1992

                                APPENDIX G


                TRACING IN THE WMS DEVELOPMENT ENVIRONMENT



               The  /wmsnew/bin/sql  script  creates  two   files 
          LASTRUN.LST  and LASTRUN.ENV in the  current  directory 
          and logs the user into sqlplus.

               The    TWO_TASK,   ORACLE_SID    and    SQL_PASSWD 
          environment  variables  must  be  set  before   calling 
          /wmsnew/bin/sql to determine the database account to be 
          accessed.

               Two  files  ( LASTRUN.LST and  LASTRUN.ENV  )  are 
          created  by  sql in the current directory and  are  are 
          used   by  explain  to  identify  the  session  to   be 
          explained.

               Notes : Since user dump files are normally removed 
          on  exit from sqlplus the explain should be run  whilst 
          the sql session is active.

                  explain    requires   the    LASTRUN.LST    and 
          LASTRUN.ENV files so it should be run on another window 
          IN THE SAME DIRECTORY that the sql call was made in.

                  If the PLAN_TABLE does not exist on the account 
          specified  then  sql  will  create  it  on  the   TEMP1 
          tablespace.

               When /wmsnew/bin/sql is called sql you should  see 
          something like the following :

               SQL_PASSWD is wms6/hoddengrey
               ORACLE_SID is oracle
               ORACLE_HOME is /usr/oracle6
               TWO_TASK is T:diamond:oracle
               TTY is ttyp0

SQL*Plus: Version 3.0.9.1.2 - Production on Mon Aug 10 11:41:10 1992

Copyright (c) Oracle Corporation 1979, 1989. All rights reserved.

Connected to:
ORACLE RDBMS V6.0.33.1.4, transaction processing option - Production

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 48  

 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 49  AUTHOR : Tina London
 DATE : August 7, 1992

PL/SQL V1.0.32.3.1 - Production

Session altered.

'PROGRAM|'||PID||'|'||PROGRAM



PROGRAM|29|sqlplus_at_onyx (TCP Two-Task) orapid:19422

SQL>

               You can now use explain to see the execution  plan 
          of any of your queries. Either have a second window  in 
          the  same directory and use %explain, or  use  !explain 
          from the sql prompt.






































 GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 49  

-- 
tina london

-- 
David T. Bath             | Email:dtb_at_otto.bf.rmit.oz.au (131.170.40.10)
Senior Tech Consultant    | Phone: +61 3 347-7511 TZ=AEST-10AEDST-11
Global Technology Group   | 179 Grattan St, Carlton, Vic, 3153, AUSTRALIA
"The robber of your free will does not exist" - Epictetus
Received on Tue Oct 11 1994 - 15:51:49 CET

Original text of this message