Re: Tina London's Article re: Code Development Rules
Date: 11 Oct 1994 09:51 -0500
Message-ID: <11OCT199409512981_at_author.gsfc.nasa.gov>
In article <CxA2D8.n42_at_unocal.com>, dblpraf_at_rogue.unocal.com (Bob Fleisig) writes...
>If anyone still has a copy of Tina London's article regarding >good code development rules, would you please repost the article. > >Thanks in advance. > >|----------------------------------------- NO SPECIAL SIG LINE -----------------------------------------------|
Reprinted SANS permission.
Path: skates.gsfc.nasa.gov!kong.gsfc.nasa.gov!cs.umd.edu!haven.umd.edu!darwin.sura.net!sgiblab!munnari.oz.au!goanna!escargot!otto!dtb
From: dtb_at_otto.bf.rmit.oz.au (David Bath)
Newsgroups: comp.databases.oracle
Subject: Oracle SQL Guidlines / Optimizer notes
Summary: Guidelines for efficient SQL with Oracle by Tina London
Message-ID: <dtb.736743126_at_otto>
Date: 7 May 93 02:52:06 GMT
Reply-To: dtb_at_otto.bf.rmit.oz.au
Organization: Royal Melbourne Institute of Technology
Lines: 3117
NNTP-Posting-Host: otto.bf.rmit.oz.au
I received the following from Tina London in the UK. I think its worth posting to the group. While I format things slightly differently, her reasons are valid and worth browsing. Her notes on the V6 optimiser are great. I will be incorporating an abridged version of some of this in the non-DBA FAQ 0.4. Her current address is tinalondon_at_artemis.demon.co.uk
Thanks Tina !!!!!!!!
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 1
AUTHOR : Tina London
DATE : August 7, 1992
GUIDELINES AND GOOD PRACTICE GUIDE FOR DEVELOPING SQL Author : Tina London Email : tlondon_at_cix.compulink.co.uk Date : August 7, 1992
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 1
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 2
AUTHOR : Tina London
DATE : August 7, 1992
1.0 Introduction
2.0 SQL Layout 2.1 General layout 2.2 SQL Keywords 2.3 Constants and variables. 2.4 Outer Joins 2.5 Table aliases. 2.6 Ordering of where clauses. 3.0 Database selects. 3.1 Ordering of the from clause. 3.2 Unintentionally disabling indexes. 3.3 Intentionally disabling indexes. 3.4 Investigate Rewriting the Query 3.5 Use of the Exists operator 3.6 Don't perform unnecessary joins. 3.7 Resource intensive operations 3.8 Use realistic test data. 3.9 Use of != operator 3.10 Use Oracle's trace facility. 3.11 Management of oracle cursors. 3.12 The 10,15,20 percent rule 4.0 Insert statements. 5.0 Database updates. 6.0 Optimising oracle queries. 6.1 Table access. 6.2 Indexes and Null. 6.3 Indexes and 'NOT=' predicates. 6.4 Group by and predicate clauses. 6.5 Multiple index queries. 6.6 When indexes cannot be merged 6.7 Suppression of indexes for performance 6.8 Concatenated indexes 6.9 Or optimisation 6.10 Non correlated sub queries 6.11 Correlated sub queries Appendix A Query Paths Ranked in Order of Speed Appendix B Tables used in the examples. Appendix C Correlated updates and PL/SQL. Appendix D Guidelines for outer join syntax
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 2
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 3
AUTHOR : Tina London
DATE : August 7, 1992
Appendix E The ten commandments for fast queries.
Appendix F Oracle trace facility.
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 3
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 4
AUTHOR : Tina London
DATE : August 7, 1992
1.0 Introduction
The purpose of this guide is to present standards and guideline for the wms development team, which will ensure that quality SQL is produced. The rules on SQL layout are important, because, they improve the clarity of the statements. Developers should realise that, as in third generation programming, the first method of writing a program is not necessarily the best, the same is true of fourth generation SQL queries. Adherence to the SQL layout rules needs to be enforced now, as this will allow the project to take advantage of Oracle V7 SQL caching facilities. Version 7 of Oracle has an SQL cache, which will hold parsed queries, Oracle determines whether a query is in the cache by doing a case insensitive comparison All code reviews, which require database access, will have available, as part of the document set, sql trace output formatted with tkprof. This output will be formally reviewed, by the database team, as part of the review process. 2.0 SQL Layout 2.1 General layout SQL statements should be laid out so as to aid readability and maintenance. Each statement should be in a separate sql file. 1) Statements should be indented so that individual lines are neatly aligned. 2) Each expression in the SELECT list should be on a different line. 3) Every table in the FROM list should be on a fresh line. 2.2 SQL Keywords 1) Oracle keywords should be entered in upper case 2) The following keywords should be placed on a new line :
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 4
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 5
AUTHOR : Tina London
DATE : August 7, 1992
SELECT INTO FROM WHERE AND/OR GROUP BY HAVING CONNECT BY FOR UPDATE OF ORDER BY 3) All keywords within the same select should line up with the previous keywords. Nested select statements should be indented and lined up with the second word on the previous line, e.g. : SELECT sal, Job, ename, dept FROM emp WHERE sal > any
(SELECT sal
FROM emp WHERE deptno = 30) ORDER BY sal; 4) Lines which don't start with a listed keyword should be lined up with the second word on the previous line, e.g. : SELECT ename, dept FROM emp 2.3 Constants and variables. 1) Constants, program variables etc. should be on the right hand side of a WHERE or HAVING clause. SELECT ename FROM emp WHERE empno = '1232' Or SELECT ename FROM emp WHERE empno = :1
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 5
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 6
AUTHOR : Tina London
DATE : August 7, 1992
2.4 Outer Joins.
Columns requiring to be outer joined should appear on the right hand side of a WHERE or HAVING clause. SELECT ename FROM emp e, dept d WHERE e.empno = d.empno(+) 2.5 Table aliases. Table aliases should be used in all queries that have more than one table in the FROM clause. The use of table aliases speeds up the parse phase of an oracle query, by reducing the number of recursive sql queries. SELECT count(*) FROM oe o, oe_link l, oe_link_name n WHERE o.oe = l.oe AND l.name = n.name Notice the table aliases o,l,n and their subsequent use in the where clause. 2.6 Ordering of where clauses. As an aid to understanding, WHERE clauses should be laid out with the join clauses first, and the restriction clauses second. 3.0 Database selects. 3.1 Ordering of the FROM clause. This rule is for when the oracle optimiser is stuck for a good idea. The Oracle optimiser works in the following manner. It looks at each of the WHERE clauses and assigns the tables concerned a number based on the type of predicate e.g. field = 'const' or field(+) = field. It then chooses the table with the lowest score as the driving table. But, and its a big one, if more than one table has the same lowest score then it chooses the last table in the FROM list to be the driving table.
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 6
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 7
AUTHOR : Tina London
DATE : August 7, 1992
See appendix A for a listing of the predicate scoring system. In this example there are indexes on the tables as follows Indexes : unique on oe(id) unique on oe_link(oe) unique on oe_link_name(name) non unique on oe(oe) SELECT count(*) FROM oe_link l, oe_link_name n, oe o WHERE o.oe = l.oe AND l.name = n.name Time 621.21 secs SELECT count(*) FROM oe o, oe_link l, oe_link_name n WHERE o.oe = l.oe AND l.name = n.name Time 197.05 secs Notice the difference in query timings. The only difference in the two queries is the order of the tables in the FROM clause. Therefore the table returning the least number of rows should be last in the FROM list. 3.2 Unintentional disabling of indexes. This is a very easy way to inadvertently disable the use of the indexes. On table oe, which is described in the appendix. Column id has a datatype of number. SELECT id, oe FROM oe WHERE to_char(id) = 1232
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 7
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 8
AUTHOR : Tina London
DATE : August 7, 1992
Time 97 secs. whereas SELECT id, oe FROM oe WHERE id = 1232 Time .4 secs. The reason for this is that oracle cannot use an index if the column is modified in the where clause. It is up to the application programmer to ensure that, WHERE clause columns, aren't modified in any way. Note that the following query will also stop the use of indexes. SELECT id, oe FROM oe WHERE id+1 = 1233 Dates in the where clause can also cause problems. To select all the records entered on a certain day the following three ideas come to mind. SELECT * FROM oe WHERE trunc(timestamp) = '26-MAR-91' SELECT * FROM oe WHERE timestamp between '26-mar-91' AND '27-mar-91' SELECT * FROM oe WHERE timestamp >= to_date('26-mar-91:00:00', 'dd-mon-yy:hh24:mi') AND timestamp < to_date('27-mar-91:00:00', 'dd-mon-yy:hh24:mi') The first one when run takes 240 seconds to produce a result. The second one only takes 1.05 seconds to
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 8
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 9
AUTHOR : Tina London
DATE : August 7, 1992
run, however, it has a featurette. It will include any records for midnight on the 27th of march. The last one only takes .5 of a second and it doesn't retrieve the extra records for midnight. Examples along the lines of that shown below are also considered to be column modifications. Therefore concatenating the columns, stops the indexes from being used. example a) should be rewritten as in example b) a) SELECT * FROM job WHERE db_id||job_no = 'AZ0100201' b) SELECT * FROM job WHERE db_id = 'AZ' AND job_no = '0100201' Developers need to be aware of the type conversions that Oracle performs implicitly. Oracle may choose to convert either the column or the constant. If the column is chosen then the indexes cannot be used. The conversion chosen is defined in the following table. Mixed Common Function datatype Unit chosen -------------- ----------- -------- Char with number number to_number Char with rowid rowid to_rowid Char with date date to_date The following example shows how the conversion table given above works in practice. SELECT deptno FROM dept
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 9
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 10
AUTHOR : Tina London
DATE : August 7, 1992
WHERE deptno = 1324 indexes non unique index on deptno The select would not use the index because the column deptno would be to_numbered by the kernel. The following queries will use the indexes. SELECT deptno FROM dept WHERE deptno = to_char(1324) SELECT deptno FROM dept WHERE deptno = '1324' 3.3 Intentionally disabling indexes. When it has been decided, that certain indexes need to be disabled, for query optimisation, the following column modifiers will be used. Datatype Inhibit expression -------- ------------------ Char char||'' number number+0 date add_months(date,0) The use of nvl(column,0) works with all datatypes, however, it could be confusing in queries which perform a lot of arithmetic calculation. See example below. SELECT deptno FROM dept WHERE nvl(deptno,0) = '1234' 3.4 Investigate Rewriting the Query SQL is a very expressive language and there are normally several ways of performing the same query. Developers should investigate different wordings of the same query, so as to identify the optimal query.
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 10
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 11
AUTHOR : Tina London
DATE : August 7, 1992
This can be demonstrated by the following example, which is about finding all the people who don't have jobs to do. There are 99 people in the system and 9900 jobs There are indexes on name on both tables. The first attempt SELECT p.name FROM people p, job j WHERE p.name = j.name(+) AND j.name is null fred the 27 1 record selected. Time 51.40 secs. Not a very good result this time. The second attempt SELECT name FROM people WHERE name not in
(SELECT name
FROM job) fred the 27 1 record selected. Time 6.11 secs A much better attempt, there is nearly an order of magnitude improvement. We might now be inclined to rest on our laurels, make a cup of tea and put our feet up, however, if we correlate the sub query to get. SELECT name FROM people p where not name in
(SELECT name
FROM job j WHERE p.name = j.name) fred the 27 1 record selected. Time 1.08 secs.
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 11
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 12
AUTHOR : Tina London
DATE : August 7, 1992
3.5 Use of the Exists operator
One operator which seems to be ignored is the EXISTS operator. This can be particularly useful in forms work for validating foreign keys. In the following example we wish to know whether 'fred the 45' has any jobs. The first example is SELECT distinct 'x' FROM job WHERE name = 'fred the 45' 1 record selected. Time 0.45 secs. The second example uses the exists operator and is almost twice as fast. SELECT 'x' FROM dual WHERE exists
( SELECT 'x'
FROM job where name = 'fred the 45') 1 record selected. Time 0.26 secs. The reason this is faster is that with the exists operator the oracle kernel knows that once it has found one match it can stop. It therefore doesn't have to continue the FTS (a TLA (Three Letter Acronym) for Full Table Scan). Things start becoming interesting, now the actual value of the data item being searched on determines which query to use. Consider the people table with 10,000 entries. Enquiring about 'fred the 34' and 'fred the 9999' gives the following. SELECT distinct 'x' FROM job WHERE name ='fred the 34' Time 6.65 secs.
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 12
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 13
AUTHOR : Tina London
DATE : August 7, 1992
SELECT 'x' FROM dual WHERE exists
(SELECT 'x'
FROM job WHERE name = 'fred the 34') Time 0.28 secs. SELECT 'x' FROM dual WHERE exists
(SELECT 'x'
FROM job WHERE name = 'fred the 9999') Time 8.28 secs. Ok I cheated somewhat the data goes into the table in name order. Thus 'fred the 1' goes in first data block and 'fred the 9999' goes in last one. Developers should be aware that, the efficiency of EXISTS and IN is dependant on the amount of data in each table. A query with IN in it drives from the subquery accessing the main query for each row returned, when, a query uses EXISTS it drives from the main query accessing the subquery for each row returned. So if the subquery returns few rows, but, the main query returns a lot of rows for each row from the subquery use the IN operator. 3.6 Don't perform unnecessary joins. In some parts of our system, developers are joining onto the sys_param table. This is bad practice and must be stopped. The sys_params table was designed to hold static system wide values. The correct use of this table, is for the application code to get the values it requires once at the start of the code, or in the case of regular update cycles, at the start of each such cycle. At the logical level, any runtime modification of sys_params, probably indicates
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 13
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 14
AUTHOR : Tina London
DATE : August 7, 1992
missing entities.
3.7 Resource intensive operations
Queries which use DISTINCT, UNION, MINUS, INTERSECT, ORDER BY and GROUP BY call upon the kernel to perform resource intensive sorts. A DISTINCT requires one sort, the other set operators require at least two sorts. Other ways of writing these queries should be found. Most queries that use the set operators, UNION,MINUS and INTERSECT, can be rewritten in other ways. 3.8 Use realistic test data. Realistic test data, which matches both in volume and values, that shown in the ER diagrams will be used. The following is based on the example used in 3.4 about rewording queries. There are now 999 people in the table and 9990 jobs. Thus we have increased our people by a factor of ten. Notice how the query performed with an outer join is much quicker than that performed by the subquery. This is a complete reversal of the result in 3.4. This demonstrates quite clearly that the sql developer must have a knowledge of both the structure of the data and the typical number of rows in each table. SELECT p.name FROM people p, job j WHERE p.name = j.name(+) AND j.name is null fred the 218 Time 23.20 secs. SELECT name FROM people WHERE not name in
(SELECT name
FROM job) fred the 218
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 14
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 15
AUTHOR : Tina London
DATE : August 7, 1992
Time 193.46 secs. SELECT name FROM people p WHERE not name in
(SELECT name
FROM job j WHERE p.name = j.name) fred the 218 Time 8.66 secs. 3.9 Use of != operator Unless it is absolutely necessary avoid using the != operator. The use of this operator disables the use of indexes, because the Oracle kernel assumes that the query will be retrieving most of the rows in the table. 3.10 Use Oracle's trace facility. For every query, even the simplest, check the output from oracle's trace facility. The first step in optimising queries is to eliminate FTS's, this can be done using trace. See APPENDIX F for details on the trace facility. 3.11 Management of oracle cursors. Where ever possible, oracle cursors should be declared at the start of the program. It is also good practice to tell the system the maximum number of cursors you want, at the start of the program. This can be done in DB by opening the cursors you require, in Pro*c set MAXCURSORS. There are two 'expensive' operations during the execution of a query, apart from returning the rows. These are associated with the 'parse' and 'bind' phase of query execution. What is meant by the parse phase. This is the time when the sql text that the programmer has written is translated into an internal representation, that the Oracle kernel can understand. The following activities happen during the parse phase. a) data dictionary look up and security
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 15
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 16
AUTHOR : Tina London
DATE : August 7, 1992
checking. This activity causes the oracle kernel to produce internal sql queries, which are called recursive queries, to check such things as whether this user is allowed access to this column etc. b) query optimisation. Part of the query optimisation which deals with views and sub queries is performed. Performing all of the above can take a considerable time. Therefore they should be done as infrequently as possible. The binding phase takes place after the parse phase. It is at this stage that the user variables are decided. For instance in the query select 'x' from person where name = :1 the host variable 1 would have its value 'bound' into the query during binding by having the address of :1 associated with the host variable. Under normal circumstances, wms programs should perform the following steps, for a query that is the be executed more than once. Step 1 During program start-up set up and parse the queries with calls to sql_set-up. Step 2 Execute the query with a call to sql_execute supplying the necessary bind variables. Step 3 Fetch and process the returned rows by using sql_fetch. To execute the query again, return to step 2 and supply new bind variables. For further information see the document entitled DB_user.doc. 3.12 The 10,15,20 percent rule Only use indexes for retrieving data from tables, where your want less than 15 percent of the rows. FTS's are quicker if you want more than 15 percent of a tables data.
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 16
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 17
AUTHOR : Tina London
DATE : August 7, 1992
The rule is called the 10,15,20 percent rule, because it depends on which oracle document you read, as to whether its 10,15 or 20 percent. 4.0 Insert statements. 4.1 Use of * in insert statements Although the use of the shorthand character '*' is allowed by the oracle kernel, it is considered to be bad programming practice. It is not allowed in the WMS project. The reason for it being bad practice, is that if a table is modified during an upgrade to the database, subsequent inserts could fail out in the field. 5.0 Update Statements. 5.1 Correlated updates. One of the slowest commands in SQL is the UPDATE. This is largely due to the fact that most correlated updates require a full table scan. This results in very slow performance when the table is extremely large. The following update statement is typical of correlated updates: Update Target_Table Set Target_Field = (Select Source_Information From Source_Table Where Source_Table.Key = Target_Table.Key) Where exists (Select 'x' From Source_Table Where Source_Table.Key = Target_Table.Key) Performance problems arise because there is no method of eliminating rows in the Target_Table based on information in the Source_Table. If other conditions in the Update's Where clause do not disqualify most of the rows in the Target_Table, this update will require substantial processing time.
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 17
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 18
AUTHOR : Tina London
DATE : August 7, 1992
The following PL/SQL code effectively uses an index on the Key field to access only the appropriate records in the Target_Table: Declare Cursor Source is Select * From Source_Table; Begin For Row in Source Loop Update Target_Table Set Target_Field = Row.Source_Information Where Key = Row.Key; End Loop; Exception When OTHERS Then Null; End; This PL/SQL script loops through each of the records in the Source_Table and updates the appropriate row in the Target_Table, if any. Essentially, this transfers the full table scan to the Source_Table and allows the index on the Target_Table to be used effectively. Running a typical correlated update on an 8,000 row table to update 9 records required 19.4 CPU seconds. The same update using the PL/SQL script executed in 1.12 CPU seconds -- a 94% performance improvement. The PL/SQL script will outperform the correlated update whenever the Source_Table is smaller than the Target_Table. The larger the Target_Table compared to the Source_Table, the more substantial the performance gain. With an effective index, the size of the Target_Table is no longer a factor in the time required to update the table; the number of records being updated determines the performance. Replacing the EXISTS subquery with IN subquery will give same improvement in most cases. 6.0 Optimising oracle queries. 6.1 Table access.
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 18
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 19
AUTHOR : Tina London
DATE : August 7, 1992
There are three ways that Oracle can find a row in a table, these are : 1) Scan each row in the table checking whether to select the row for output. This is commonly known as a full table scan (FTS), it is also the least efficient method of locating a row. 2) By using an index to locate the row. 3) By using the rowid to directly access the row on disc. This is the most efficient method of accessing a row. Do not use this method across transactions as it is possible for a rowid to change, use the primary key of the table in these circumstances. Unfortunately we do not normally know the rowid, so the fastest method of access is by the use of indexes. The general rules are as follows: Indexes can be used in the following circumstances. 1) If the indexed column is mentioned in the where clause. 2) If the indexed column is not modified by a function or, an arithmetic operation, or both of them. The following exception is applicable, a MIN(column), MIN(column+constant) or MAX(column), MAX(column+constant). Indexes cannot be used in the following circumstances. 1) there is no 'where' clause. 2) the indexed column is modified in any way. 3) the search is for 'null' or 'not null' 6.2 Indexes and Null. If a record has no value (i.e. NULL) in the column specified as the index key, then there will not be an index entry for the record. For this reason the following query will not use an index
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 19
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 20
AUTHOR : Tina London
DATE : August 7, 1992
on column COMM
SELECT * FROM emp WHERE comm is NULL Oracle assumes that the majority of records in a table will contain values for indexed columns, because of this a FTS will be executed for the following query. SELECT * FROM emp WHERE comm is not NULL A FTS may not be appropriate if the column COMM is sparsely populated. There is an example of how to alter the query to make use of the index. SELECT * FROM emp WHERE comm > -0.01 6.3 Indexes and 'NOT=' predicates. When a predicate contains a "not equal" condition, Oracle will not consider an index, however, Oracle will interpret other 'not' predicates so as to use any indexes available e.g. 'not sal > 50' => 'sal <= 50' 'not sal <= 50' => 'sal > 50' 6.4 Group by and predicate clauses. The performance of group by queries can be improved by eliminating unwanted rows early in the selection process. The following two queries return the same data, however, the second is potentially quicker, since rows will be eliminated before the set operators are applied. SELECT job, avg(sal) FROM emp GROUP BY job HAVING job = 'president' OR job = 'manager' SELECT job,
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 20
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 21
AUTHOR : Tina London
DATE : August 7, 1992
avg(sal) FROM emp WHERE job = 'president' OR job = 'manager' GROUP BY job 6.5 Multiple index queries. When a query has two or more equality predicate clauses, multiple indexes may be used. Oracle will merge the indexes at run time, returning rows that are in both indexes, for example. SELECT ename FROM emp WHERE deptno=20 AND job='manager' Will merge the following indexes. non unique index on job non unique index on deptno 6.6 When indexes cannot be merged Where a query can use both range and equality predicates, as in the following example, oracle cannot merge the indexes. The index on job will be used to locate the 'manager' rows, these will then be checked to ensure deptno is greater than 10. SELECT * FROM emp WHERE job='manager' AND deptno > 10 indexes: non unique index on job non unique index on deptno When there is no clear preference as to which index to use, see the following query, Oracle will use only one index, because it is inefficient to merge them. Note that a scan of the second table, will have to be done for each row returned in the first table. Since both indexes are non unique, Oracle will choose the index which, it encounters first
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 21
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 22
AUTHOR : Tina London
DATE : August 7, 1992
in the DC_INDEXES cache. In practice, as developers cannot see the cache, the choice is arbitrary. SELECT ename FROM emp WHERE sal > 1 AND empno > 1 indexes : non unique index on empno non unique index on sal 6.7 Suppression of indexes for performance When there is a choice between merging a unique index with a non unique one, the kernel will always use the unique index and avoid performing the merge. The reasoning behind this is that the unique index will return only one row. See the following example. SELECT ename FROM emp WHERE sal = 3000 AND empno = 7902 indexes : unique index on empno non unique index on sal Only the empno index will be used, if a record is found then the row will be checked to see if the sal = 3000. Oracle will only use 5 indexes in a single sql statement. After 5 indexes have been merged, the kernel will check those rows returned for the values in the remaining predicates. SELECT * FROM emp WHERE empno =7844 AND job = 'salesman' AND deptno = 30 AND sal = 1500 AND comm = 0 AND ename = 'turner'
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 22
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 23
AUTHOR : Tina London
DATE : August 7, 1992
indexes : non unique index on empno non unique index on job non unique index on deptno non unique index on sal non unique index on comm non unique index on ename Because all the predicates score equally, according to APPENDIX A, only five of the above indexes will be used. Rows that are returned will be checked by the kernel to see if the last value is correct. 6.8 Concatenated indexes Concatenated indexes are indexes that reference more than one column. The index will be available for scoring, provided that the first part of the index is present and usable. In the following examples assume a concatenated index on job and deptno. SELECT * FROM emp WHERE job = 'president' AND deptno = 10 The full index will be used. SELECT * FROM emp WHERE deptno = 10 AND job = 'president' The full index will be used. SELECT * FROM emp WHERE deptno = 10 The index will not be used, the leading part of the index is missing from the statement. SELECT * FROM emp WHERE job = 'analyst' The leading part of the index will be used, developers should check the selectivity of this
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 23
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 24
AUTHOR : Tina London
DATE : August 7, 1992
part of the index, it may not be selective, and needs to be suppressed. SELECT * FROM emp WHERE job != 'clerk' AND deptno = 10 The index cannot be used because the != prevents the use of the leading part of the index. 6.9 Or optimisation Unlike the 'AND' operator which requires that a row passes both predicates, the 'OR' operator requires a row to pass either of the predicates. See the following example SELECT ename, sal, job FROM emp WHERE sal = 3000 OR job = 'clerk' index : non unique on job If the job index is used to identify the rows, then it would only return those employees who are 'clerks', it would not return those employees where the sal is 3000 and job is not 'clerk'. A single concatenated index on job, sal or sal,job cannot be used because records with values for the second key, must be found regardless of the value of the first key. If we have an index on sal,job, then with the index on job the kernel can optimise the query. A single column index on sal could also be used. If there are indexes available for both the predicates that are or'd, then the query will be processed conceptually, as the union of two select statements. This is shown in the example below. SELECT ename, sal, job FROM emp
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 24
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 25
AUTHOR : Tina London
DATE : August 7, 1992
WHERE sal = 3000 OR job = 'clerk' indexes : non unique on job non unique on sal Becomes : SELECT ename, sal, job FROM emp WHERE job = 'clerk' UNION SELECT ename, sal, job FROM emp WHERE sal = 3000 AND job != 'clerk' Notice that the kernel has used the last predicate in the or clause for the single condition query, and it has used the same predicate to form the "!=" condition in the double condition query. Notice that if we rewrite the query as SELECT ename, sal, job FROM emp WHERE job = 'clerk' OR sal = 3000 indexes : non unique on job non unique on sal Becomes : SELECT ename, sal, job
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 25
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 26
AUTHOR : Tina London
DATE : August 7, 1992
FROM emp WHERE sal = 3000 UNION SELECT ename, sal, job FROM emp WHERE job = 'clerk' AND sal != 3000 From this, we can see that it is best to place the predicate associated with the most selective index first in the where clause, and the least selective last. This minimises the number of checks for '!='. These 'or' optimisations cannot take place when the sql query contains a connect by, or an outer join. 6.10 Non correlated sub queries There are two cases to consider here, firstly queries which use the IN operator, and secondly, queries which use the NOT IN operator. We will deal with the IN operator first. The following rules apply to optimising the query. 1) The main and subqueries are optimised separately. 2) The same rules for optimisation apply to the main and subqueries, for example, in the query below there is no optimisation of the subquery because there is no where clause. 3) The driving table is the table from the subquery. In the example below this will be job. 4) The subquery is transformed into a join by the following method. The rows frturned from the subquery are sorted and duplicates removed. A full table scan of these sorted rows is used for access, finally the table in the main query is joined to the sorted rows from the subquery using column in the main query's where clause.
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 26
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 27
AUTHOR : Tina London
DATE : August 7, 1992
SELECT distinct name FROM men WHERE pin in
(SELECT pin
FROM job) indexes : gives the following explain plan. SORT(UNIQUE) MERGE JOIN TABLE ACCESS (FULL) OF 'MEN' SORT(JOIN) TABLE ACCESS (FULL) OF 'JOB' Execute time 4759 Note that the SORT(UNIQUE) is caused by the distinct. If we now put a dummy where clause on the sub query we get SELECT distinct name FROM men WHERE pin in
(SELECT pin
FROM job ) indexes : unique on job(jobno) non unique on job(pin) gives the following explain plan SORT(UNIQUE) NESTED LOOPS TABLE ACCESS (FULL) OF 'MEN' INDEX(RANGE SCAN) OF 'JOB_1' (NON_UNIQUE) Note that an index has been used on job. This is the one case where an index can be used even though there is no where clause on the subquery. The sql is executed conceptually as, SELECT distinct pin FROM job ORDER BY pin, these sorted rows are then joined with the rows from men using the index job(pin).
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 27
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 28
AUTHOR : Tina London
DATE : August 7, 1992
The use of NOT IN is not recommended, developers should consider rewording queries using this construct to use outer joins, however, it should be noted that outer joins are an oracle extension, and are not available on other RDBMS's. See below for example. SELECT * FROM dept WHERE deptno not in
(SELECT deptno
FROM emp) This can be rewritten as the following outer join. SELECT d.* FROM dept d,emp e WHERE d.deptno = e.deptno(+) AND e.rowid is NULL This relies on the fact that each row has a unique rowid, and that, rowid is never NULL. Note also that any NOT IN is equivelent to a NOT EXISTS. Therefoe any NOT EXISTS can be transformed into an outer join. 6.11 Correlated sub queries All Correlated subqueries follow the same execution path. The main and the subqueries are separately optimised. The driving table in selected from the main query. For each row returned in the main query, the subquery is executed. Indexes can be used for the where clauses on both the main and subquery.
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 28
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 29
AUTHOR : Tina London
DATE : August 7, 1992
APPENDIX A Query Paths Ranked in Order of Speed The lower the rank, the faster the path. Rank Path ====== ================================== 1 ROWID = constant 2 Unique indexed column = constant 3 entire unique concatenated index = constant 4 entire cluster key = corresponding cluster key in another table in the same cluster 5 entire cluster key = constant 6 entire non-unique concatenated index = constant 7 non-unique single column index merge 8 most leading concatenated index = constant 9 indexed column BETWEEN low value AND high value, or indexed column LIKE 'C%' (bounded range) 10 sort/merge (joins only) 11 MAX or MIN of single indexed column 12 ORDER BY entire index 13 full table scans 14 unindexed column = constant, or column IS NULL, or column LIKE '%C%' (full table scan)
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 29
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 30
AUTHOR : Tina London
DATE : August 7, 1992
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 30
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 31
AUTHOR : Tina London
DATE : August 7, 1992
APPENDIX B Table Structure for demos table oe (id number(6,0), oe number(1,0), timestamp(date)) The field id holds the unique reference number starting at 1 the field oe holds a 0 if the id is even and a 1 if it is odd unique index on id nonunique index on oe there are 100,000 rows in oe table oe_link (oe number(1),name char(10) with 2 records table oe_link_name (name char(10),age number(3)) With 1 record people (name, sex) job(job_id,name)
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 31
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 32
AUTHOR : Tina London
DATE : August 7, 1992
APPENDIX C Update Performance RDBMS John R. Pack VAX/VMS July 17, 1990 V6.0.30.4 (Revised 19-Sep-90) Using PL/SQL to Enhance Update Performance One of the slowest commands in SQL is the UPDATE. Most often, when a client complains about performance, a large, correlated update (or,worse, an uncorrelated update) is at the root of the problem.Often, even after creating the optimal indexes and fine tuning the SQL statement itself, the update is still hogging enormous CPU resources and is still the bottleneck in the user's production scheme. The Correlated Update This is largely due to the fact that most correlated updates require a full table scan. This results in very slow performance when the table is extremely large. The following update statement is typical of correlated updates: Update Target_Table Set Target_Field = (Select Source_Information From Source_Table Where Source_Table.Key = Target_Table.Key) Where exists (Select 'x' From Source_Table Where Source_Table.Key = Target_Table.Key) Performance problems arise because there is no method of eliminating rows in the Target_Table based on information in the Source_Table. If other conditions in the Update's Where clause do not disqualify most of the rows in the Target_Table, this update will require substantial processing time.
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 32
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 33
AUTHOR : Tina London
DATE : August 7, 1992
In addition, this type of query suffers because an index would not help the Target_Table access time. In fact, the only index which will improve the performance of the preceding update is an index on the Source_Table Key field. If the Source_Table is large, this is very useful. Nevertheless, the best method of updating the table would be able to access just the correct rows in the Target_Table using an effective index. This method is now available using PL/SQL and an index on the Target_Table Key field. The PL/SQL Update The following PL/SQL code effectively uses an index on the Key field to access only the appropriate records in the Target_Table: Declare Cursor Source is Select * From Source_Table; Begin For Row in Source Loop Update Target_Table Set Target_Field = Row.Source_Information Where Key = Row.Key; End Loop; Exception When OTHERS Then Null; End; This PL/SQL script loops through each of the records in the Source_Table and updates the appropriate row in the Target_Table, if any. Essentially, this transfers the full table scan to the Source_Table and allows the index on the Target_Table to be used effectively. Performance Gain Running a typical correlated update on an 8,000 row table to update 9 records required 19.4 CPU seconds. The same update using the PL/SQL script executed in 1.12 CPU seconds -- a 94% performance improvement. The PL/SQL script will outperform the correlated update whenever the Source_Table is smaller than the
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 33
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 34
AUTHOR : Tina London
DATE : August 7, 1992
Target_Table. The larger the Target_Table compared to the Source_Table, the more substantial the performance gain. With an effective index, the size of the Target_Table is no longer a factor in the time required to update the table; the number of records being updated determines the performance. Replacing the EXISTS subquery with IN subquery will give same improvement in most cases.
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 34
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 35
AUTHOR : Tina London
DATE : August 7, 1992
APPENDIX D GUIDELINES FOR USING THE OUTER JOIN SYNTAX Guidelines for Using the Outer Join Syntax RDBMS RDBMS Support V6.0 October 1990 1 INTRODUCTION The purpose of this technical note is to provide some guidelines on how to use the outer join facility provided by ORACLE. 1.1 Outer Join Semantics - Definitions The following terms, used to describe the operation on outer joins, are defined :- 'outer-join column' - a column reference followed by the symbol (+), e.g. EMPNO(+) and DEPT.DEPTNO(+) are outer join columns 'simple predicate' - a logical expression containing no AND's, OR's, or NOT's ( usually a simple relation such as A = B ) 'outer join predicate' - a simple predicate containing one or more outer join columns. 2 OUTER JOIN SYNTAX - RULES An outer join predicate may only contain outer join columns from one table ( in other words, all outer join columns in a single outer join predicate must belong to the same table). This means, for example, that the following statement is illegal :- EMP.EMPNO(+) = DEPT.DEPTNO(+) - outer join columns from two tables
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 35
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 36
AUTHOR : Tina London
DATE : August 7, 1992
Also, if a column in a predicate is an outer join column, then all columns from the same table must be outer join columns in that predicate. This means, for example, that the following statement is illegal :- EMP.SAL + EMP.COMM(+) = SALGRADE.HIGH - mixed columns from one table In a predicate, the table referenced with a (+) is directly 'outer joined' to all other tables in the predicate. It is indirectly 'outer joined' to any tables to which these other tables are themselves 'outer joined'. A predicate may not be directly or indirectly 'outer joined' to itself. This means, for example, that the following combination of predicates is illegal :- EMP.EMPNO(+) = PERS.EMPNO AND PERS.DEPTNO(+) = DEPT.DEPTNO AND DEPT.JOB(+) = EMP.JOB - circular outer join relationship 3 OUTER JOIN EXECUTION For a given table, T, there may be both outer join and non-outer join predicates. Execution occurs ( conceptually ) as follows :- 1. The result of joining all tables mentioned in table T's outer join predicates is formed ( by recursive application of this algorithm ). 2. For each row of the result, a set of composite rows is formed, each consisting of the original row in the result joined to a row in table T for which the composite row satisfies all of table T's outer join predicates. 3. If a set of composite rows is the null set, a composite row is created consisting of the original row in the result joined to a row similar to those in table T, but with all values set to null. 4. Rows that do not pass the non-outer join predicates are removed. This may be summarised as follows. Outer join predicates ( those with (+) after a column of table T ), are evaluated BEFORE table T is augmented with a null row. The null
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 36
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 37
AUTHOR : Tina London
DATE : August 7, 1992
row is added only if there are NO rows in table T that satisfy the outer join predicates. Non-outer join predicates are evaluated AFTER table T is augmented with a null row (if needed) 4 OUTER JOIN - RECOMMENDATIONS Certain types of outer joins in complicated logical expressions may not be well formed. In general, outer join columns in predicates that are branches of an OR should be avoided. Inconsistencies between the branches of the OR can result in an ambiguous query, and this may not be detected. It is best to confine outer join columns to the top level of the 'where' clause, or to nested AND's only. 5 OUTER JOIN - ILLUSTRATIVE EXAMPLES 5.1 Simple Outer Join SELECT ENAME, LOC FROM DEPT, EMP WHERE DEPT.DEPTNO = EMP.DEPTNO(+) The predicate is evaluated BEFORE null augmentation. If there is a DEPT row for which there are no EMP rows, then a null EMP row is concatenated to the DEPT row. 5.2 Outer Join With Simple Post-Join Predicates SELECT ENAME, LOC FROM DEPT, EMP WHERE DEPT.DEPTNO = EMP.DEPTNO(+) AND EMP.DEPTNO IS NULL The second simple predicate is evaluated AFTER null augmentation, since there is no (+), removing rows which were not the result of null augmentation and hence leaving only DEPT rows for which there was no corresponding EMP row. 5.3 Outer Join With Additional Pre-Join Predicates SELECT ENAME, LOC FROM DEPT, EMP WHERE DEPT.DEPTNO = EMP.DEPTNO(+) AND 'CLERK' = EMP.JOB(+) AND EMP.DEPTNO IS NULL
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 37
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 38
AUTHOR : Tina London
DATE : August 7, 1992
The predicate on EMP.JOB is evaluated at the same time as the one on EMP.DEPTNO - before null augmentation. As a result, a null row is augmented to any DEPT row for which there are no corresponding clerks's in the EMP table. Therefore, this query displays departments containing no clerks. Note that it the (+) were omitted from the EMP.JOB predicate, no rows would be returned. In this case, both the EMP.JOB and EMP.DEPTNO IS NULL predicates are evaluated AFTER the outer join, and there can be no rows for which both are true.
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 38
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 39
AUTHOR : Tina London
DATE : August 7, 1992
APPENDIX E The Ten commandments for fast Queries THOU SHALL ACCIDENTLY NOT DISABLE THE USE OF INDEXES BY MODIFYING A COLUMN WITHIN THE WHERE CLAUSE THOU SHALL PUT THE TABLE THAT RETURNS THE FEWEST ROWS LAST IN THE FROM LIST OF THE QUERY. THOU WILL INVESTIGATE REWORDING THY QUERY IN SOME OTHER WAY. THOU SHALL USE THE EXISTS OPERATOR WHENEVER POSSIBLE. THOU SHALL NOT ASK THE KERNEL FOR MORE THAN THOU WANTEST. THOU SHALL NOT FORCE REPARSING OF A QUERY WHEN ONLY CHANGING BIND VARIABLES THOU SHALL KNOW THE DATA VOLUMETRICS AND USE REALISTIC TEST DATA. THOU SHALL USE TABLE ALIASES IN THY QUERIES THOU SHALL NOT USE THE != UNLESS ABSOLUTELY NECESSARY THOU SHALL USE THE TRACE FACILITY TO MONITOR YOUR QUERIES
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 39
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 40
AUTHOR : Tina London
DATE : August 7, 1992
APPENDIX F EXPLAIN Facility
MOTIVATION There is a need for users to be able to determine the steps the system takes in performing various operations on a database. The EXPLAIN facility provides users with a convenient mechanism for getting this information. The facility stores this information in a standard database table that can be manipulated with standard SQL statements.
SYNTAX The following syntax is based on the syntax used by DB2 for their EXPLAIN facility:
EXPLAIN PLAN [SET STATEMENT_ID [=] <string literal>] [INTO <table_name>] FOR <sql_statement>
where
STATEMENT_ID - a unique optional identifier for the statement;
INTO - allows user to save the results of the analysis in the specified table. The table must conform to the format for the table used to store the analysis ( see TABLE FORMATS section for a description of the table format ). If this clause is not specified, the system will then attempt to store the information in a table named <user_id>.PLAN_TABLE . If the explicit or implicit table does not exist the EXPLAIN command will fail. <sql statement> - an insert, delete, update, or query statement;
TABLE FORMATS Core Table Format
The core table used to represent the plan information consists of the following fields:
STATEMENT_ID - An identifier associated with the statement. If not set by
the user, the identifier will be NULL. Note that a user may identify a statement by the timestamp field.
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 40
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 41
AUTHOR : Tina London
DATE : August 7, 1992
TIMESTAMP - The date and time when the statement was analysed.
REMARKS - Any comment the user wishes to associate with this step of
the analysis.
OPERATION - the name of the operation being performed. The following table
provides a listing of the operations described by the facility.
Operation Description --------------------------------------------------------------- And-Equal A retrieval utilising intersection of rowids from index searches Connect by A retrieval that is based on a tree walk Concatenation A retrieval from a group of tables. It is essentially a UNION ALL operation of the sources. Used for OR operations. Counting A node that is used to count the number of rows returned from a table. Used for queries that use the ROWNUM meta-column. Filter A restriction of the rows returned from a table First Row A retrieval of only the first row For Update A retrieval that is used for updating Index A retrieval from an index Intersection A retrieval of rows common to two tables Merge Join A join utilising merge scans Minus A retrieval of rows in Source 1 table but not in Source 2 table Nested Loops A join utilising nested loops. Each value in the first subnode is looked up in the second subnode. This is often used when one table in a join is indexed and the other is not. Project A retrieval of a subset of columns from a table Remote A retrieval from a database other than the current database Sequence An operation involving a sequence table Sort A retrieval of rows ordered on some column or group of columns Table A retrieval from a base table Union A retrieval of unique rows from two tables View A retrieval from a virtual table -------------------------------------------------------------------
Note that the operation shown when counting the number of rows returned by a query (i.e. select count(*)) is SORT. This is due to the way that COUNT is implemented internally. The table will not really be sorted.
OPTIONS - an option that modifies the operation, e.g.,
OUTER option on join operations, rationale for sorting, type of index scan, type of filter, etc. The following table
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 41
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 42
AUTHOR : Tina London
DATE : August 7, 1992
provides a list of the options for the operations that have options. OPERATION OPTIONS DESCRIPTION --------------------------------------------------------------------- Index UNIQUE KEY Unique key lookup on index RANGE Index range scan Merge Join OUTER Join is an outer join Nested Loops OUTER Join is an outer join Sort DISTINCT Sort is to produce distinct values GROUP BY Sort is for grouping operation JOIN Sort is for merge join ORDER BY Sort is for order by Table BY ROWID Table scan is by rowid FULL Sequential table scan CLUSTER Table scan by cluster key ---------------------------------------------------------------------
OBJECT_NODE - the name of the node that owns the database object.
OBJECT_OWNER - the name of the schema the owns the database object.
OBJECT_NAME - the name of the database object.
OBJECT_TYPE - a modifier that provides descriptive information about
the database object, e.g., NON-UNIQUE for indexes, etc.
OBJECT_INSTANCE - a number corresponding to the ordinal position of the
object as it appears in the original query. The numbering proceeds from left to right, outer to inner with respect to the original query text. Note that at this level, view expansion will result in rather interesting object instance numbers. We will be addressing this issue fully in future releases.
SEARCH_COLUMNS - the number of leading columns used when searching an
index.
ID - a number assigned to this operation in the tree.
Corresponds to a preorder traversal of the row source tree.
PARENT_ID - the number assigned to the previous operation that
receives information from this operation. This field
combined with the
ID field allows users to do a treewalk of the specified
plan with the CONNECT BY statement.
POSITION - the position this database object occupies for the previous
operation.
OTHER - other information that is specific to the row source that a user
may find useful. For example, the select statement to a remote node, etc.
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 42
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 43
AUTHOR : Tina London
DATE : August 7, 1992
Sample Table Definition
create table PLAN_TABLE (
statement_id char(30), timestamp date, remarks char(80), operation char(30), options char(30), object_node char(30), object_owner char(30), object_name char(30), object_instance numeric, object_type char(30), search_columns numeric, id numeric, parent_id numeric, position numeric, other long);
An SQL script to create this table resides in file xplainpl.sql in the same directory containing the file catalog.sql. This table must reside in the current schema unless you use the optional INTO clause of the EXPLAIN command.
EXAMPLES Suppose we issue the following statements:
EXPLAIN PLAN SET STATEMENT_ID = 'query1' INTO QUERY_PLANS FOR SELECT * FROM T1,T2,T3 WHERE T1.F1 = T2.F1 AND T2.F2 = T3.F2; SELECT OPERATION, OPTIONS, OBJECT_NAME, ID, PARENT_ID, POSITION FROM QUERY_PLANS WHERE STATEMENT_ID = 'query1' ORDER BY ID;
The following output would be created:
OPERATION OPTIONS OBJECT_NAME ID PARENT_ID POSITION
MERGE JOIN 1 MERGE JOIN 2 1 1 SORT JOIN 3 2 1 TABLE ACCESS FULL T1 4 3 1 SORT JOIN 5 2 2 TABLE ACCESS FULL T2 6 5 1 SORT JOIN 7 1 1
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 43
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 44
AUTHOR : Tina London
DATE : August 7, 1992
TABLE ACCESS FULL T3 8 7 1
8 RECORDS selected
Suppose that an index is created on field F1 on table T1 and the following statements are issued:
EXPLAIN PLAN SET STATEMENT_ID = 'query2' INTO QUERY_PLANS FOR SELECT * FROM T1 WHERE F1 > 1; SELECT OPERATION, OPTIONS, OBJECT_NAME, OBJECT_TYPE, SEARCH_COLUMNS, ID, PREVIOUS_ID FROM QUERY_PLANS WHERE STATEMENT_ID='query2' ORDER BY ID;
The following output is produced:
OPERATION OPTIONS OBJECT_NAME OBJECT_TYPE SEARCH_COLUMNS ID POSITION
TABLE SCAN BY ROWID T1 1 INDEX SCAN RANGE IT1 NON-UNIQUE 1 2 1
2 RECORDS selected
With the same conditions as before, suppose we issue the following which demonstrates an index only retrieval:
EXPLAIN PLAN SET STATEMENT_ID = 'query3' INTO QUERY_PLANS FOR SELECT F1 FROM T1 WHERE F1 > 1; SELECT OPERATION, OPTIONS, OBJECT_NAME, OBJECT_TYPE, SEARCH_COLUMNS, ID FROM QUERY_PLANS WHERE STATEMENT_ID='query3';
The following output is produced:
OPERATION OPTIONS OBJECT_NAME OBJECT_TYPE SEARCH_COLUMNS ID
INDEX SCAN RANGE IT1 NON-UNIQUE 1 1 RECORDS selected
The next example illustrates the output if a grouping operation is
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 44
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 45
AUTHOR : Tina London
DATE : August 7, 1992
specified in the statement:
EXPLAIN PLAN SET STATEMENT_ID = 'query4' INTO QUERY_PLANS FOR SELECT AVG(F1),F2 FROM T1 GROUP BY F2; SELECT OPERATION, OPTIONS, OBJECT_NAME, ID, PARENT_ID FROM QUERY_PLANS WHERE STATEMENT_ID = 'query4' ORDER BY ID; OPERATION OPTIONS OBJECT_NAME ID PARENT_ID --------------------------------------------------------------------- SORT GROUP BY 1 TABLE SCAN FULL T1 2 1
2 RECORDS selected
The next example illustrates the ouptut if DISTINCT is specified in the statement:
EXPLAIN PLAN SET STATEMENT_ID = 'query5' INTO QUERY_PLANS FOR SELECT DISTINCT F1 FROM T1; SELECT OPERATION, OPTIONS, OBJECT_NAME, ID, PREVIOUS_ID FROM QUERY_PLANS WHERE STATEMENT_ID = 'query5' ORDER BY ID; OPERATION OPTIONS OBJECT_NAME ID PREVIOUS_ID -------------------------------------------------------------- SORT DISTINCT 1 TABLE SCAN FULL T1 2 1
2 RECORDS selected
The next example illustrates the output if a subquery is specified in the statement:
EXPLAIN PLAN SET STATEMENT_ID = 'query6' INTO QUERY_PLANS FOR SELECT * FROM T1 WHERE F1 < ( SELECT F2 FROM T2 WHERE F1=3); SELECT OPERATION, OPTIONS, OBJECT_NAME, ID, PARENT_ID, POSITION FROM QUERY_PLANS WHERE STATEMENT_ID = 'query6' ORDER BY ID;
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 45
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 46
AUTHOR : Tina London
DATE : August 7, 1992
OPERATION OPTIONS OBJECT_NAME ID PARENT_ID POSITION
FILTER OUT 1 TABLE SCAN FULL T1 2 1 1 TABLE SCAN FULL T2 3 1 2
3 RECORDS selected
The final example displays a complex query whose output is sent to the default plan table. ( It is assumed that this table has been created before issuing the statement.)
EXPLAIN PLAN SET STATEMENT_ID = 'query7' FOR SELECT * FROM T1,T2 WHERE T1.F1 = T2.F1 UNION SELECT * FROM T2,T3 WHERE T2.F1 = T3.F1; SELECT OPERATION, OPTIONS, OBJECT_NAME, ID, PARENT_ID, POSITION FROM PLAN_TABLE WHERE STATEMENT_ID = 'query7' ORDER BY ID;
The following output is produced:
OPERATION OPTIONS OBJECT_NAME ID PARENT_ID POSITION
PROJECTION 1 UNION 2 1 1 SORT DISTINCT 3 2 1 NEST LOOP 4 3 1 TABLE SCAN BY ROWID T1 5 4 1 INDEX SCAN RANGE IT1 6 5 1 TABLE SCAN FULL T2 7 4 2 SORT DISTINCT 8 2 2 MERGE JOIN 9 8 1 SORT JOIN 10 9 1 TABLE SCAN FULL T2 11 10 1 SORT JOIN 12 9 2 TABLE SCAN FULL T3 13 12 1
13 RECORDS selected
The following example is based on the previous query. It illustrates the use of the treewalking capability in Oracle's version of SQL.
SELECT LPAD(' ',2*LEVEL)||OPERATION,OPTIONS, OBJECT_NAME FROM PLAN_TABLE WHERE STATEMENT_ID='query7' CONNECT BY PRIOR ID = PARENT_ID and STATEMENT_ID = 'query7' START WITH ID = 1 and STATEMENT_ID = 'query7' ORDER BY ID;
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 46
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 47
AUTHOR : Tina London
DATE : August 7, 1992
LPAD(' ',2*LEVEL)||OPERATION
OPTIONS OBJECT_NAME ------------------------------------- PROJECTION UNION SORT DISTINCT NEST LOOP TABLE SCAN BY ROWID T1 INDEX SCAN RANGE IT1 TABLE SCAN FULL T2 SORT DISTINCT MERGE JOIN SORT JOIN TABLE SCAN FULL T2 SORT JOIN TABLE SCAN FULL T3
13 RECORDS selected
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 47
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 48
AUTHOR : Tina London
DATE : August 7, 1992
APPENDIX G TRACING IN THE WMS DEVELOPMENT ENVIRONMENT The /wmsnew/bin/sql script creates two files LASTRUN.LST and LASTRUN.ENV in the current directory and logs the user into sqlplus. The TWO_TASK, ORACLE_SID and SQL_PASSWD environment variables must be set before calling /wmsnew/bin/sql to determine the database account to be accessed. Two files ( LASTRUN.LST and LASTRUN.ENV ) are created by sql in the current directory and are are used by explain to identify the session to be explained. Notes : Since user dump files are normally removed on exit from sqlplus the explain should be run whilst the sql session is active. explain requires the LASTRUN.LST and LASTRUN.ENV files so it should be run on another window IN THE SAME DIRECTORY that the sql call was made in. If the PLAN_TABLE does not exist on the account specified then sql will create it on the TEMP1 tablespace. When /wmsnew/bin/sql is called sql you should see something like the following : SQL_PASSWD is wms6/hoddengrey ORACLE_SID is oracle ORACLE_HOME is /usr/oracle6 TWO_TASK is T:diamond:oracle TTY is ttyp0
SQL*Plus: Version 3.0.9.1.2 - Production on Mon Aug 10 11:41:10 1992
Copyright (c) Oracle Corporation 1979, 1989. All rights reserved.
Connected to:
ORACLE RDBMS V6.0.33.1.4, transaction processing option - Production
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 48
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 49
AUTHOR : Tina London
DATE : August 7, 1992
PL/SQL V1.0.32.3.1 - Production
Session altered.
'PROGRAM|'||PID||'|'||PROGRAM
PROGRAM|29|sqlplus_at_onyx (TCP Two-Task) orapid:19422
SQL>
You can now use explain to see the execution plan of any of your queries. Either have a second window in the same directory and use %explain, or use !explain from the sql prompt.
GUIDELINES AND GOOD PRACTICE FOR DEVELOPING SQL Page 49
-- tina london -- David T. Bath | Email:dtb_at_otto.bf.rmit.oz.au (131.170.40.10) Senior Tech Consultant | Phone: +61 3 347-7511 TZ=AEST-10AEDST-11 Global Technology Group | 179 Grattan St, Carlton, Vic, 3153, AUSTRALIA "The robber of your free will does not exist" - EpictetusReceived on Tue Oct 11 1994 - 15:51:49 CET