Home » SQL & PL/SQL » SQL & PL/SQL » Performance query on where clause whether to provide the argument in begining or at ending statement
Performance query on where clause whether to provide the argument in begining or at ending statement [message #316822] |
Mon, 28 April 2008 08:24 |
prashas_d
Messages: 66 Registered: February 2007
|
Member |
|
|
Hi All,
I dont have much knowledge on SQL performance tuning. I need to write a select query by joining two or more huge tables where performance is very much considered.
Please let me know which is the best way to write a query that produce the best performance:
----------------------------------
select * from
employee e, department d
where e.employee_id = 132
and e.department_id = d.department_id;
(or)
select * from
employee e, department d
where e.department_id = d.department_id
and e.employee_id = 132;
---------------------------------------
If there are 3 tables like employee, employer and department. If the table employee is the hugest table followed by employer and department table then which is the best way to write a query?
-------------------------------------
select * from
employee ee, department d, employer er
where ee.department_id = d.department_id
and ee.department = er.department;
(or)
select * from
employee ee, department d, employer er
where d.department_id = ee.department_id
and d.department = er.department;
------------------------------------------
Thanks in advance,
prashas_d.
|
|
|
|
|
Re: Performance query on where clause whether to provide the argument in begining or at ending state [message #316959 is a reply to message #316822] |
Tue, 29 April 2008 00:08 |
harrysmall3
Messages: 109 Registered: April 2008 Location: Massachusetts
|
Senior Member |
|
|
Good question prashas_d - and first let me restate anacedent's wisdom - specify the fields you want, i have a feeling you dont want all of them from all tables.
Thinking in terms of order of events is good a practice, as /*+ ordered */ and /*+ leading */ have very good purpose in making sort join steps efficient and getting stuck in a nested loop on the wrong table is death.
However lets keep this simple because what you ask for is simple. I have a sneaky feeling there may be some performance probs here if you have an index on emp_id and this query is doing a nested loop which is a "quadratic" function (as the records in a dataset go up, the time to process goes up exponentially).
In the world of large datasets which i have been stuck in for quite some time I very very rarely index on a join field.
The thing that sticks out to me is employee id 1321 - a single employee.
I'd probably write something like this -
select /*+ use_hash(empfld d) */ empfld.fields_i_want,d.fields_i_want from
(select e.fields_i_want from employee where employee_id = 1321) empfld,
department d
Where d.department_id = empfld.department_id;
my rationale being - #1 get me a small list from the employee table and use the employee_id index if it exists.
#2 hash join the small set with the department table.
If an index exists on department_id (probably) on departments, we dont want to invoke a nested loop that scans that index for every employee row. we may have to "trash it" so to speak with a no_index hint or wrap some null function around it like
'Where nvl(d.department_id,d.department_id,null) = empfld.department_id.
My whole obsession here is to skip (do the explain plan to find out) any nested_loops which indexes invoke on joins.
probably i'd start with just :
select /*+ use_hash(e d) leading(e) ordered*/ the_query_fields from e, d where
e.emp_id = 132 and
e.dept_id d.dept_id
my focus is always - what do i want oracle to do. what i am attempting here without some stats is to make oracle:
#1 - get me the employee records for id 132 and join that result set to the department table.
#2 - use the employee id index for a little boost on employee table
#3 - use a hash join against the department table with a small subset of data
#4 - avoid recusive scans of the department table index.
the order i want is employee table first because i want all id 132's upfront. I do not want to join the two tables and then widdle down.
SO - after that tirade; i say order tables and where clauses in the order that will widdle things down the best. which means 'smallest quickest'.
and that my friend isnt about size of table - its about 'what criteria is specific to an individual table in a query vs criteria that joins the table' - a big table can produce a small result set for a join...compare result set to result set -
employee is prob the big guy - but employee 132 data is prob the small result set.
divide your criteria into those two categories - what criteria pertains to a single table vs the join of tables. make small result sets and join them. in this case my example i believe is called a "right field" query - a result set joined to a driver table.
if you had criteria pertaining specific to departments, say d.dept_something_type = 'A' or 'B' - i'd probably make a result set of that as departments
(select fields from department where ...) as dept
and join via a hash_join empflds and dept.
So - order by size of result sets and avoid nested loops like the plague unless you are dealing with an amount of records you can write on your hand!
g'luck and best wishes!
-harry
|
|
|
|
Re: Performance query on where clause whether to provide the argument in begining or at ending state [message #317179 is a reply to message #317039] |
Tue, 29 April 2008 19:10 |
harrysmall3
Messages: 109 Registered: April 2008 Location: Massachusetts
|
Senior Member |
|
|
Hey JRowBottom -
i seem to always be in two worlds - 1) where Oracle does find the correct access paths all the time after its optimization and 2) it wants to kill me and my queries with a vengence! I'm on 10g and right now trying to learn more about the distinction between the generation of cost and the actual decision for oracle access methods. it appears that "some unforseen force of the universe" causes oracle to alter its decision making at timed.
am i correct in saying that the optimizer is making an estimate at the duration of time it will take to find the correct access paths, and that optimizer hints per se direct oracle to do this estimation using different categories of algorithms - like with use_hash, the optimization is taking the query itself as data, making base hash tree structures and calculating how long its available algorithms in that group will probably take to determine the best access paths (ie the plan isnt the end all to the access methods just an estimation of best route to take based on calculation time it perceives)
then it determines the cost based on the longest estimated calculation time to find (which it doesnt do until it run) the access paths? now whether or not they are best will be determined at that time - and that is often based i think on whether i am at the terminal or someone else!
back to your question because perhaps i am more cursed than originally thought (company got bought out, i'm on a new server with a new dba crew - a powerful box, i have like a couple gig in the pga which makes hash join addicts like myself happy, and unfortunately a system that will always choose a nested-loop approach to any query where a table has an index because the cost 'estimation' always results lowest. throw an index anywhere and it wants to devour it like a rabid starving dog!
are you saying on your 9i environment you cannot effect a plan difference in a multi table join query by forcing different LEADING(table) tables via /*+leading(tbl) */? If so are they hiring where you are! lol...
i'm at home right now and am going to try to dig up the last 'mosnter' query i had to tame for a biz user where the final tweak was the leading hint - in my cases the table is always a sub-select result set, so i guess i'm always trying to dictate "what sets if joined first would widdle down the next outer join's sort time or join size".
I'll def dig up some examps for you. The more i think about it -i havent really changed the order of how i write the 'where' when i want to change the order of listed tables (or sub-selects), i rely on the 'leading' hint to change it.
perhaps the /*+ ordered */ hint for order of where clause criteria is constant now (such that i cant alter the path by solely changing the order of the clause criteria) - and until you posted this, i really hadnt given much thought on the distinguishment (order of where clause vs order of table processing). Though that makes me think - if Oracle re-arranges the where criteria will the ordered hint even work (even if its wrong based on original order) or is it defunct? my curiosity is an obsession.
I will definitely play around with some simple examps when i get in tomorrow and send the plans. i'll start with a two table join using nested loops, then switch one table to a 'select' that widdles the result set different. then change the leading or ordered hint and see if i can stir things up.
you prob have more insight than i into the optimizer - can u tell me what possible settings per se on plan space or anything else can direct oracle to "seek further or less" in its estimations in certain circumstances? for ex, recently i am reading a lot about cbo that has certain disregard in "left field" queries vs other types - ie, im seeing lingo around queries that join 'a join to a driver table', 'a driver table to a join', or join 'two joins' and that there are preferences that can cause oracle in the interest of expediting its crunching for access methods to disregard various algorithm paths all together?
talk soon - i'm on the east coast e.s.t, and get into work around 7 or 8am so i'll experiment around and send you some plans shortly thereafter. thx for raising the question; perhaps it will end a grudge i have been starting to harbor against oracle in my new environment!
-harry
ps - another frustration of mine is that sometimes i can fully control optimization approaches within sub-selects and sometimes i cant budge or "force" a hint into the inner sets.
for ex -
select a.id from
(select id,type from table_a minus select id,type from id_exceptions) A,
(select type from table_b where somedate < sysdate order by type) b
where a.type = b.type
sometimes i do this crap because i know i can get the sort order such that a merge will work perfectly. at times i can alter the plan anyway i want in the outer select with /*+ use_hash(a b) */ or /*+ use_merge...etc - and it obeys my will. at times oracle refuses to budge....
|
|
|
Re: Performance query on where clause whether to provide the argument in begining or at ending state [message #317200 is a reply to message #317179] |
Tue, 29 April 2008 22:57 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
Harry,
To pre-empt JR's response, he is saying that the sequence of tables in the FROM clause will not affect an un-hinted query.
Everything you say about the dangers of NL for high-volume queries is true, but there are some misleading things in your responses.
If your database always chooses an available index, then something is wrong:- OPTIMIZER_MODE is set to RULE, FIRST_ROWS, or FIRST_ROWS_nnnn
- Initialisation params that affect index costings are wrong
- PGA or related initialisation params are badly tuned
- Statistics are poorly gathered
For the sort of query that the OP is using, the 10g CBO will get it right 99 times out of 100 if your environment is correctly set up.
Common cases where it gets it wrong are:
- 5 or more tables in the FROM clause. Not every path is evaluated.
- Conjunctions or disjunctions of filter clauses. CBO will make a great estimate as to how many rows have STATUS='A', or how many have AMT>500, but will make a guess how many have BOTH or EITHER.
Placing the filter clause in an inline view will not affect the plan:
select empfld.fields_i_want,d.fields_i_want
from
(select e.fields_i_want
from employee
where employee_id = 1321
) empfld,
department d
Where d.department_id = empfld.department_id; is the same as
select empfld.fields_i_want,d.fields_i_want
from employee empfld
, department d
Where d.department_id = empfld.department_id
and empfld.employee_id = 1321;
View merging and predicate pushing looks after this.
Also, the LEADING hint is redundant if you use the ORDERED hint.
V7.3 and 8.0 required the degree of hinting that you are advocating. Later versions are much more tolerant if your database is correctly set up.
Best practice is to get your statistics right and only hint complex queries where it gets it wrong. Even then, "soft" hints like CARDINALITY that correct the wild guesses made by the optimiser are preferred over "hard" hints that restrict potential future access paths.
Ross Leishman
|
|
|
|
Re: Performance query on where clause whether to provide the argument in begining or at ending state [message #317217 is a reply to message #317203] |
Wed, 30 April 2008 01:08 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
We're hijacking the thread a bit here. You have a lot of potentially unrelated questions - perhaps you could ask them in the Performance Tuning forum.
I'll wait to see your replies there. But just to keep you interested- Tuning parameters can be found in the Performance Tuning Guide at http://tahiti.oracle.com.
- Your understand of optimisation is misguided. The cost of evaluating plans is not a decision-point; it's just a fact of life.
- CBO always chooses the evaluated plan with the lowest cost, but it does not evaluate every plan, and Cost is not always an accurate measure of Performance. Also Cost is relative to your assumed goal, which can be FIRST_ROWS, ALL_ROWS, FIRST_ROWS_1 .... The cost of retrieving the first few rows with FTS and hash joins is high, but not as high as the cost of retrieving all rows with indexed nested loops joins.
- Try SHOW PARAMETER OPTIMIZER_MODE
- Search on PGA_AGGREGATE_TARGET
See you in the PT forum.
Ross Leishman
|
|
|
|
|
Goto Forum:
Current Time: Sun Dec 08 18:33:35 CST 2024
|