Home » SQL & PL/SQL » SQL & PL/SQL » Performance query on where clause whether to provide the argument in begining or at ending statement
Performance query on where clause whether to provide the argument in begining or at ending statement [message #316822] Mon, 28 April 2008 08:24 Go to next message
prashas_d
Messages: 66
Registered: February 2007
Member
Hi All,

I dont have much knowledge on SQL performance tuning. I need to write a select query by joining two or more huge tables where performance is very much considered.

Please let me know which is the best way to write a query that produce the best performance:

----------------------------------
select * from
employee e, department d
where e.employee_id = 132
and e.department_id = d.department_id;

(or)

select * from
employee e, department d
where e.department_id = d.department_id
and e.employee_id = 132;
---------------------------------------

If there are 3 tables like employee, employer and department. If the table employee is the hugest table followed by employer and department table then which is the best way to write a query?
-------------------------------------
select * from
employee ee, department d, employer er
where ee.department_id = d.department_id
and ee.department = er.department;

(or)

select * from
employee ee, department d, employer er
where d.department_id = ee.department_id
and d.department = er.department;
------------------------------------------

Thanks in advance,
prashas_d.
Re: Performance query on where clause whether to provide the argument in begining or at ending state [message #316823 is a reply to message #316822] Mon, 28 April 2008 08:26 Go to previous messageGo to next message
Frank
Messages: 7880
Registered: March 2000
Senior Member
Gather statistics and let the optimizer do its job. Order of tables or where-clauses has NO influence on the cost based optimizer.
Re: Performance query on where clause whether to provide the argument in begining or at ending state [message #316824 is a reply to message #316822] Mon, 28 April 2008 08:29 Go to previous messageGo to next message
BlackSwan
Messages: 25036
Registered: January 2009
Location: SoCal
Senior Member
>select * from
very, very poor implementation!
All desired/expected columns should be explicitly enumerated.
Re: Performance query on where clause whether to provide the argument in begining or at ending state [message #316959 is a reply to message #316822] Tue, 29 April 2008 00:08 Go to previous messageGo to next message
harrysmall3
Messages: 109
Registered: April 2008
Location: Massachusetts
Senior Member
Good question prashas_d - and first let me restate anacedent's wisdom - specify the fields you want, i have a feeling you dont want all of them from all tables.

Thinking in terms of order of events is good a practice, as /*+ ordered */ and /*+ leading */ have very good purpose in making sort join steps efficient and getting stuck in a nested loop on the wrong table is death.

However lets keep this simple because what you ask for is simple. I have a sneaky feeling there may be some performance probs here if you have an index on emp_id and this query is doing a nested loop which is a "quadratic" function (as the records in a dataset go up, the time to process goes up exponentially).

In the world of large datasets which i have been stuck in for quite some time I very very rarely index on a join field.

The thing that sticks out to me is employee id 1321 - a single employee.

I'd probably write something like this -

select /*+ use_hash(empfld d) */ empfld.fields_i_want,d.fields_i_want from
(select e.fields_i_want from employee where employee_id = 1321) empfld,
department d
Where d.department_id = empfld.department_id;

my rationale being - #1 get me a small list from the employee table and use the employee_id index if it exists.
#2 hash join the small set with the department table.

If an index exists on department_id (probably) on departments, we dont want to invoke a nested loop that scans that index for every employee row. we may have to "trash it" so to speak with a no_index hint or wrap some null function around it like
'Where nvl(d.department_id,d.department_id,null) = empfld.department_id.

My whole obsession here is to skip (do the explain plan to find out) any nested_loops which indexes invoke on joins.

probably i'd start with just :

select /*+ use_hash(e d) leading(e) ordered*/ the_query_fields from e, d where
e.emp_id = 132 and
e.dept_id d.dept_id

my focus is always - what do i want oracle to do. what i am attempting here without some stats is to make oracle:
#1 - get me the employee records for id 132 and join that result set to the department table.
#2 - use the employee id index for a little boost on employee table
#3 - use a hash join against the department table with a small subset of data
#4 - avoid recusive scans of the department table index.

the order i want is employee table first because i want all id 132's upfront. I do not want to join the two tables and then widdle down.

SO - after that tirade; i say order tables and where clauses in the order that will widdle things down the best. which means 'smallest quickest'.

and that my friend isnt about size of table - its about 'what criteria is specific to an individual table in a query vs criteria that joins the table' - a big table can produce a small result set for a join...compare result set to result set -
employee is prob the big guy - but employee 132 data is prob the small result set.

divide your criteria into those two categories - what criteria pertains to a single table vs the join of tables. make small result sets and join them. in this case my example i believe is called a "right field" query - a result set joined to a driver table.

if you had criteria pertaining specific to departments, say d.dept_something_type = 'A' or 'B' - i'd probably make a result set of that as departments
(select fields from department where ...) as dept
and join via a hash_join empflds and dept.

So - order by size of result sets and avoid nested loops like the plague unless you are dealing with an amount of records you can write on your hand!

g'luck and best wishes!
-harry

Re: Performance query on where clause whether to provide the argument in begining or at ending state [message #317039 is a reply to message #316959] Tue, 29 April 2008 05:11 Go to previous messageGo to next message
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
Good advice for Oracle 7.3.4, but out of date.
The CBO can, and will, rewrite your query internally to produce the best results it can.

Can you show me an example (in 9i or later) of a query like the one you were examining where simply changing the order of the tables or the statements in WHERE clause changes the execution Plan?

icon8.gif  Re: Performance query on where clause whether to provide the argument in begining or at ending state [message #317179 is a reply to message #317039] Tue, 29 April 2008 19:10 Go to previous messageGo to next message
harrysmall3
Messages: 109
Registered: April 2008
Location: Massachusetts
Senior Member
Hey JRowBottom -

i seem to always be in two worlds - 1) where Oracle does find the correct access paths all the time after its optimization and 2) it wants to kill me and my queries with a vengence! I'm on 10g and right now trying to learn more about the distinction between the generation of cost and the actual decision for oracle access methods. it appears that "some unforseen force of the universe" causes oracle to alter its decision making at timed.

am i correct in saying that the optimizer is making an estimate at the duration of time it will take to find the correct access paths, and that optimizer hints per se direct oracle to do this estimation using different categories of algorithms - like with use_hash, the optimization is taking the query itself as data, making base hash tree structures and calculating how long its available algorithms in that group will probably take to determine the best access paths (ie the plan isnt the end all to the access methods just an estimation of best route to take based on calculation time it perceives)

then it determines the cost based on the longest estimated calculation time to find (which it doesnt do until it run) the access paths? now whether or not they are best will be determined at that time - and that is often based i think on whether i am at the terminal or someone else!

back to your question because perhaps i am more cursed than originally thought (company got bought out, i'm on a new server with a new dba crew - a powerful box, i have like a couple gig in the pga which makes hash join addicts like myself happy, and unfortunately a system that will always choose a nested-loop approach to any query where a table has an index because the cost 'estimation' always results lowest. throw an index anywhere and it wants to devour it like a rabid starving dog!

are you saying on your 9i environment you cannot effect a plan difference in a multi table join query by forcing different LEADING(table) tables via /*+leading(tbl) */? If so are they hiring where you are! lol...

i'm at home right now and am going to try to dig up the last 'mosnter' query i had to tame for a biz user where the final tweak was the leading hint - in my cases the table is always a sub-select result set, so i guess i'm always trying to dictate "what sets if joined first would widdle down the next outer join's sort time or join size".

I'll def dig up some examps for you. The more i think about it -i havent really changed the order of how i write the 'where' when i want to change the order of listed tables (or sub-selects), i rely on the 'leading' hint to change it.
perhaps the /*+ ordered */ hint for order of where clause criteria is constant now (such that i cant alter the path by solely changing the order of the clause criteria) - and until you posted this, i really hadnt given much thought on the distinguishment (order of where clause vs order of table processing). Though that makes me think - if Oracle re-arranges the where criteria will the ordered hint even work (even if its wrong based on original order) or is it defunct? my curiosity is an obsession.

I will definitely play around with some simple examps when i get in tomorrow and send the plans. i'll start with a two table join using nested loops, then switch one table to a 'select' that widdles the result set different. then change the leading or ordered hint and see if i can stir things up.

you prob have more insight than i into the optimizer - can u tell me what possible settings per se on plan space or anything else can direct oracle to "seek further or less" in its estimations in certain circumstances? for ex, recently i am reading a lot about cbo that has certain disregard in "left field" queries vs other types - ie, im seeing lingo around queries that join 'a join to a driver table', 'a driver table to a join', or join 'two joins' and that there are preferences that can cause oracle in the interest of expediting its crunching for access methods to disregard various algorithm paths all together?

talk soon - i'm on the east coast e.s.t, and get into work around 7 or 8am so i'll experiment around and send you some plans shortly thereafter. thx for raising the question; perhaps it will end a grudge i have been starting to harbor against oracle in my new environment!

-harry
ps - another frustration of mine is that sometimes i can fully control optimization approaches within sub-selects and sometimes i cant budge or "force" a hint into the inner sets.

for ex -

select a.id from
(select id,type from table_a minus select id,type from id_exceptions) A,
(select type from table_b where somedate < sysdate order by type) b
where a.type = b.type

sometimes i do this crap because i know i can get the sort order such that a merge will work perfectly. at times i can alter the plan anyway i want in the outer select with /*+ use_hash(a b) */ or /*+ use_merge...etc - and it obeys my will. at times oracle refuses to budge....
Re: Performance query on where clause whether to provide the argument in begining or at ending state [message #317200 is a reply to message #317179] Tue, 29 April 2008 22:57 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Harry,

To pre-empt JR's response, he is saying that the sequence of tables in the FROM clause will not affect an un-hinted query.

Everything you say about the dangers of NL for high-volume queries is true, but there are some misleading things in your responses.

If your database always chooses an available index, then something is wrong:
  • OPTIMIZER_MODE is set to RULE, FIRST_ROWS, or FIRST_ROWS_nnnn
  • Initialisation params that affect index costings are wrong
  • PGA or related initialisation params are badly tuned
  • Statistics are poorly gathered

For the sort of query that the OP is using, the 10g CBO will get it right 99 times out of 100 if your environment is correctly set up.

Common cases where it gets it wrong are:
- 5 or more tables in the FROM clause. Not every path is evaluated.
- Conjunctions or disjunctions of filter clauses. CBO will make a great estimate as to how many rows have STATUS='A', or how many have AMT>500, but will make a guess how many have BOTH or EITHER.

Placing the filter clause in an inline view will not affect the plan:

select empfld.fields_i_want,d.fields_i_want 
from 
(select e.fields_i_want 
 from employee 
 where employee_id = 1321
) empfld,
department d
Where d.department_id = empfld.department_id;
is the same as
select empfld.fields_i_want,d.fields_i_want 
from employee empfld
,    department d
Where d.department_id = empfld.department_id
and   empfld.employee_id = 1321;

View merging and predicate pushing looks after this.


Also, the LEADING hint is redundant if you use the ORDERED hint.


V7.3 and 8.0 required the degree of hinting that you are advocating. Later versions are much more tolerant if your database is correctly set up.

Best practice is to get your statistics right and only hint complex queries where it gets it wrong. Even then, "soft" hints like CARDINALITY that correct the wild guesses made by the optimiser are preferred over "hard" hints that restrict potential future access paths.

Ross Leishman
Re: Performance query on where clause whether to provide the argument in begining or at ending state [message #317203 is a reply to message #317200] Tue, 29 April 2008 23:48 Go to previous messageGo to next message
harrysmall3
Messages: 109
Registered: April 2008
Location: Massachusetts
Senior Member
Hi Ross - thank you very much for your detailed response.
your bullet list is of great value to me because i have to talk to some of the new "big wigs" tomorrow about our performance issues.

My company got bought out recently and the their systems control is tyrannical - no developer even at my sr level where i used to have full dba privileges when i ran a project - even gets access to any server or session stats. i had to get an inside friend to get me the ora.ini on the side.

i want to laugh and cry at the same time - i have a sneaky feeling the first_rows_nnnn setting might be on (this is a development region with no frontend interface).

here's my thoughts - whenever i use our toad sql tuner it consistently suggests the first rows hint!

I'd like to ask you a couple more questions on your list if you dont mind. first - is there some "overall" space setting that dictates how much memory oracle can use to crunch its decisions on choosing access methods? I get the feeling that it just isnt 'going all the way' so to speak.

Is it correct to say that the optmization plan is a calculation on the query itself to make estimates of how long it is going to take to determine the best access methods? ie - the optimization plan determines the approach it will take to make the final determination on access paths, and that the optimization plan in itself is not end-result - it then tags a cost based on the amount of time it estimates it will take to get to that "best" end result.

Is it normal for Oracle to always choose the path of least cost? I'm wondering what i can deduce from the fact that the nested loops always calculate to least cost. Does it sound like the cost estimation is wrong, or that oracle is not considering other factors that would steer it to take on a higher-cost algorithmic approach?

I know i'm probably not using the most precise wording here. If Oracle was correctly configured such that it correctly chose to use hash join algorithms over a recusrive index search would that show in the optimizer cost (ie, the hash method should show lower cost?)

Sorry for my rambling, i am excited by your information. Final questions - could you tell me where i can find those initialization params for index costing (ora.ini settings?) and what pga params you a referencing? I know our stats are good because dbms_stats execution is a power the kings forgot to take away from me!

i cannot thank you enough!

i would like to show some query examples tomorrow to you and JR for some feedback. I still think I can affect the plan optimization with inline queries - granted this may be due to the fact that I have to without the proper optimization settings. I like to think that the one advantage i have over oracle is that i know upfront what i expect for data hits.

But the only constant I know in my decade of oracle companionship is that everytime i think i "know the all" of something i am humbled into realizing i know nothing! Smile

regards
harry





Re: Performance query on where clause whether to provide the argument in begining or at ending state [message #317217 is a reply to message #317203] Wed, 30 April 2008 01:08 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
We're hijacking the thread a bit here. You have a lot of potentially unrelated questions - perhaps you could ask them in the Performance Tuning forum.

I'll wait to see your replies there. But just to keep you interested
  • Tuning parameters can be found in the Performance Tuning Guide at http://tahiti.oracle.com.
  • Your understand of optimisation is misguided. The cost of evaluating plans is not a decision-point; it's just a fact of life.
  • CBO always chooses the evaluated plan with the lowest cost, but it does not evaluate every plan, and Cost is not always an accurate measure of Performance. Also Cost is relative to your assumed goal, which can be FIRST_ROWS, ALL_ROWS, FIRST_ROWS_1 .... The cost of retrieving the first few rows with FTS and hash joins is high, but not as high as the cost of retrieving all rows with indexed nested loops joins.
  • Try SHOW PARAMETER OPTIMIZER_MODE
  • Search on PGA_AGGREGATE_TARGET

See you in the PT forum.

Ross Leishman
Re: Performance query on where clause whether to provide the argument in begining or at ending state [message #317224 is a reply to message #317217] Wed, 30 April 2008 01:27 Go to previous messageGo to next message
harrysmall3
Messages: 109
Registered: April 2008
Location: Massachusetts
Senior Member
Thank you for this advice and the reference link,- very much appreciated, Ross. The concept of cost related to goal is an eye opener.

Yes, see you in pt!

Best Regards,
Harry
Re: Performance query on where clause whether to provide the argument in begining or at ending state [message #317237 is a reply to message #317224] Wed, 30 April 2008 03:08 Go to previous message
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
Ross has, as usual, explained the situation more accurately and readably than I could.
Previous Topic: PLS-00801 internal error
Next Topic: how to supress zero from query
Goto Forum:
  


Current Time: Sat Dec 03 22:01:10 CST 2016

Total time taken to generate the page: 0.11319 seconds