Re: DBA FIGHT

From: Daniel Morgan <damorgan_at_exxesolutions.com>
Date: Wed, 25 Jun 2003 09:05:30 -0700
Message-ID: <3EF9C849.E58913C@exxesolutions.com>

Anurag Varma wrote:

> "Daniel Morgan" <damorgan_at_exxesolutions.com> wrote in message news:3EF91162.BAEDB974_at_exxesolutions.com...
> --snip--
> > I have done this demo in front of my students at the U for two years now. In a week or two I will do it again
> > for this summer's class.
> >
> > On the same machine I run DBMS_STATS and then EXPLAIN PLAN six statements that produce the same result set
> > from the data.
> > These are the two that have the lowest cost. The EXISTS is the lowest cost in 8.1.7 and the ridiculous NOT IN
> > with a MINUS that I wrote trying
> > to kick the cost up is the lowest in 9.2.0.1.
> >
> > So ... yes I am. Here is the full set of statements:
> >
> > SELECT srvr_id
> > FROM servers
> > INTERSECT
> > SELECT srvr_id
> > FROM serv_inst;
> >
> > SELECT srvr_id
> > FROM servers
> > WHERE srvr_id IN (
> > SELECT srvr_id
> > FROM serv_inst);
> >
> > SELECT srvr_id
> > FROM servers
> > WHERE srvr_id IN (
> > SELECT i.srvr_id
> > FROM serv_inst i, servers s
> > WHERE i.srvr_id = s.srvr_id);
> >
> > SELECT DISTINCT s.srvr_id
> > FROM servers s, serv_inst i
> > WHERE s.srvr_id = i.srvr_id;
> >
> > SELECT DISTINCT srvr_id
> > FROM servers
> > WHERE srvr_id NOT IN (
> > SELECT srvr_id
> > FROM servers
> > MINUS
> > SELECT srvr_id
> > FROM serv_inst);
> >
> > SELECT srvr_id
> > FROM servers s
> > WHERE EXISTS (
> > SELECT srvr_id
> > FROM serv_inst i
> > WHERE s.srvr_id = i.srvr_id);
> >
> > Want to guess on the one with the highest cost?
> >
> > --
> > Daniel Morgan
> > http://www.outreach.washington.edu/extinfo/certprog/oad/oad_crs.asp
> > damorgan_at_x.washington.edu
> > (replace 'x' with a 'u' to reply)
> >
> >
>
> Daniel,
>
> Without doing any test:
> As far as cost is concerned ... my guess would be "exists". Assuming plain dbms_stats statististics.
>
> However, I would not consider a query to be faster just on the basis of "cost".
> True, it will get my interest ... however, I would be more interested in two more things:
> *) The block gets that each query requires. Lower figure will get my interest.
> *) A rough comparison of timings by either using plain set timing on .. or using get_time.
>
> Now you kinda know your dataset, which is unknown to me, and which matters quite a lot
> in this discussion (cardinality .. skew_factor).
>
> In the end, I think I would lean toward exists for table sizes you mentioned. For bigger table sizes
> I would also be inclined to tryout the hash_sj hint for exists.
>
> In the end: as I said - I need to have an idea of the dataset I'm dealing with to make a more informed decision.
> servers.serv_id's cardinality and serv_instance.serv_id's cardinality.
>
> :)
> Thanks,
> Anurag

The point of my demo ... and the point of my comments here ... is not to influence people to write code one particular way or another: But rather the exact opposite.

To convince them that you can not predict, by just looking at a SQL statement, which will be the best and that you must use EXPLAIN PLAN, you must look at I/O, you must try multiple ways of accomplishing a task and identify the one that is the most efficient.

My point ... is your point: Nothing more. Nothing less. Blanket statements like EXISTS is better than NOT IN are just not true. The correct answer, as we have both pointed out, is that "it depends."

--
Daniel Morgan
http://www.outreach.washington.edu/extinfo/certprog/oad/oad_crs.asp
damorgan_at_x.washington.edu
(replace 'x' with a 'u' to reply)

Received on Wed Jun 25 2003 - 11:05:30 CDT