Re: Extending my question. Was: The relational model and relational algebra - why did SQL become the industry standard?
Date: Mon, 3 Mar 2003 18:18:28 -0000
"Lauri Pietarinen" <lauri.pietarinen_at_atbusiness.com> wrote in message
> Steve Kass wrote:
> > 2. [You said] Why can't it transform this query
> > SELECT DISTINCT P#, PNAME
> > FROM V_S_SP_P;
> > [assuming you admit DISTINCT is ok]
> yes yes...
> > into
> > SELECT P#, PNAME
> > FROM P
> > WHERE EXISTS
> > (SELECT * FROM SP
> > WHERE P.P# = SP.P#)?
> > 2. [I'm hearing] You can perhaps live with the requirement
> > that DISTINCT be added, but you want SQL Server to
> > figure out from the metadata that because of FK constraints
> > and NOT NULL constraints and correspondence of types,
> > the optimization should happen. You think that if the query
> > without DISTINCT "eliminated duplicates", then it would
> > be easier to generate an efficient plan.
> > I'm fairly sure but not absolutely sure, that the faster EXISTS
> > query is identical to the DISTINCT query,
> It is, assuming the obvious constraints are in place.
FYI. DB2 does do some optimisation (I.e. does not join to S), but does not go the whole hog in it's rewritten SQL (see below) However it does set it's 'Early Out Flag' in the access plan on the join between P and SP, which would AKAIK have the same effect as the EXISTS check in the 'more fully' optimised SQL suggest above.
Now, if it were possible to have a SQL constraint such that every Part has to be supplied by at least one supplier, then the query could be optimized to
SELECT P#, PNAME
But, without such a constraints being definable in [DB2] SQL, I think DB2's optimisiation is OK.
SET CURRENT SCHEMA DB2
CREATE TABLE S
( S# INTEGER NOT NULL UNIQUE
, SNAME CHAR(20) NOT NULL UNIQUE
CREATE TABLE P
( P# INTEGER NOT NULL UNIQUE
, PNAME CHAR(20) NOT NULL UNIQUE
CREATE TABLE SP
( S# INTEGER NOT NULL REFERENCES S(S#) , P# INTEGER NOT NULL REFERENCES P(P#) , QTY INTEGER NOT NULL CHECK (QTY > 0)
, UNIQUE(S#, P#)
CREATE VIEW V_S_SP_P
( S#, SNAME, QTY, P#, PNAME)
SELECT S.S#, S.SNAME, SP.QTY, P.P#, P.PNAME FROM S, SP, P
WHERE S.S# = SP.S#
AND SP.P# = P.P#
explain plan for SELECT DISTINCT P#, PNAME FROM V_S_SP_P ;
db2exfmt -d sample -1 -o explain.out
SELECT DISTINCT P#, PNAME
FROM V_S_SP_P Optimized Statement:
SELECT DISTINCT Q2.P# AS "P#", Q2.PNAME AS "PNAME" FROM DB2.SP AS Q1, DB2.P AS Q2
WHERE (Q1.P# = Q2.P#) Access Plan:
Total Cost: 25.338 Query Degree: 1 Rows RETURN ( 1) Cost I/O | 88 HSJOIN ( 2) 25.338 1 /------+-----\ 88 137 TBSCAN IXSCAN ( 3) ( 4) 25.1328 0.182339 1 0 | | 88 137 TABLE: DB2 INDEX: SYSIBM P SQL0303031750371
2) HSJOIN: (Hash Join) Cumulative Total Cost: 25.338 Cumulative CPU Cost: 511088 Cumulative I/O Cost: 1 Cumulative Re-Total Cost: 25.338 Cumulative Re-CPU Cost: 511088 Cumulative Re-I/O Cost: 1 Cumulative First Row Cost: 25.338 Estimated Bufferpool Buffers: 2 Arguments: --------- BITFLTR : (Hash Join Bit Filter used) FALSE EARLYOUT: (Early Out flag) TRUE TEMPSIZE: (Temporary Table Page Size) 4096 Predicates: ---------- 2) Predicate used in Join Relational Operator: Equal (=) Subquery Input Required: Yes Filter Factor: 0.0113636 Predicate Text: -------------- (Q1.P# = Q2.P#)
Business Intelligence, IBM Global Services Received on Mon Mar 03 2003 - 19:18:28 CET