Re: Navigation question

From: dawn <dawnwolthuis_at_gmail.com>
Date: 16 Feb 2007 11:45:47 -0800
Message-ID: <1171655147.355068.254690_at_p10g2000cwp.googlegroups.com>


On Feb 16, 11:16 am, "Marshall" <marshall.spi..._at_gmail.com> wrote:
> On Feb 16, 8:34 am, "dawn" <dawnwolth..._at_gmail.com> wrote:
>
> > On Feb 16, 9:48 am, "Marshall" <marshall.spi..._at_gmail.com> wrote:
>
> > They
> > have the same number of queries. How many times do I have to say that
> > I am not, not, not, absolutely not suggesting that we do more round
> > trips or split queries.
>
> You split a query. You had tin and you needed contacts, and you
> did it with two queries. I did it with one. Navigation splits queries.
> The performance of navigational approaches relative to declarative
> approaches ranges from as-good-as to unboundedly worse. Yes,
> you kept the total query count the same. However you introduced
> an explicit ordering between the two queries that wasn't there
> before, and that isn't free.
>
> Here's your approach:
>
> select name, companyid from companies where tin = XXX;
> mycompany = companyid;
> select emailaddress from contacts where companyid = mycompany;
>
> vs.
>
> select name from companies where tin = XXX;
> select emailaddress from contacts natural join companies where tin =
> XXX;
>
> Here's a possible execution scenario:
>
> no index on tin
> no index on contacts.companyid
> system has contacts fully loaded in cache
>
> your approach:
>
> select name, companyid from companies where tin = XXX;
>
> 1) requires a full scan of companies because no index on tin
> 2) loads companies from disk, pushing contacts out of cache
> 3) full scan of companies
>
> select emailaddress from contacts where companyid = mycompany;
>
> 3) loads contacts from disk
> 3) full scan of contacts
>
> my approach:
>
> I send these two queries to the system:
>
> q1: select name from companies where tin = XXX;
> q2: select emailaddress from contacts natural join companies where tin
> = XXX;
OK, now we are getting somewhere. My knowledge must be dated and/or with the wrong tools. I was using ODBC against a DBMS where SQL was not its only language. You are saying that you can pass in two queries together and the tools would not execute them sequentially, but figure them out together. That is definitely a piece I was missing. I only played around with jdbc and mysql a little bit a few years back, but perhaps if I read the current jdbc and mysql information I would understand this. Are you passing both SQL statements in the same jdbc statement?

> 1) The system observes that they are both selects so it can order them
> at its pleasure. So it picks q2 first because it has contacts loaded
> in cache.

Great. The DBMS I use would not do this, I'm pretty sure.

> 2) scans contacts from cache
> 3) requires a full scan of companies because no index on tin
> 4) loads companies from disk, pushing contacts out of cache
> 5) full scan of companies
>
> Your approach needlessly did twice as much disk IO, because
> it hardcoded a navigational approach into the queries.

I keep some of my application development skills current, but obviously I'm not up to speed on how SQL is used with an SQL-DBMS.

> Your methodology of comparing the best case scenario
> of the two approaches is invalid. The important things to be
> compared are typical case and worst case. What you're
> advocating has unboundedly worse worst-case performance.

I now have the additional information that your two SQL statements were submitted together somehow and the DBMS could then decide how to fill the combined requests. Does this work the same from an application development standpoint with all major SQL-DBMS's -- DB2, SQL Server, Oracle, and MySQL? They all take in multiple SQL statements and then execute the combination in an optimized way?

> I'm still hoping for an estimate of the number of queries and
> number of bytes in my previous example, and an admission
> that in that scenario the one-query approach is the clear
> winner.
>
> Marshall

I trust you have seen my agreement on that. I definitely agree. The piece I was missing was that multiple SQL statements could be submitted (e.g. via ODBC) and the DBMS would not execute them sequentially unless that is how it decided to do it. Is that correct? Thanks. --dawn Received on Fri Feb 16 2007 - 20:45:47 CET

Original text of this message