Re: Navigation question

From: dawn <dawnwolthuis_at_gmail.com>
Date: 16 Feb 2007 10:12:53 -0800
Message-ID: <1171649573.294790.274980_at_a75g2000cwd.googlegroups.com>


On Feb 16, 11:16 am, "Marshall" <marshall.spi..._at_gmail.com> wrote:
> On Feb 16, 8:34 am, "dawn" <dawnwolth..._at_gmail.com> wrote:
>
> > On Feb 16, 9:48 am, "Marshall" <marshall.spi..._at_gmail.com> wrote:
>
> > They
> > have the same number of queries. How many times do I have to say that
> > I am not, not, not, absolutely not suggesting that we do more round
> > trips or split queries.
>
> You split a query.

No, I did not. I took your two queries and replaced them with my two queries. Please re-read or point me to where I have split a query. Thanks. --dawn

You had tin and you needed contacts, and you
> did it with two queries. I did it with one. Navigation splits queries.
> The performance of navigational approaches relative to declarative
> approaches ranges from as-good-as to unboundedly worse. Yes,
> you kept the total query count the same. However you introduced
> an explicit ordering between the two queries that wasn't there
> before, and that isn't free.
>
> Here's your approach:
>
> select name, companyid from companies where tin = XXX;
> mycompany = companyid;
> select emailaddress from contacts where companyid = mycompany;
>
> vs.
>
> select name from companies where tin = XXX;
> select emailaddress from contacts natural join companies where tin =
> XXX;
>
> Here's a possible execution scenario:
>
> no index on tin
> no index on contacts.companyid
> system has contacts fully loaded in cache
>
> your approach:
>
> select name, companyid from companies where tin = XXX;
>
> 1) requires a full scan of companies because no index on tin
> 2) loads companies from disk, pushing contacts out of cache
> 3) full scan of companies
>
> select emailaddress from contacts where companyid = mycompany;
>
> 3) loads contacts from disk
> 3) full scan of contacts
>
> my approach:
>
> I send these two queries to the system:
>
> q1: select name from companies where tin = XXX;
> q2: select emailaddress from contacts natural join companies where tin
> = XXX;
>
> 1) The system observes that they are both selects so it can order them
> at its pleasure. So it picks q2 first because it has contacts loaded
> in cache.
>
> 2) scans contacts from cache
> 3) requires a full scan of companies because no index on tin
> 4) loads companies from disk, pushing contacts out of cache
> 5) full scan of companies
>
> Your approach needlessly did twice as much disk IO, because
> it hardcoded a navigational approach into the queries.
>
> Your methodology of comparing the best case scenario
> of the two approaches is invalid. The important things to be
> compared are typical case and worst case. What you're
> advocating has unboundedly worse worst-case performance.
>
> I'm still hoping for an estimate of the number of queries and
> number of bytes in my previous example, and an admission
> that in that scenario the one-query approach is the clear
> winner.
>
> Marshall
Received on Fri Feb 16 2007 - 19:12:53 CET

Original text of this message