Re: Navigation question

From: Marshall <marshall.spight_at_gmail.com>
Date: 16 Feb 2007 09:16:30 -0800
Message-ID: <1171646188.656035.14990_at_v45g2000cwv.googlegroups.com>


On Feb 16, 8:34 am, "dawn" <dawnwolth..._at_gmail.com> wrote:
> On Feb 16, 9:48 am, "Marshall" <marshall.spi..._at_gmail.com> wrote:
>
> They
> have the same number of queries. How many times do I have to say that
> I am not, not, not, absolutely not suggesting that we do more round
> trips or split queries.

You split a query. You had tin and you needed contacts, and you did it with two queries. I did it with one. Navigation splits queries. The performance of navigational approaches relative to declarative approaches ranges from as-good-as to unboundedly worse. Yes, you kept the total query count the same. However you introduced an explicit ordering between the two queries that wasn't there before, and that isn't free.

Here's your approach:

select name, companyid from companies where tin = XXX; mycompany = companyid;
select emailaddress from contacts where companyid = mycompany;

vs.

select name from companies where tin = XXX; select emailaddress from contacts natural join companies where tin = XXX; Here's a possible execution scenario:

no index on tin
no index on contacts.companyid
system has contacts fully loaded in cache

your approach:

select name, companyid from companies where tin = XXX;

  1. requires a full scan of companies because no index on tin
  2. loads companies from disk, pushing contacts out of cache
  3. full scan of companies

select emailaddress from contacts where companyid = mycompany;

3) loads contacts from disk
3) full scan of contacts

my approach:

I send these two queries to the system:

q1: select name from companies where tin = XXX; q2: select emailaddress from contacts natural join companies where tin = XXX;

  1. The system observes that they are both selects so it can order them at its pleasure. So it picks q2 first because it has contacts loaded in cache.
  2. scans contacts from cache
  3. requires a full scan of companies because no index on tin
  4. loads companies from disk, pushing contacts out of cache
  5. full scan of companies

Your approach needlessly did twice as much disk IO, because it hardcoded a navigational approach into the queries.

Your methodology of comparing the best case scenario of the two approaches is invalid. The important things to be compared are typical case and worst case. What you're advocating has unboundedly worse worst-case performance.

I'm still hoping for an estimate of the number of queries and number of bytes in my previous example, and an admission that in that scenario the one-query approach is the clear winner.

Marshall Received on Fri Feb 16 2007 - 18:16:30 CET

Original text of this message