Re: Navigation question

From: Tony D <tonyisyourpal_at_netscape.net>
Date: 15 Feb 2007 10:23:35 -0800
Message-ID: <1171563815.762999.148990_at_a75g2000cwd.googlegroups.com>


I'm not going into this blow by blow, but a couple of comments ...

On Feb 15, 2:17 pm, "dawn" <dawnwolth..._at_gmail.com> wrote:
> On Feb 14, 7:24 pm, "Marshall" <marshall.spi..._at_gmail.com> wrote:
>
> > I don't think I've ever written a cartesian product into client
> > code. I've done a few at the command line, though.
>
> I am understanding this to mean that you do not create or use views
> used by your code that introduce cross-products, correct?
>

Why the mention of views in this context? It's just joins between tables.

>
> OK, I'll buy that you would not use the term "navigation" to find this
> node. I still think of this select statement as "going to" the
> company where tin=XXX and extracting the name, where I suspect you do
> not. I do not see it as navigating within the database, however, so
> we are good so far.
>

I'm not getting the repeated use of 'node', or of 'going to'.

> So instead of retrieving the companyid in the first statement and then
> narrowing this query down to that precise company, we again "find" the
> right company by using the same query as before. We include two
> tables in our query although if we had retrieved the companyid in the
> previous query we could have used just one. I do not understand how
> that could yield the same or better performance than
>
> select name, companyid from companies where tin = XXX;
> mycompany = companyid;
> select emailaddress from contacts where companyid = mycompany;
>

Simple: when you code your second example, you are in effect hard-coding a query plan. You are telling the DBMS, "I know best how, and in what order, to attempt to find answers to my overall query. My way is best, and will be until the end of time." Which may not be true at all. It may be true the first day or week the system is in use, but it may be hopelessly wrong after a month or a year. With the first form of the query, you ask the DBMS a question and leave it to the DBMS to work out how best to provide the answer. The "best method" may change from one month to another, or indeed from one set of parameters to another. You may tune it *non-invasively* by adding indexes in the database, or by fine-tuning the statistics about the tables.

And if the multitude of selects are issued from a client on the other end of a network, add the time spent waiting for bits to travel across the network to this as well.
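To make that concrete, the single-statement form of dawn's example would look something along these lines (I'm reusing her table and column names; the exact query she posted earlier may have differed in detail):

   -- one round trip, one statement; the DBMS picks the join order,
   -- the access paths, the lot
   select c.name, ct.emailaddress
   from companies c, contacts ct
   where c.tin = XXX
   and c.companyid = ct.companyid;

Same answer, but now the "how" is the DBMS's problem, not yours.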

Also, consider the less trivial case (I'll use an Ingres-ism here - the select ... begin ... end loop - it's notionally the same as using cursors).

   select some_stuff
   from some_table
   where some_condition
   begin

      select more_stuff
      from another_table
      where some_other_condition;
      /* and so on */

   end;

There is no way to make a performance guarantee about this code, other than that the number of ways things can get worse is almost infinite. For a start: how many times will you do the things in the begin ... end loop? What happens if the queries within the begin ... end loop start to take a second or two (or more) longer? What happens to the overall run time? How much data will be dumped across the network to the client?

On the other hand, if you rewrote it as a single query operating on a set of data at a time, you would be able to get a query plan for the whole query from the DBMS, tune what needs tuning without touching the code, and be able to start giving guarantees about what the code will get up to.
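For the sketch above, the set-based rewrite would be something like this (the some_* names are just the placeholders from the loop; I'm assuming some_other_condition is whatever correlates the two tables):

   -- one statement over the whole set; you can now ask the DBMS to show
   -- you its plan for it (EXPLAIN in many products; Ingres has its own
   -- way of displaying the query execution plan) instead of baking your
   -- own plan into the code
   select s.some_stuff, a.more_stuff
   from some_table s, another_table a
   where some_condition
   and some_other_condition;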

My favourite example of this was my first "big" batch job using a DBMS. I wrote it much the way an idiot COBOL programmer would try it: "Get a record. Do stuff. Write a record." The job ran for nearly two days. Then the local DBMS guru introduced me to the joys of set-based processing, where you basically describe what you want to have happen and leave the how to the DBMS, and we rewrote the whole thing. The next time I ran it, it took an hour and a half. And this job wasn't doing anything terribly fancy, just lots of it.
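In code terms the difference was roughly this (table and column names invented purely for illustration, not the actual job):

   -- row-at-a-time: fetch a row, compute in the client, write it back,
   -- repeat - one round trip and one tiny statement per record

   -- set-based: describe the whole transformation once and let the DBMS
   -- work out how to apply it to every qualifying row
   update accounts
   set balance = balance + pending_interest
   where pending_interest <> 0;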

Roy Hann will have many more war stories along these lines. If you look at his web site (http://www.rationalcommerce.com) there are presentations about how set-based processing can provide orders of magnitude improvements over row-at-a-time stuff.

Hopefully that answers some of the later questions too.

Tony
