Re: Navigation question

From: dawn <dawnwolthuis_at_gmail.com>
Date: 15 Feb 2007 11:56:45 -0800
Message-ID: <1171569404.983866.11200_at_h3g2000cwc.googlegroups.com>


On Feb 15, 12:23 pm, "Tony D" <tonyisyour..._at_netscape.net> wrote:
> I'm not going into this blow by blow, but a couple of comments ...
>
> On Feb 15, 2:17 pm, "dawn" <dawnwolth..._at_gmail.com> wrote:
>
> > On Feb 14, 7:24 pm, "Marshall" <marshall.spi..._at_gmail.com> wrote:
>
> > > I don't think I've ever written a cartesian product into client
> > > code. I've done a few at the command line, though.
>
> > I am understanding this to mean that you do not create or use views
> > used by your code that introduce cross-products, correct?
>
> Why mention of views in this context ? It's just joins between tables.

Yes, I'm not sure why I brought that in, except that I often refer to result sets that are might not be "normalized" relations as views. I realize that an SQL View is a stored query, so my use of it here was not SQL-compliant. Apologies.

> > OK, I'll buy that you would not use the term "navigation" to find this
> > node. I still think of this select statement as "going to" the
> > company where tin=XXX and extracting the name, where I suspect you do
> > not. I do not see it as navigating within the database, however, so
> > we are good so far.
>
> I'm not getting the repeated use of 'node', or of 'going to'.

In order to retrieve data, something somewhere moves to where that data is ("are" but I have switched to common use) in order to read it (from a disk sector, for example). At a higher level of abstraction, conceptually "I" (when coding) am "going to" the node, with values for the attributes about an entity (not caring about hardware, nor about whether the information about my conceptual entity is split up on the disk). I realize that this can be conceptualized differently as well, but this is not an uncommon way of thinking about it, even if I am not using the most common terms.

> > So instead of retrieving the companyid in the first statement and then
> > narrowing this query down to that precise company, we again "find" the
> > right company by using the same query as before. We include two
> > tables in our query although if we had retrieved the companyid in the
> > previous query we could have used just one. I do not understand how
> > that could yield the same or better performance than
>
> > select name, companyid from companies where tin = XXX;
> > mycompany = companyid;
> > select emailaddress from contacts where companyid = mycompany;
>
> Simple; when you code your second example, you are in effect hard
> coding a query plan. You are telling the DBMS, "I know best how, and
> in what order, to attempt to find answers to my overall query. My way
> is best, and will be until the end of time."

We had already established that it was not a single query. So, I am making two queries and Marshall is making two queries. If we compare our two second queries, how does my query differ in nature from Marshalls? In both cases we are telling the DBMS what to retrieve. In each case we are providing the query with information we have within the application. The only difference I can see is that one will perform better than the other.

> Which may not be true at
> all. It may be true the first day or week the system is in use, but it
> may be hopelessly wrong after a month or a year. With the first form
> of the query, you ask the DBMS a question and leave it to the DBMS to
> work out how best to provide the answer.

In both cases, there are two SQL statements, each of which leaves it to the DBMS to determine how best to get the response.

> The "best method" may change
> from one month to another, or indeed from one set of parameters to
> another. You may tune it *non-invasively* by adding indexes in the
> database, or by fine-tuning the statistics about the tables.

Should we not also tune it by providing the information we have already retrieved? Is there something wrong with that?

> And if the multitude of selects are from a client on the other end of
> a network, and the time waiting for bits to travel across the network
> to this as well.

I do understand this issue, trust me. Again, both Marshall and I were providing two SQL statements, so we are not talking about additional queries.

> Also, consider the less trivial case (I'll use an Ingres-ism here -
> the select ... begin ... end loop - it's notionally the same as using
> cursors).
>
> select some_stuff
> from some_table
> where some_condition
> begin
> select more_stuff
> from another_table
> where some_other_condition;
> /* and so on */
> end;
>
> There is no way to make a performance guarantee about this code, other
> than the number of ways things can get worse is almost infinite. For a
> start; how many times will you do the things in the begin ... end
> loop ?

I am not talking about adding in any iteration. I have never before thought that the question of "navigation" had anything to do with "iteration" but now both you and Marshall have brought this up. I find that fascinating, I really do. How did "navigation" ever get confused with or intermingled with "iteration"? I am not at all speaking up in favor of cursors when asking about navigation. I know the party line on iteration, even if I will admit to writing such.

> What happens if the queries within the begin ... end loop start
> to take a second or two (or more) longer ? What happens to the overall
> run time ? How much data will be dumped across the network to a
> client ?
>
> On the other hand, if you rewrote it as a single query operating on a
> set of data at a time, you would be able to get a query plan for the
> whole query from the DBMS, tune what needs tuned appropriately without
> touching the code and be able to start giving guarantees about what
> the code will get up to.

But again, please note that we are not discussing a single query, at least based on Marshall's response.

> My favourite example of this was my first "big" batch job using a
> DBMS. I wrote it much the way an idiot COBOL programmer

Hey, we are people too ;-)

> would try it.
> "Get a record. Do stuff. Write a record." The job ran for nearly two
> days. Then the local DBMS guru introduced me to the joys of set-based
> processing where you basically describe what you want to have happen
> and leave the how to the DBMS, and we rewrote the whole thing. Next
> time I ran it it took an hour and a half. And this job wasn't doing
> anything terribly fancy, just lots of it.
>
> Roy Hann will have many more war stories along these lines. If you
> look at his web site (http://www.rationalcommerce.com) there are
> presentations about how set based processing can provide orders of
> magnitude improvements on row-at-a-time stuff.

I probably should not admit it, but I've presented such material, so, yes, I understand that. It is more specifically the term and concept of navigation, of moving from my house to the movie theater, of going from data about a person to data about this person's children, etc. that is of interest to me with this question. It sounds like I might need a term that does not imply iteration to others. Surely when navigating in a car it does not bring iteration to mind, so this is confusing me.

> Hopefully that answers some of the later questions too.
>
> - Tony

Apprecated. Thanks. --dawn Received on Thu Feb 15 2007 - 20:56:45 CET

Original text of this message