Re: Navigation question

From: dawn <dawnwolthuis_at_gmail.com>
Date: 15 Feb 2007 16:11:44 -0800
Message-ID: <1171584704.484255.76380_at_a34g2000cwb.googlegroups.com>


On Feb 15, 4:51 pm, "Marshall" <marshall.spi..._at_gmail.com> wrote:
> On Feb 15, 12:15 pm, "dawn" <dawnwolth..._at_gmail.com> wrote:
>
>
>
> > On Feb 15, 10:58 am, "Marshall" <marshall.spi..._at_gmail.com> wrote:
>
> > > > > select name from companies where tin = XXX;
>
> > > > OK, I'll buy that you would not use the term "navigation" to find this
> > > > node. I still think of this select statement as "going to" the
> > > > company where tin=XXX and extracting the name, where I suspect you do
> > > > not. I do not see it as navigating within the database, however, so
> > > > we are good so far.
>
> > > A guy Fred at work thinks of this statement as x86 machine code,
> > > and Joe thinks of it all as electrons whirling around silicon
> > > pathways. Engelbart says they are both wrong and that it's
> > > all subatomic particles, while Desmond claims they are all
> > > thinking at the wrong level; the truth we can
> > > perceive is just shadows on Plato's cave.
>
> > Yup, so conceptually, I'm navigating and there is no problem with
> > that, right?
>
> That's the opposite of what I'm saying.

I know, but I figured you would agree that there is no problem with thinking conceptually about "going from" information about a person to information about that person's children, for example. Sorry to flash my personality after you let yours show ;-)

> Fred, Joe, Engelbart, Desmond, and you are all looking
> at the SQL and thinking about something else.

Actually, I'm thinking conceptually (navigation), then modeling logically (in this case with SQL).

> That *is*
> a problem. The *query* contains no navigation. There
> might be a model of the query in your head
> that includes navigation, yes.

Yup, see I knew you would agree :-)

> But that's a statement
> about your head, not a statement about the query.

Yes, agreed.

> Sometimes when I'm looking at Java code I'm actually
> thinking about Scarlett Johansson. There is however
> no Scarlett Johansson in the code. And the fact that I'm
> thinking about Scarlett Johansson actually interferes with
> my ability to focus on and correctly perceive the reality
> of Java code.

Thanks for the story, but I'm not so sure the same is true for thinking conceptually and then writing logically. She's with Justin Timberlake now IIRC.

> Thinking at the appropriate level of abstraction
> works best.

For the specific task at hand, yes, I agree.

> The reason garbage collection is a win is because it
> allows the programmer not to even have to think about
> allocation and deallocation.

Yes, an appreciated feature.

> Now if you're feeling perverse,
> or if some very narrow circumstances warrant it, then
> maybe you want to be thinking about allocation and
> deallocation in your Java code. But it's best if you can
> avoid it.
>
> It is worth mentioning that 1) programmers are
> almost always of the belief that they understand
> the performance characteristics of their code
> even without measuring it, and that 2) they are
> almost always wrong.

That is why mgmt or old ladies might say something like "build security in, be smart about performance, but let's address what we really find as the performance problems in future iterations, before we go live, of course, rather than doing any handstands based on speculation." We should not be careless, however, as there are still many projects that fail for reasons of performance. [I just sent an email  to VP of a large corp a few minutes ago in an effort to assist a client who is attempting to recover from performance issues, unrelated to SQL in this case.]

> The compiler/VM/query optimizer
> can do better than a human can, almost always, and
> it never gets distracted by thinking about navigation
> or Scarlett Johansson.

If it is as bad to pop up to a conceptual level when coding as to think about Scarlett Johansson, then I'll try to think about her instead too. I don't quite catch the fun of doing so however, so clue me in ;-)

> > > > So instead of retrieving the companyid in the first statement and then
> > > > narrowing this query down to that precise company, we again "find" the
> > > > right company by using the same query as before. We include two
> > > > tables in our query although if we had retrieved the companyid in the
> > > > previous query we could have used just one. I do not understand how
> > > > that could yield the same or better performance than
>
> > > > select name, companyid from companies where tin = XXX;
> > > > mycompany = companyid;
> > > > select emailaddress from contacts where companyid = mycompany;
>
> > > The canonical case to consider is client-server.
>
> > > Database performance is not my strong area, however it's easy
> > > enough to see that in my case, there will be an indexed lookup
> > > on tin that gets companyid, and then whatever mechanism
> > > (table scan?) will select all the email contacts based on that
> > > company id, in a single network round trip. Whereas your
> > > approach will do the indexed lookup in one network packet,
> > > then the client will issue a second request that is the same
> > > as the second part of my query.
>
> > No, no -- I was replacing your two SQL statements with my two SQL
> > statements. There are the same number of round trips. Please review
> > this and reply, if you would. Thanks. --dawn
>
> I see.
>
> So, you could make an argument about there being a difference; in
> practice I would be astonished if there was one.

Hmm. Ok. But is there something wrong with doing the "navigation" that I did with my statements?

> I have on a number
> of occasions gone to some lengths to refactor things to eliminate
> one or several indexed lookups from the query plan of a query in
> the belief that it would help performance; it never has so far.

Interesting.

> Index scans are ridiculously fast. Any difference is speed has
> been less than the difference from one query run to the next;
> it's in the noise. And the more complicated the query is, the
> harder your technique will be and the proprtionally smaller any
> already small effect the removal of the index scan will be.

But my technique was not difficult at all and I wasn't in a big, conscious way attempting to save on performance, it was a by-product of the approach instead. At each "node" I use set-based logic to get the desired result sets. If I have information to pass to an SQL query from a prior query, I use it to "navigate" to that next node or set of nodes. Whether working with VSAM files, Pick, or SQL-DBMS's, this approach has worked and has never been contested by anyone as being either difficult to maintain or to perform poorly. Additionally, I have seen the same techniques I use pretty much everywhere, but I now know what to look for to identify an author who thinks differently about it.

So, my question then is whether there is something problematic with code that does "navigating" as I did with the two SQL statements I gave in place of your two statements. This has definitely been helpful as I can now see the subtle difference between your statements and mine. I would not have considered doing it your way, but I still see mine as a teeny bit better than yours only because it aligns with my conceptual model and could possibly be a teeny bit better in performance (not detectable in this case, perhaps). Do you see my approach as being "the wrong way" to do it, as if I were thinking about something unrelated to the code, such as navigation or Scarlett Johansson? Thanks. --dawn Received on Fri Feb 16 2007 - 01:11:44 CET

Original text of this message