Re: Navigation question

From: dawn <dawnwolthuis_at_gmail.com>
Date: 15 Feb 2007 12:15:25 -0800
Message-ID: <1171570525.632970.51920_at_s48g2000cws.googlegroups.com>


On Feb 15, 10:58 am, "Marshall" <marshall.spi..._at_gmail.com> wrote:
> On Feb 15, 6:17 am, "dawn" <dawnwolth..._at_gmail.com> wrote:
>
> > On Feb 14, 7:24 pm, "Marshall" <marshall.spi..._at_gmail.com> wrote:
>
> > > > What does the application code then do with this cartesian cross-
> > > > product of data?
>
> > > I don't think I've ever written a cartesian product into client
> > > code. I've done a few at the command line, though.
>
> > I am understanding this to mean that you do not create or use views
> > used by your code that introduce cross-products, correct?
>
> Embarrassing admission: I've never used views of any kind.
> My professional career has not included using higher end
> dbms products. :-(

That's OK. I did most of my SQL work in BI, so we saved all such queries as views. I also mentioned to Tony that I realize I was using the term incorrectly too, since I was thinking of the result set as a "view" of the data, which does not mean it is a query against an sql view. We each just know what we know, eh?

> However a view is not a lot more than a named query, as
> I understand it.

Yes, an SQL View is a named query (and you have no idea how many times in my career I have said that, so it is embarrassing that I was using the term in a non-SQL way)

> And I can't think of when I'd have used
> full cross products.

Good. When I hear people talk about screens/forms/pages as "updatable views" (in my world) I remind them that is not an acceptable way to think of them in the sql world. If you get all of the data for a single page from your SQL-DBMS in one query (as we often do in Pick), you are likely to have a cross-product (header x detail, for example), which is unnecessarily ugly to work with. We can make one query with a multivalue dbms to get all the data without introducing the duplication of data that SQL would do. Did that make sense?

> Not that there's any particular reason
> not to; there's no reason why they have to be any less
> efficient than doing the two subqueries yourself. In fact
> a guy at work was telling me the other day about a dbms
> optimization he'd seen where the system would transmit
> two relations instead of one and stitch them together in
> the dbms client code (not the application client code.)
>
> > > Let's even do what you describe the hard way, and say that we
> > > have some unique description of the company, TIN say, that is
> > > *not* a key of the other tables, nor the primary key of the company
> > > table.
>
> > > select name from companies where tin = XXX;
>
> > OK, I'll buy that you would not use the term "navigation" to find this
> > node. I still think of this select statement as "going to" the
> > company where tin=XXX and extracting the name, where I suspect you do
> > not. I do not see it as navigating within the database, however, so
> > we are good so far.
>
> A guy Fred at work thinks of this statement as x86 machine code,
> and Joe thinks of it all as electrons whirling around silicon
> pathways.
> Engelbart says they are both wrong and that it's all subatomic
> particles,
> while Desmond claims they are all thinking at the wrong level; the
> truth
> we can perceive is just shadows on Plato's cave.

Yup, so conceptually, I'm navigating and there is no problem with that, right? If I am understanding correctly, you are suggesting that typically when someone is speaking against "navigation" they are fine with using "navigation" as a metaphor, as a way of conceptualizing what it happening. I believe you have also suggested that it is fine to talk about navigation when discussing what is physically happening. So the problems with navigation are above the physical and below the conceptual, in the application code, for example. Understood.

> However the statement itself doesn't have any "going to" in it.
>
> > > select c.emailaddress from contacts c, company co
> > > where co.tin = XXX and
> > > co.companyid = c.companyid;
>
> > So instead of retrieving the companyid in the first statement and then
> > narrowing this query down to that precise company, we again "find" the
> > right company by using the same query as before. We include two
> > tables in our query although if we had retrieved the companyid in the
> > previous query we could have used just one. I do not understand how
> > that could yield the same or better performance than
>
> > select name, companyid from companies where tin = XXX;
> > mycompany = companyid;
> > select emailaddress from contacts where companyid = mycompany;
>
> The canonical case to consider is client-server.
>
> Database performance is not my strong area, however it's easy
> enough to see that in my case, there will be an indexed lookup
> on tin that gets companyid, and then whatever mechanism
> (table scan?) will select all the email contacts based on that
> company id, in a single network round trip. Whereas your
> approach will do the indexed lookup in one network packet,
> then the client will issue a second request that is the same
> as the second part of my query.

No, no -- I was replacing your two SQL statements with my two SQL statements. There are the same number of round trips. Please review this and reply, if you would. Thanks. --dawn

> So the cost to the dbms
> will be identical, and the cost in network traffic will be two
> packets vs. one, which may well double the overall runtime.
>
> I have observed much, much worse differences between
> the one query form and the split up query form in the
> field. A factor of 100 worse is commonplace; a factor
> of 10,000 worse is not unheard of.
>
> > What, precisely, makes your approach better? [Please note that I am
> > not suggesting it is not better, just that I do not see the advantage,
> > oh, and don't say that what is better is that you are not "navigating"
> > since I'm trying to figure out what is bad about navigating]
>
> My first thought is overall application performance, because I
> often have to care about that, and about scalability of large
> systems. I'm sure others can supply other reasons. But
> I am hard pressed to think of a good argument for doing
> multiple queries where one will suffice.
>
> > Would it be accurate to suggest that in order to avoid navigating, the
> > developer never uses any data retrieved from any prior SQL statement
> > as qualifiers for subsequent statements? The "game" would be to use
> > only the data that comes from clients (whether people or other
> > software) to seed the queries. So, you are not suggesting using only
> > one query to populate data on a page, for example (as I was previously
> > understanding from your response), but as many queries as one might
> > have otherwise, but never using any data output from a prior SQL
> > statement to seed another.
>
> I don't really want to get bogged down into terminological
> discussions,
> but I will say that pursuing the fewest-queries approach makes for
> the fastest, most scalable software.
>
> > When the data goes to the page, including the company id, let's say,
> > then if we read the company id back from the page or keep the company
> > id in memory, it has gone through some sort of purification, perhaps,
> > so then we could use that companyid in a subsequent query and we would
> > not think of that as navigating, is that correct?
>
> The *user's* navigation through the pages or screens or dialogs of the
> client application are at another level entirely than what we are
> talking
> about.
>
> > Or is the fact that
> > we have retrieved the companyid from the database enough to make it
> > off-limits for qualifying subsequent SQL statements?
>
> Suddenly I notice that you phrase many things in terms of what's
> allowed or forbidden or off-limits or what have you. I wonder why
> that is?
>
> > This is very enlightening, thanks. The two lingering questions for me
> > that I've included above are 1) When, if ever, are you are
> > "permitted" (with your approach) to use any data from the DBMS to
> > qualify subsequent SQL statements? Is such data "purified" once it
> > has gone to the user? 2) How is this as efficient as using data, such
> > as foreign key information, to narrow down subsequent queries when it
> > would seem to me that it must re-read that which it has already read?
>
> I can't really make sense out of question 1. I hope that for question
> 2
> I've given some indication above. (And I wouldn't say that minimizing
> the query count is "as efficient"; I'd say it's "way more efficient.")
>
> Marshall
Received on Thu Feb 15 2007 - 21:15:25 CET

Original text of this message