Re: Navigation question

From: dawn <dawnwolthuis_at_gmail.com>
Date: 15 Feb 2007 06:17:54 -0800
Message-ID: <1171549074.486023.59460_at_v33g2000cwv.googlegroups.com>


On Feb 14, 7:24 pm, "Marshall" <marshall.spi..._at_gmail.com> wrote:
> On Feb 14, 4:47 pm, "dawn" <dawnwolth..._at_gmail.com> wrote:
>
> > On Feb 14, 6:15 pm, "Marshall" <marshall.spi..._at_gmail.com> wrote:
>
> > > On Feb 14, 1:59 pm, "dawn" <dawnwolth..._at_gmail.com> wrote:
>
> > > > On Feb 14, 3:32 pm, "Marshall" <marshall.spi..._at_gmail.com> wrote:
> > > > I have not seen any
> > > > applications that execute exactly one or even only (unordered) sets of
> > > > SQL statements,
>
> > > Any two SQL queries where query B is parameterized by
> > > data retrieved by query A can be rewritten into a single
> > > query.
>
> > Yes, but that isn't how software is written, is it?
>
> It's how *I* write software, but yes, people do write
> software using suboptimal techniques. That's why I
> have to carry a fish around--because people write
> queries inside loops in client code that could have
> been a single query.
>
> > Have you ever
> > seen an entire application that does not navigate the database in this
> > way at some point (and for good reason)?
>
> We have a bug tracking application that sucks and has
> few reports and those reports run slow. I wrote a
> reporting application that had a copy of the bug database
> and all the queries were written in the one-query-per-report
> style. Most pages generated in ~100 ms. The longest report
> you could generate was "graph the history of open bugs over
> the entire company from the beginning of time" which included
> a self-join of the table containing all the *changes* to all the
> bugs ever made: it took 9 s.
>
> This was using an OLTP schema, by the way; no
> precomputation was used at all. Eventually the report
> generator was turned into a full-fledged bug tracking
> package, and about half of the company projects have
> converted over to using the new one.
>
> > > The rewritten single query will (absent pathological
> > > behavior in the query optimizer) be at least as fast as
> > > the sum of the two original queries, and possibly faster.
>
> > I'm not sure I believe that, but let's say it is true, then what.
> > What does the application code then do with this cartesian cross-
> > product of data?
>
> I don't think I've ever written a cartesian product into client
> code. I've done a few at the command line, though.

I am understanding this to mean that you do not create or use views used by your code that introduce cross-products, correct?

> > Include just one "property" of an entity where the
> > property has multiple values, such as e-mail addresses. We want the
> > name of the company an the list of contact e-mail addresses (let's say
> > there are 5 for our example), plus the postal address of the HQ, in
> > some "header" block on a data entry page and then a list of orders and
> > dates below it (let's say there are 100). So we retrieve a view that
> > includes one row per e-mail address per order, for 600 rows we need to
> > now deconstruct for the screen. Have you ever seen this done?
>
> I've seen pages that look like that, but not that use cartesian
> product.
> (Although I suppose you could do it that way, and I can even think
> of ways to implement that effeciently. But I wouldn't recommend it.)
>
> But none of what you describe produces any queries that
> have any navigation, or any order dependencies, or that
> are parameterized by information that has to be gotten from
> other, distinct queries.
>
> Let's even do what you describe the hard way, and say that we
> have some unique description of the company, TIN say, that is
> *not* a key of the other tables, nor the primary key of the company
> table.
>
> select name from companies where tin = XXX;

OK, I'll buy that you would not use the term "navigation" to find this node. I still think of this select statement as "going to" the company where tin=XXX and extracting the name, where I suspect you do not. I do not see it as navigating within the database, however, so we are good so far.

> select c.emailaddress from contacts c, company co
> where co.tin = XXX and
> co.companyid = c.companyid;

So instead of retrieving the companyid in the first statement and then narrowing this query down to that precise company, we again "find" the right company by using the same query as before. We include two tables in our query although if we had retrieved the companyid in the previous query we could have used just one. I do not understand how that could yield the same or better performance than

select name, companyid from companies where tin = XXX; mycompany = companyid;
select emailaddress from contacts where companyid = mycompany;

However, once we have the above statement, we are clearly navigating. Is there something about DBMS's today that would yield the same performance between your approach and the one above? Is there something more maintainable in your approach over the one above? What, precisely, makes your approach better? [Please note that I am not suggesting it is not better, just that I do not see the advantage, oh, and don't say that what is better is that you are not "navigating" since I'm trying to figure out what is bad about navigating]

> select a.* from addresses a, company co
> where co.tin = XXX and
> a.companyid = co.companyid and
> a.type = 'HQ';

My perception of this last statement is that it is a navigation (just as I perceived that we "went to" our first node), but I see how instead of using any information retrieved from any prior SQL statement, you are starting only with the tin=XXX information that came in from the user or web service or so.

Would it be accurate to suggest that in order to avoid navigating, the developer never uses any data retrieved from any prior SQL statement as qualifiers for subsequent statements? The "game" would be to use only the data that comes from clients (whether people or other software) to seed the queries. So, you are not suggesting using only one query to populate data on a page, for example (as I was previously understanding from your response), but as many queries as one might have otherwise, but never using any data output from a prior SQL statement to seed another.

When the data goes to the page, including the company id, let's say, then if we read the company id back from the page or keep the company id in memory, it has gone through some sort of purification, perhaps, so then we could use that companyid in a subsequent query and we would not think of that as navigating, is that correct? Or is the fact that we have retrieved the companyid from the database enough to make it off-limits for qualifying subsequent SQL statements?

> select o.orderid, o.date from orders o, company co
> where co.tin = XXX and
> o.companyid = co.companyid;
>
> Note how declarative the last line is: "give me all the
> order numbers and dates of all the orders for the company
> with tin XXX".

Yup, I'm tuning into your thinking on this, I think.

> > I'm
> > interested in whether there are really any significant applications
> > that do no navigation -- do you really think there are?
>
> I can't think of any navigation that I did anywhere in the
> aforementioned bug tracking software.
>
> Marshall

This is very enlightening, thanks. The two lingering questions for me that I've included above are 1) When, if ever, are you are "permitted" (with your approach) to use any data from the DBMS to qualify subsequent SQL statements? Is such data "purified" once it has gone to the user? 2) How is this as efficient as using data, such as foreign key information, to narrow down subsequent queries when it would seem to me that it must re-read that which it has already read?

Thanks a bunch! --dawn Received on Thu Feb 15 2007 - 15:17:54 CET

Original text of this message