Re: Navigation question

From: Kevin Kirkpatrick <kvnkrkptrck_at_gmail.com>
Date: 16 Feb 2007 06:33:10 -0800
Message-ID: <1171636390.149952.315010_at_a75g2000cwd.googlegroups.com>


On Feb 16, 7:53 am, "dawn" <dawnwolth..._at_gmail.com> wrote:
> On Feb 15, 7:04 pm, "Marshall" <marshall.spi..._at_gmail.com> wrote:
>
> > On Feb 15, 4:11 pm, "dawn" <dawnwolth..._at_gmail.com> wrote:
>
> > > On Feb 15, 4:51 pm, "Marshall" <marshall.spi..._at_gmail.com> wrote:
>
> > > > Fred, Joe, Engelbart, Desmond, and you are all looking
> > > > at the SQL and thinking about something else.
>
> > > Actually, I'm thinking conceptually (navigation), then modeling
> > > logically (in this case with SQL).
>
> > You might think you're thinking at the conceptual level, but
> > you're not. You're thinking at the implementation level,
> > aka physical. Neither the conceptual level nor
> > the logical level has any concept of a current
> > location, for example.
>
> I figured it out -- I'm designing for the USER and when designing at
> the logical level, I'm also designing workflow. After pulling up
> information about a company, for example, the user goes to one of the
> orders listed. From there, they might go to a product that is part of
> a line item on the order. This is similar to users who go to a web
> site and from there navigate to other web sites. Users navigate.
> Their workflow includes a navigation path. So, at the conceptual
> level we can navigate, users navigate, and at the physical level we
> navigate. But there is one thread of theory within software
> development that keeps pushing against navigation, as if it were a bad
> thing.
>
>
>

Perhaps an example would illustrate the difference I see in approaches. Allow me to describe a report I just tuned. First, a blurb about our reporting tool (shipped with the DBMS). Like any out-of-the-box reporting tool, our report design tool allows the developer to specify one (or more) queries and add all sorts of nifty output formatting. Using a simple GUI, one can specify page breaks and section breaks, each with headers, trailers, etc.; one can "add on" aggregation levels just by dragging columns around; and one can specify any number of post-aggregation formatting rules: order one level alphabetically but another by dollar amount, sum dollars at each level - all without modifying the base query. Reports are coded with a GUI that uses boxes to represent queries, aggregations, and computation functions; lines to show how these things relate; and a visual schematic that maps the definition GUI to an output screen.

The report I needed to tune was coded "navigation style". The comments and layout of the report definition read as if the developer had read the requirements and attacked the problem as follows: "the client wants to see company name in the header - so I'll write a query that "goes to" the company table - Q1; the client wants budget amounts by department, so I'll write a query that "goes to" the department and budget tables, pulling budget amounts for each department, using the company name I just fetched - Q2; the client wants total budgets by company, so I'll write a computation function C1 that adds amounts from Q2 and resets whenever Q1 "goes to" a new company... (and so on)"
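
Roughly, and with hypothetical table and column names throughout (the real schema is more involved), the navigational definition amounted to something like this:

    -- Q1: "go to" the company table
    SELECT company_id, company_name
      FROM company;

    -- Q2: for each company Q1 returns, "go to" department and budget,
    --     parameterized by the company just fetched
    SELECT d.department_name, b.budget_amount
      FROM department d
      JOIN budget b ON b.department_id = d.department_id
     WHERE d.company_id = :q1_company_id;

    -- C1: a hand-coded running total over Q2's amounts, reset whenever
    --     Q1 moves to a new company (expressed in the tool as an
    --     assignment function, not as SQL)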

Before I go on, does this seem like a sound navigational approach to developing such a report?

The report had 8 queries - one base query and 7 "contingent" queries parameterized by various values selected in the base query (an important point I'll revisit below: each of the 8 queries was individually tuned - they all used appropriate indexes based on their respective input parameters). Over a dozen custom "computation functions" were scattered throughout the report definition (since multiple queries were being used, all aggregation was done "by hand"). These computation functions were essentially local variables with explicit assignment statements; their use is in stark contrast to the implicit aggregation functionality supplied with the report product itself. The definition of the untuned report had 8 query boxes, 4 aggregation boxes, and 14 computation boxes spread out over 3 screens.
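
To make the "aggregation by hand" point concrete: each computation function was doing, per detail row, roughly the first thing below, where the product's implicit aggregation (or plain SQL) expresses the second directly. Names are again hypothetical:

    -- hand-rolled computation function (pseudo-assignments):
    --   IF :company <> :prev_company THEN :total := 0; END IF;
    --   :total := :total + :budget_amount;

    -- built-in / declarative equivalent:
    SELECT d.company_id, SUM(b.budget_amount) AS total_budget
      FROM department d
      JOIN budget b ON b.department_id = d.department_id
     GROUP BY d.company_id;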

I, for one, did not think this was a good approach to the problem. In fact, after a quick perusal, I closed the original report definition and never opened it again. I read the client's original requirements plus the few changes requested over the years, and wrote a single query that pulled the ~15 fields of interest by joining ~10 different tables (in the definition file, this is 1 box). With 5 mouse clicks, I broke the results into 5 levels of aggregation - so I now had 5 boxes - then placed sum() and count() fields within these aggregation boxes. I drew up an output schematic directing the ~20 fields to their appropriate locations, then compiled and tested my version.
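
The shape of the replacement query, with the same caveat about hypothetical names (the real one joins ~10 tables and returns ~15 columns, and the five aggregation levels and the sum()/count() fields live in the report definition rather than in the SQL):

    SELECT c.company_name,
           d.department_name,
           b.fiscal_year,
           b.budget_amount      -- ~15 columns in the real report
      FROM company c
      JOIN department d ON d.company_id = c.company_id
      JOIN budget b     ON b.department_id = d.department_id
                           -- plus several more joins in the real one
     ORDER BY c.company_name, d.department_name;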

I did not get a single compilation error - everything just "worked" (not surprising, because it really was as simple as what I just described). Our reports are compiled into report executables - the old executable was 130k, the new one 10k. The old report had been running in about 1.5 hours at night, but had run for upwards of 4 hours during peak business hours (when the OLTP applications are hammering away at the indexes). My version ran in 40 minutes on our development box; in production it ran in 25 minutes at night and 35 minutes during the day.

Out of curiosity, I reviewed my base query's execution plan: it used only one index on one table; all other tables were full-scanned and hash-joined. Of course, having taken a declarative approach, I didn't really need to see what indexes were used or in what order the tables' data were read (since I never even considered the problem as a logical "navigational" problem, I didn't really care what table was visited first, second, or third, any more than a Java developer would care about the byte-code generated from compiling a Java program). It was, however, interesting to note that the optimizer actually "went to" the company table last - exactly the opposite of the intuitive navigational decision made by the original developer.

Which approach to these types of reports or application designs is generally better - the navigational approach, "go to t1 and get company, now go to t2 and get department, then use those to go to t3 for budgets", or the declarative one, "join tables t2, t3, t1 and return company, department, and budgets"?
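
In SQL terms, using the t1/t2/t3 of the question (column names hypothetical), the two styles reduce to:

    -- navigational: one statement per "hop", each fed by the last result
    SELECT company_id, company       FROM t1;
    SELECT department_id, department FROM t2 WHERE company_id    = :company_id;
    SELECT budget                    FROM t3 WHERE department_id = :department_id;

    -- declarative: one statement, access order left to the optimizer
    SELECT t1.company, t2.department, t3.budget
      FROM t1
      JOIN t2 ON t2.company_id    = t1.company_id
      JOIN t3 ON t3.department_id = t2.department_id;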

>
>
> > > > It is worth mentioning that 1) programmers are
> > > > almost always of the belief that they understand
> > > > the performance characteristics of their code
> > > > even without measuring it, and that 2) they are
> > > > almost always wrong.
>
> > > That is why mgmt or old ladies might say something like "build
> > > security in, be smart about performance, but let's address what we
> > > really find as the performance problems in future iterations, before
> > > we go live, of course, rather than doing any handstands based on
> > > speculation." We should not be careless, however, as there are still
> > > many projects that fail for reasons of performance.
>
> > It is standard practice where I work to test for scalability and
> > performance prior to the launch of a product. One doesn't know
> > anything until one measures.
>
> Yup.
>
> > > > The compiler/VM/query optimizer
> > > > can do better than a human can, almost always, and
> > > > it never gets distracted by thinking about navigation
> > > > or Scarlett Johansson.
>
> > > If it is as bad to pop up to a conceptual level when coding as to
> > > think about Scarlett Johansson, then I'll try to think about her
> > > instead too. I don't quite catch the fun of doing so however, so clue
> > > me in ;-)
>
> > For further investigation I suggest Google Image Search.
> > However there may be a hardware issue in your case.
>
> ;-)
>
> > > > So, you could make an argument about there being a difference; in
> > > > practice I would be astonished if there was one.
>
> > > Hmm. Ok. But is there something wrong with doing the "navigation"
> > > that I did with my statements?
>
> > In this particular simplified case it doesn't matter one way or
> > the other, from a performance standpoint. In a real-world
> > scenario, it may matter a lot.
>
> Agreed. It might then be better one way or the other. I asked if
> there was something wrong with doing the "navigation" that I did and
> you said that in this case it didn't matter one way or the other, but
> then added "from a performance standpoint." So, does it matter one
> way or another from some other standpoint?
>
> > > > Index scans are ridiculously fast. Any difference in speed has
> > > > been less than the difference from one query run to the next;
> > > > it's in the noise. And the more complicated the query is, the
> > > > harder your technique will be and the proportionally smaller any
> > > > already small effect the removal of the index scan will be.
>
> > > But my technique was not difficult at all and I wasn't in a big,
> > > conscious way attempting to save on performance, it was a by-product
> > > of the approach instead.
>
> > There was no by-product; you didn't save on performance. Furthermore
> > as Tony mentioned you have now stamped your query plan into
> > concrete,
>
> No more than either of your SQL statements did. I provided
> information I had to the SQL statements. The statement decided how to
> execute.
>
> > and made it impossible for the system to adapt its
> > execution to changing circumstances.
>
> How did I do that in a way that you did not with your queries? I
> provided different criteria, but there is nothing worse about the
> criteria I provided, is there?
>
> > > At each "node" I use set-based logic to get
> > > the desired result sets. If I have information to pass to an SQL
> > > query from a prior query, I use it to "navigate" to that next node or
> > > set of nodes.
>
> > Yes you did. You may hereby consider yourself as having
> > been smacked by a salmon.
>
> Because I did something that is wrong according to some religion? Or
> by what criteria? I'm still not hearing what, precisely, is amiss
> with navigating.
>
> Additionally, it seems that it is OK to "navigate" as long as the data
> goes back to the user and gets new user input, so that if the user
> selects an order, one we put to the screen from a result set, then it
> is OK to navigate to that order (as long as we don't think of it that
> way as we wouldn't want to think like the user, perhaps?). So I would
> think that if some other event occurred to provide such input, we could
> also navigate based on that input.
>
> > > Whether working with VSAM files, Pick, or SQL-DBMS's,
> > > this approach has worked and has never been contested by anyone as
> > > being either difficult to maintain or to perform poorly.
>
> > If your tables are a hundred rows it makes no noticeable performance
> > differences if you fetch them one at a time or all at once. There's
> > no perceptible difference between an O(n log n), O(n^2), or O(n^3)
> > algorithm
> > if n is small enough. As n grows, so does the difference.
>
> Please zero in on the specific topic of navigation. I do understand
> the set processing concerns and iteration. That is NOT the topic, nor
> does the topic need to navigate there.
>
> > If you have a three level deep static hierarchy, with 10^4 rows in
> > the outer table and 10^4 fanout at each level, and you want to
> > aggregate across the lowest level grouped by the highest level,
> > the difference between doing a sum in a single query and a nested
> > lookup execution is a factor of 10^8 network bandwidth and
> > 10^12 packet count. It can mean the difference between
> > getting an answer in 5 seconds vs. days.
>
> > I have encountered much larger problems in the field; this
> > is not even extreme.
>
> > Marshall
>
> This is not about network bandwidth. I've been doing client-server
> development for quite some time and am fully aware of round-trip
> costs. Please note that I did no more round trips with my example
> than you did with yours. I navigated, however, where one could
> suggest that you did not. --dawn