Re: Navigation question

From: Kevin Kirkpatrick <kvnkrkptrck_at_gmail.com>
Date: 19 Feb 2007 12:29:47 -0800
Message-ID: <1171916987.061254.110510_at_s48g2000cws.googlegroups.com>


On Feb 16, 9:43 am, "dawn" <dawnwolth..._at_gmail.com> wrote:

> On Feb 16, 8:33 am, "Kevin Kirkpatrick" <kvnkrkpt..._at_gmail.com> wrote:
>
>
>
>
>
> > On Feb 16, 7:53 am, "dawn" <dawnwolth..._at_gmail.com> wrote:
>
> > > On Feb 15, 7:04 pm, "Marshall" <marshall.spi..._at_gmail.com> wrote:
>
> > > > On Feb 15, 4:11 pm, "dawn" <dawnwolth..._at_gmail.com> wrote:
>
> > > > > On Feb 15, 4:51 pm, "Marshall" <marshall.spi..._at_gmail.com> wrote:
>
> > > > > > Fred, Joe, Engelbart, Desmond, and you are all looking
> > > > > > at the SQL and thinking about something else.
>
> > > > > Actually, I'm thinking conceptually (navigation), then modeling
> > > > > logically (in this case with SQL).
>
> > > > You might think you're thinking at the conceptual level, but
> > > > you're not. You're thinking at the implementation level,
> > > > aka physical. Neither the conceptual level nor
> > > > the logical level has any concept of a current
> > > > location, for example.
>
> > > I figured it out -- I'm designing for the USER and when designing at
> > > the logical level, I'm also designing workflow. After pulling up
> > > information about a company, for example, the user goes to one of the
> > > orders listed. From there, they might go to a product that is part of
> > > a line item on the order. This is similar to users who go to a web
> > > site and from there navigate to other web sites. Users navigate.
> > > Their workflow includes a navigation path. So, at the conceptual
> > > level we can navigate, users navigate, and at the physical level we
> > > navigate. But there is one thread of theory within software
> > > development that keeps pushing against navigation, as if it were a bad
> > > thing.
>
> > Perhaps an example would illustrate the difference I see in
> > approaches. Allow me to describe a report I just tuned. First, a
> > blurb about our reporting tool (shipped with the DBMS). Like any out-
> > of-the-box reporting tool, our report designing tool allows the
> > developer to specify one (or more queries) and add all sorts of nifty
> > output formatting. Using a simple GUI, one can specify page breaks
> > and section breaks, each with headers, trailers, etc; one can "add on"
> > aggregation levels just by dragging columns around, and one can
> > specify any number of post-aggregation formatting: order one level
> > aplhabetically, but another by dollar amount; sum dollars at each
> > level - all without modifying the base query. Reports are coded with
> > a GUI that uses boxes to represent queries, aggregations, and
> > computation functions; lines to show how these things relate; and a
> > visual schematic that maps the definition GUI to an output screen.
>
> > The report I needed to tune was coded "navigation style". The
> > comments and layout of the report definition read just like the
> > developer read the requirements and attacked the problem as follows:
> > "the client wants to see company name in the header - so I'll write a
> > query that "goes to" the comany table - Q1; the client wants budget
> > amounts by department, so I'll write a query that "goes to" the
> > department and budget tables, pulling budget amounts for each
> > department, using the company name I just fetched - Q2; the client
> > wants total budgets by company, so I'll write a computation function
> > C1 that adds amounts from Q2 and resets whenenver Q1 "goes to" a new
> > company...(and so on)"
>
> > Before I go on, does this seem like a sound navigational approach to
> > developing such a report?
>
> Maybe, with the navigation coming in the reading of budget information
> for each department, using the company name to navigate to these.
>
Huh?

> I work with a database where you can read in "header" data and have a
> set of foreign key pointers to child data (not a traditional SQL-
> DBMS). So, I can see the code reading the company data, then using
> the set of foreign key pointers to "go to" the budget amounts for the
> departments. The totals and subtotals (aggregation) are not something
> I would put in the category of navigation. But for an SQL-DBMS, I
> would think of going the other direction (irrelevant to the question
> at hand, but thought it was worth noting).
>

For an SQL-DBMS, you shouldn't "think of going" in any direction. Ever. You will be wrong. Your intution will fail, and in most cases, miserably so. Even if you code a decent execution strategy for today's data given today's resource availability, the data will change, the resources will change, and your strategy will become obsolete; you'll probably have to program a different execution strategy, maybe next month, maybe next year...

You use SQL to tell the database, "here is *everything* I want". The optimizer will look at statistics of the data, cost of index usage, cost of table scans, likelihood of data being cached, the system's resource availability and usage, and a myriad of other things to decide how best to retrieve that information. You cannot compete. Even Bob Badour, not-so-humble-person that he is, would bow to the optimizer in deciding the best approach for running a query.

My condolences if you are stuck with a database which forces developers to make these kinds of decisions.

> Interesting that your bring up reporting, since I don't think
> navigation -style related to reporting, in general, so I'll have to
> figure out why that is. I guess I do when we are talking about OLAP
> where there is an interaction and the user "goes from" here to there,
> slicing and dicing and drilling down (navigating) to greater
> granularity. So, the fact that the user is not updating anything and
> that we are not talking about work-flow with a static report is likely
> the reason I do not think in terms of navigation. But, as you
> indicate, the code can still use results from one query and use them
> for another query.
>

You are imagining distinctions where none exist. In any application, when you need data from the database, you should request all of the data you need and only the data you need. In my reporting example, yes, it was possible/necessary to get all of the information needed by the application "up front". Had the report needed interactive drilldown  functionality, I might want to fetch all of the high-level company information "up front", but only fetch budget details for specific companies based on user requests. In this case, I'd need two queries: one that says "retrieve the high-level data" and another that says "retrieve the beudget details the user-selected company" (based on logical foreign keys, not physical memory pointers!).

Notice - there is still no specification of any navigation. When my application selects the high-level information, I don't care if the high level information is on 1, 2, or 3 tables, and I certainly don't care about the order in which these tables are "visited". Same argument holds for the detail tables. I am simply requesting all of the information I need when I need it.

>
>
>
> > The report had 8 queries - one base query and 7 "contingent" queires
> > parameterized by various values selected in the base query (an
> > important point I'll revisit below: each of the 8 queries was
> > individually tuned - they all used appropriate indexes based on their
> > respective input parameters). Over a dozen custom "compuation
> > functions" were scattered throughout the report definition (since
> > multiple queries were being used, all aggregation was done "by
> > hand"). These computation functions were essentially local variables
> > with explicit assignment statements; their use is in stark contrast to
> > the implicit aggregation functionality supplied with the report
> > product itself. The definition of the untuned report had 8 query
> > boxes, 4 aggregation boxes, and 14 compuation boxes spread out over 3
> > screens.
>
> > I, for one, did not think this was a good approach to the problem. In
> > fact, after a quick perusal, I closed the original report definition
> > never opened it again. I read the client's original requirements plus
> > the few changes requested over the years, and wrote a single query
> > that pulled the ~15 fields of interest from a single query joining ~10
> > different tables (in the definition file, this is 1 box). With 5
> > mouse clicks, I broke the rsults into 5 levels of aggregation - so I
> > now had 5 boxes - then placed sum() and count() fields within these
> > aggregation boxes. I drew up an output schematic directing to the
> > appropriate locations of ~20 fields, then compiled and tested my
> > version.
>
> > I did not get a single compilation error - everything just
> > "worked" (not surprising, because it really was as simple as what I
> > just described).
>
> This might be an example of removing navigation, but it also sounds
> like a) possibly removing unnecessary added queries/combining queries
> and b) using higher level tools so that any navigation is not
> specified by the developer. I do like reporting tools, and I don't
> think of coding in "navigation" when I use them.
>

YES! I did use a "higher level tool". The "higher level tool" I used to avoid specifying the navigation is called a RDBMS. Even if I hadn't had a reporting tool, I still would have used *one* query, and I still would not be "navigating from this table to that table".

> > Our reports are compiled into report executables -
> > the old executable was 130k, the new one was 10k. The old report was
> > running in about 1.5 hours at night; but had run for upwards of 4
> > hours during peak business hours (when the OLTP applications are
> > hammering away at the indexes). My version ran in 40 minutes on our
> > development box, and in production it ran 25 minutes at night and 35
> > minutes during the day.
>
> Great!

Even better - we have the freedom to tell the optimizer "feel free to use concurrency resources to speed this up". At this time, it's a lowpriority  report and these resources are limitted, so we haven't taken this step. However, testing showed that enabling parallelism for the report will reduce runtimes to as little as 5 minutes, no coding changes needed. Such a reduction could ony be approached (and then, only with massive code changes) with a "navigational" version of the program.

It's all part of the beauty of telling the database "what I want" and leaving the "how to get it" up to the optimizer.

>
> > Out of curiosity, I reviewed my base query's execution plan: it used
> > only one index on one table; all other tables were full-scanned and
> > hash-joined. Of course, having taken a declarative approach, I didn't
> > really need to see what indexes were used or what order the tables'
> > data were read (since I never even considered the problem as a logical
> > "navigational" problem;
>
> and, for the record, in case it wasn't clear, this is not where
> "navigation" kicks in for me either. I think sets, sets, sets when
> using reporting tools.
>

Try thinking "sets, sets, sets" when writing applications. Don't think of reports as a "special class" of applications - think of them as part of a continuum of applications, all of which may need different information at different times. A single prinicple applies across this continuum: use queries that retrieve all of the data your application needs and only the data your application needs.

> > I didn't really care what table was visited
> > first, second, or third; any more than a java developer would care
> > about the byte-code generated from compiling a java program). It was,
> > however, interesting to note that the optimizer actually "went to" the
> > company table last - exactly opposite of the intuitve navigational
> > decision made by the original developer.
>
> but oddly similar to my intuition. Did the reporting tool make
> multiple SQL queries?

Of course not. The tool just ran the query I specified, which returned all of the raw data needed by the report.

The fact that you had an intution is telling. On what did you base your intuition? (this is not a rhetorical question)

>
> > What approach to these types of reports or application designs is
> > generally better - a navigational approach "go to t1 and get company,
> > now go to t2 and get department, then use those to go to t3 for
> > budgets", or declarative: "join tables t2, t3, t1 and return company,
> > department, and budgets"?
>
> This is ill-thought-out, off the top of my head --
> If there is no reporting tool, so that you cannot simply declare the
> entire report and are writing the database access statements for the
> DBMS to execute, then you have both the issue of retrieving the data
> and presenting it. I admit I haven't written a sophisticated report
> with break points, header and footer data, multi-line addresses that
> are stacked, but do not take up more space than necessary, allowing
> other data next to these lines, etc without a reporting tool in quite
> some time, but something in the mix needs to take the result sets from
> the queries and turn it into the presentation. If the query language
> is incapable of doing that, then, hmmm, I'm thinking the reporting
> tool does iterations under the covers, so I might have to as well?
> That isn't the same as navigation, but within the iterations, one
> might navigate, perhaps?
>

Yes, the reporting tool automates the coding of iterations and aggregation over the result set of the query. Lacking a reporting tool, you'd need your application to do the same; but think of it this way: once I declaratively specified exactly the information I needed, the code for processing the results was so straightforward that it could be *generated*.

Without the reporting tool, but still starting with a single declarative SQL query, my program would have been mind-numbingly easy to write. The reporting tool does *not* divvy up the query and run it piecemal to "go to this table" then "go to that table" (that's the "hard", "error-prone", "performance bug waiting to happen" approach) - applications should not do this either.

> Thanks, Kevin. I was using an example of interactive data maintenance
> applications, so thinking about read-only is helpful. But it does
> seem like the reporting tools have removed any need/desire for "me" to
> think about navigation for reports. cheers! --dawn- Hide quoted text -
>

Again, don't think about navigation for *any* applications; reports, interactive, or otherwise. Think sets, always. State what you need when you need it, and move on. Received on Mon Feb 19 2007 - 21:29:47 CET

Original text of this message