Re: Navigation question
Date: 16 Feb 2007 08:34:37 -0800
Message-ID: <1171643677.190230.214370_at_v45g2000cwv.googlegroups.com>
On Feb 16, 9:48 am, "Marshall" <marshall.spi..._at_gmail.com> wrote:
> On Feb 16, 5:53 am, "dawn" <dawnwolth..._at_gmail.com> wrote:
>
Only a second to scan before heading out, but it appears that you are still thinking I'm splitting queries and making more round trips, and I am not. Please re-read the example you gave and the one I gave. They have the same number of queries. How many times do I have to say that I am not, not, not, absolutely not suggesting that we do more round trips or split queries? Can you point me to where I have apparently suggested such a thing? --dawn
>
> > On Feb 15, 7:04 pm, "Marshall" <marshall.spi..._at_gmail.com> wrote:
>
> > > You might think you're thinking at the conceptual level, but
> > > you're not. You're thinking at the implementation level,
> > > aka physical. Neither the conceptual level nor
> > > the logical level has any concept of a current
> > > location, for example.
>
> > I figured it out -- I'm designing for the USER and when designing at
> > the logical level, I'm also designing workflow. After pulling up
> > information about a company, for example, the user goes to one of the
> > orders listed. From there, they might go to a product that is part of
> > a line item on the order. This is similar to users who go to a web
> > site and from there navigate to other web sites. Users navigate.
> > Their workflow includes a navigation path. So, at the conceptual
> > level we can navigate, users navigate, and at the physical level we
> > navigate. But there is one thread of theory within software
> > development that keeps pushing against navigation, as if it were a bad
> > thing.
>
> I'm sorry but this is totally beside the point. Neither users' mental
> model nor application architecture has any determining role
> in how a query is written.
>
> > > > > So, you could make an argument about there being a difference; in
> > > > > practice I would be astonished if there was one.
>
> > > > Hmm. Ok. But is there something wrong with doing the "navigation"
> > > > that I did with my statements?
>
> > > In this particular simplified case it doesn't matter one way or
> > > the other, from a performance standpoint. In a real-world
> > > scenario, it may matter a lot.
>
> > Agreed. It might then be better one way or the other. I asked if
> > there was something wrong with doing the "navigation" that I did and
> > you said that in this case it didn't matter one way or the other, but
> > then added "from a performance standpoint." So, does it matter one
> > way or another from some other standpoint?
>
> There are complex cognitive and productivity differences that are
> subtle and hard to think about. However the performance argument
> is clear-cut, so I'm focusing on that.
>
>
>
> > > > > Index scans are ridiculously fast. Any difference in speed has
> > > > > been less than the difference from one query run to the next;
> > > > > it's in the noise. And the more complicated the query is, the
> > > > > harder your technique will be, and the proportionally smaller
> > > > > any already small effect of removing the index scan will be.
>
> > > > But my technique was not difficult at all, and I wasn't in a big,
> > > > conscious way attempting to save on performance; it was a
> > > > by-product of the approach instead.
>
> > > There was no by-product; you didn't save on performance. Furthermore,
> > > as Tony mentioned, you have now stamped your query plan into
> > > concrete,
>
> > No more than either of your SQL statements did. I provided the
> > information I had to the SQL statements. The statements decided how
> > to execute.
>
> Apparently you missed Tony's point completely. I'm going to snip
> out a bunch and focus on just one case.
>
> > > > Whether working with VSAM files, Pick, or SQL DBMSs,
> > > > this approach has worked, and no one has ever contested it as
> > > > either difficult to maintain or poorly performing.
>
> > > If your tables are a hundred rows, it makes no noticeable
> > > performance difference whether you fetch them one at a time or all
> > > at once. There's no perceptible difference between an O(n log n),
> > > O(n^2), or O(n^3) algorithm if n is small enough. As n grows, so
> > > does the difference.
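> > >
> > > A quick sketch of that growth, with purely illustrative operation
> > > counts (assuming on the order of 10^9 simple operations per second):
> > >
> > > import math
> > >
> > > # hypothetical work done by each complexity class at a given n
> > > for n in (100, 10_000, 1_000_000):
> > >     print(n, round(n * math.log2(n)), n**2, n**3)
> > >
> > > # n = 100:  664 vs 10^4 vs 10^6      -- all effectively instant
> > > # n = 10^6: ~2x10^7 (a few ms) vs 10^12 (~17 min) vs 10^18 (~30 yrs)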
>
> > Please zero in on the specific topic of navigation. I do understand
> > the set processing concerns and iteration. That is NOT the topic, nor
> > does the topic need to navigate there.
>
> I *am* talking about navigation. In particular I'm pointing out how
> your way of doing it doesn't scale. Splitting queries = bad,
> for performance reasons, not religious ones.
>
> > > If you have a three-level-deep static hierarchy, with 10^4 rows in
> > > the outer table and 10^4 fanout at each level, and you want to
> > > aggregate across the lowest level grouped by the highest level,
> > > the difference between doing a sum in a single query and a nested
> > > lookup execution is a factor of 10^8 in network bandwidth and
> > > 10^12 in packet count. It can mean the difference between
> > > getting an answer in 5 seconds and getting one in days.
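> > >
> > > Roughly where those factors come from, as a sketch (assuming one
> > > packet per transferred row in the nested-lookup case, and only the
> > > aggregated rows shipped back for the single query):
> > >
> > > outer, fanout = 10**4, 10**4
> > > leaf_rows = outer * fanout * fanout    # 10^12 rows at the bottom
> > > single_query_rows = outer              # one aggregate per outer row
> > > print(leaf_rows // single_query_rows)  # 10^8: bandwidth factor
> > > print(leaf_rows)                       # ~10^12 packets, row by row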
>
> > > I have encountered much larger problems in the field; this
> > > is not even extreme.
>
> > This is not about network bandwidth. I've been doing client-server
> > development for quite some time and am fully aware of round-trip
> > costs.
>
> Network bandwidth is one strong reason why splitting up
> queries is a terrible idea.
>
> > Please note that I did no more round trips with my example
> > than you did with yours. I navigated, however, where one could
> > suggest that you did not.
>
> Okay. Let's make the example really simple and concrete.
> Please try to focus on the specific example I'm pointing
> out.
>
> We have a company that's been around for a while.
> They have a lot of long-standing customers. Customer
> orders tend to be fairly detailed, with lots of little parts.
>
> The customers table has 100 rows.
> The invoices table has 10,000 rows; customers average
> 100 invoices.
> The invoice line items table has 1,000,000 rows; invoices
> average 100 line items.
>
> The head sales guy says give me a report with
> every customer id and their total spend, and the
> one part number they bought the most of.
>
> We can take a navigational approach: go to each
> customer; get each of their invoices; get each
> invoice line item. If the line's count is greater than
> the max so far for this customer, record the new max and
> its part number. Either way, add the line's count times
> cost to the customer's total.
>
> In pseudocode:
>
> ccount = count of customers
> declare spend[ccount] = all zeroes
> declare max[ccount] = all zeroes
> declare item[ccount] = all zeroes
> fetch all customer ids
> for each customer c
>     fetch all invoice ids for customer c
>     for each invoice i
>         fetch all lineitem count, cost, item for i
>         for each lineitem l
>             spend[c] = spend[c] + l.count * l.cost
>             if (l.count > max[c])
>                 max[c] = l.count
>                 item[c] = l.item
>             end
>         end
>     end
> end
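>
> (The same loop as a runnable sketch; the sqlite3 stand-in, the
> schema, and the column names are illustrative assumptions, not a
> real system. With a client-server driver, every execute/fetch
> below is a separate network round trip:)
>
> import sqlite3  # stand-in for any DB-API 2.0 driver
>
> conn = sqlite3.connect("sales.db")  # hypothetical database
>
> spend, best_qty, best_item = {}, {}, {}
> customers = conn.execute("SELECT customerid FROM customers")
> for (c,) in customers.fetchall():                 # trip 1
>     spend[c], best_qty[c], best_item[c] = 0, 0, None
>     invoices = conn.execute(
>         "SELECT invoiceid FROM invoices WHERE customerid = ?", (c,))
>     for (i,) in invoices.fetchall():              # 1 per customer
>         lines = conn.execute(
>             "SELECT qty, cost, item FROM lineitems"
>             " WHERE invoiceid = ?", (i,))
>         for qty, cost, item in lines.fetchall():  # 1 per invoice
>             spend[c] += qty * cost
>             if qty > best_qty[c]:
>                 best_qty[c] = qty
>                 best_item[c] = item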
>
> Or, in sql:
>
> select c.customerid, sum(l.count * l.cost), max(l.count)
> from customers c natural join
>      invoices i natural join
>      lineitems l
> group by c.customerid
>
> (Strictly, that returns each customer's biggest line-item count
> rather than the part number itself; pulling the item along too
> takes another step, but it doesn't change the round-trip
> arithmetic.)
>
> You say you're fully aware of network cost issues.
> It should take about a minute to estimate the number
> of network round-trips, and the total byte count
> transferred. Let's just assume every quantity is 32 bits.
> Compare the round-trip count and byte count for
> the two approaches. Let us know what you conclude.
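>
> (One rough way to run those numbers -- assuming one query per
> parent row in the navigational version, 32-bit fields throughout,
> and only the 100 aggregated rows coming back from the single
> query:)
>
> custs, inv_per_cust, items_per_inv, field = 100, 100, 100, 4
>
> nav_trips = 1 + custs + custs * inv_per_cust        # 10,101 trips
> nav_bytes = (custs * field                          # customer ids
>              + custs * inv_per_cust * field         # invoice ids
>              + custs * inv_per_cust * items_per_inv # line items:
>              * 3 * field)                           # qty, cost, item
>
> sql_trips = 1
> sql_bytes = custs * 3 * field                       # 100 result rows
>
> print(nav_trips, nav_bytes)  # 10101  12040400  (~12 MB)
> print(sql_trips, sql_bytes)  # 1      1200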
>
> Marshall
Received on Fri Feb 16 2007 - 17:34:37 CET