Re: Can relational alegbra perform bulk operations?

From: Banana <Banana_at_Republic.com>
Date: Wed, 30 Sep 2009 13:03:58 -0700
Message-ID: <4AC3B9AE.4040100_at_Republic.com>


Walter Mitty wrote:
> When code jockeys set out to learn SQL, the hardest thing to get them to
> learn is to refrain from coding in such a way as to gove implicit "hints" to
> the optimizer in the way they express their queries.

You're preaching to the choir here. I do cringe a little when I see all those fancy "hints" which I've never really seen the occasion to use. I'm sure it may make sense in a exceptional or specialized case but that's all it is. Most queries are best well left enough to the engine.

> Getting programmers to think in terms of "what" rather than "how" has been a
> continuing struggle since the 1950s.

Fascinating; it shows that more things change, the more they stay same.

> Matters were made even worse when programmers learned how to manipulate the
> "rules based optimizer" in Oracle, before the days when it had a cost based
> optimizer. The "cool" programmers learned how to tell Oracle how to do its
> job by doing such things as putting a certain table last in the FROM clause.
> All of this distracted from coming up with a good logical design and clear
> coherent code. And "hints" were an even deeper descent into "how" rather
> than "what".

Well, as I agreed with you, one shouldn't really be fiddling with the engine, but as I remarked in an earlier post if we were to port a complex database project from say, SQL Server to Oracle, and even if we didn't use any vendor-specific features, several SQL rewriting may be warranted simply because of performance differences due to the differences of how a SQL statement is parsed & optimized between two engines.

> I can't tell you the number of times I've been approached by an SQL neophyte whose query
> gives incoreect results, and all you have to do is change "SELECT" to
> "SELECT DISTINCT". They'll give a negative response, because they've been
> told that SELECT DISTINCT runs real slow, so they avoid it. Their method is
> to first come up with a solution that runs fast, even if it gives wrong
> answers, and then fix it so that it gives right anwers. It's usually a
> better plan to come up with a solution that gives right answers, and then
> fix it so that it runs fast as well. And in some cases, SELECT DISTINCT
> actaully runs faster than SELECT.

A while back, we discussed the problem of non-matches, of which the traditional answer in SQL is to use a frustrated join to get the nonmatches. However, I found a case where it was actually faster to execute the query using IN() & correlated subquery (which had a notorious reputation for being a drag) than the frustrated join when there were very small or no chance of actual match being found and excluded. Note, though, that this is still implementation so this may not hold for all other DBMS's implementation.

Thus, this kind of situation is quite common, and I do think the SQL literature is filled with many of "myths" which are in fact based on a grain of truth, but there's so many exceptions and "but..." attached to make it hard to apply more or less unconditionally. This may be probably why I went and picked up a book by Date; to learn a bit about relation theory and see why SQL basically ended up the way it is nowadays.

> My point in all of this ramble is that telling the DBMS what to do and
> telling it how to do it are two separate ways of coding.

It shouldn't be forgotten that all abstractions are leaky and SQL is no exception here. It's unfortunate, yes, as the objective was to provide independence from the specifics of performance in retriving a desired set regardless of how it was expressed but the fact is that sometime optimizer fails to recognize a better expression. Those leaks, I suspect, are the motivation for providing hints, even though for most cases people would do well to shun them until it was absolutely necessary.

Sometime I think that a bit more tighter adherence to the relation theory may help to stem in leaks but even so, it won't plug all leaks. No abstraction ever devised has been not leaky but we certainly can make it less leaky than others. Received on Wed Sep 30 2009 - 22:03:58 CEST

Original text of this message