Re: Can relational alegbra perform bulk operations?

From: Walter Mitty <>
Date: Wed, 30 Sep 2009 18:44:16 GMT
Message-ID: <4GNwm.193$>

"Banana" <> wrote in message
> Clifford Heath wrote:
>> I believe you've had a good answer in the existence of NULLs and
>> duplicates,
> Indeed, Date gave a good discussion about why neither made sense in
> relations.
>> but I don't know that SQL is *that* far. The relational algebra doesn't
>> cover the most common four ways to implement an inner join;
>> implementation
>> isn't its concern. But choice of looping, merge joins, hash joins, etc,
>> definitely affects query performance.
> Yes, I may have had picked up mistaken impression from the same book
> thinking that relational model had something to say about optimizing the
> queries. It doesn't look to be the case here, and would definitely explain
> why SQL is fairly divorced from relational theory.

Actually, SQL itself has relatively little to say about query optimization.

When code jockeys set out to learn SQL, the hardest thing to get them to learn is to refrain from coding in such a way as to gove implicit "hints" to the optimizer in the way they express their queries.

Getting programmers to think in terms of "what" rather than "how" has been a continuing struggle since the 1950s. When FORTRAN was first launched on the scene, a whole generation of 2GL programmers made it their goal to write their FORTRAN in such a way that the compiler would discover an efficient machine language equivalent. That might be a noble goal in and of itself, but when it interferes with writing coherent and flexible code, it's very counterprodictive. When virtual memory came on the scene, another whole generation spent more time than they should have anticipating program memory references so as to minimize paging. There is a small class of programming problems where this kind of thing makes sense. There is a much larger class where it does not.

Matters were made even worse when programmers learned how to manipulate the "rules based optimizer" in Oracle, before the days when it had a cost based optimizer. The "cool" programmers learned how to tell Oracle how to do its job by doing such things as putting a certain table last in the FROM clause. All of this distracted from coming up with a good logical design and clear coherent code. And "hints" were an even deeper descent into "how" rather than "what".

Not only that, but there are a lot of junior SQL jockeys who have a completely clouded idea of how a cost based optimizer works, and try to "help" the optimizer in ways that are useless or even harmful. I can't tell you the number of times I've been approached by an SQL neophyte whose query gives incoreect results, and all you have to do is change "SELECT" to "SELECT DISTINCT". They'll give a negative response, because they've been told that SELECT DISTINCT runs real slow, so they avoid it. Their method is to first come up with a solution that runs fast, even if it gives wrong answers, and then fix it so that it gives right anwers. It's usually a better plan to come up with a solution that gives right answers, and then fix it so that it runs fast as well. And in some cases, SELECT DISTINCT actaully runs faster than SELECT.

In another case, I saw where a program specified in a WHERE clause, the following: "STATE = progamvariablestate AND CITY = programvariablecity". When I asked the programmer why he didn't specify the country, he told me that the country was always USA and he wanted to keep things simple for the DBMS. After looking at the indexes, I was able to speed up the query by changing the criterion to "COUNTRY = 'USA' AND STATE = progamvariablestate AND CITY = programvariablecity". The delay time dropped from about 6 minutes to less than 2 seconds. (for extra credit: can you guess what I saw in the indexes?)

My point in all of this ramble is that telling the DBMS what to do and telling it how to do it are two separate ways of coding. Received on Wed Sep 30 2009 - 20:44:16 CEST

Original text of this message