Re: joins and performances : general question

From: Mariano Corral <corral_at_iname.com>
Date: Mon, 22 Nov 1999 13:05:31 -0800
Message-ID: <000b8d9b.c5f28e05@usw-ex0107-043.remarq.com>

"Christian Bruno" <brunoc_at_ifrance.com> wrote:
> hello,
> it is a very general question :
> i have to write sql queries using lot of Joins (mainly left outer
> and inner
> joins)
> is there rule(s) to apply to write these sql orders, having
> knowledge of the
> average number of rows in my tables ? for example : which order
> to use for
> joins in a particular query ? is it better to begin or to end with
> the join
> that retrieves the more rows in the database ? it is better to use
> INNER
> joins than LEFT OUTER JOINS ?

My comments below apply to Sybase ASE.

ASE used to have only an algorithm for joins: the nested loop. In IBM's DB2, that used to be the most used of the various join methods available. It's a straightforward algorithm: from each selected row from the external table, the RDMBS selects all matching rows from the internal table.

Let's see it with an equi-join:
select t1.col1, t2.col2 from a, b where t1.col3 = 7 and t2.col4 = 9 and t1.col1 = t2.col1

Since it's an equi-join, ASE (and probably every RDBMS) evaluates which table is best as external. Let's assume it chooses t1. Then, the first thing ASE does is to select all t1 rows where t1.col3 = 7. In other words, for the external table, ASE applies the "filtering" predicates (I speak about filtering as opposed to "join predicates") for table t1.

For each one of those t1 rows, ASE selects all t2 rows where t2.col4 = 9 and t2.col1 = (col1 of the t1 row being joined). Remember, this query is applied to every t1 row selected before. But now, the query is much more selective and may use the indexes in a better way, since ASE is applying both "filtering" predicates on t2 (ts.col4 = 9) and "join" predicates (t1.col1 = t2.col1)

Hence, this join algorithm performs OK when the external table has few rows selected by the "filtering" predicates (and this selection doesn't have a huge cost), and the internal table is either very small or has a good index to apply the "filtering" and "join" predicates.

Why?

First, there should be few rows selected from the external because a select is launched against the internal for each of them. The fewer external rows, the fewer nested selects.

And second, those nested selects should be very efficient since there are several (perhaps many) of them. On the other hand, the selection of external rows can be less efficient since the external table is queried only once.

But all the above is for equi-joins. With outer joins, one table (outer) is joined with matching or non matching rows of the other (inner). ASE tends to choose the outer table as the external, and then applies the algorithm described above. So, an efficient outer query should have:

Few rows selected by the "filtering" predicates against the outer query. The query which retrieves them, however, doesn't need to be very efficient.
An efficient access to the inner table: either it is very small or the "filtering" plus "join" predicates combined allow to use efficiently one of its indices.

I presume there aren't many differences with other RDBMS. Looks like there aren't many algorithms to choose.

Regards,
Mariano Corral

Sent from RemarQ http://www.remarq.com The Internet's Discussion Network * The fastest and easiest way to search and participate in Usenet - Free!

Received on Mon Nov 22 1999 - 15:05:31 CST