Join Methods and Sub-Queries

Nested Loops join
Cluster join
Hash join
Sort-Merge join
Nested Sub-Queries
Merged Sub-Queries

For a full description of Oracle's join methods, see the Oracle online documentation Concepts manual.

The methods described below are those employed by the Oracle RDBMS to resolve SQLs with two or more tables specified in the FROM clause, or in some circumstances where a sub-query is specified. Oracle's Cost Based Optimiser will select a join method based on statistics and indexes available, but it may not always choose the best method. Use Explain Plan to determine which join method Oracle has chosen, and Optimizer Hints to force Oracle to use an alternative method.

Nested Loops

Nested Loops (NL) is the most common type of join. NL selects a row from one table, and then looks up matching rows in the second table using the join predicates specified in the WHERE clause. If three or more tables are specified in the FROM clause, NL - having selected a row from table 1 and matching rows from table 2 - can then pick up matching rows from the third and subsequent tables also using Nested Loops. As soon as all tables are joined using the first row from table 1, NL will then proceed to the second row and so on.

When to use Nested Loops joins

On-line applications where the first few rows of data need to be returned quickly for viewing. Providing a query has no DISTINCT, ORDER BY, GROUP BY, CONNECT BY, grouping functions, or analytic functions, Nested Loops can return rows as soon as they are joined. ie. Oracle does not need to join every row.
The second and subsequent tables in the join can all be joined on low cardinality (few rows per key) indexes.
The first (driving) table in the join has a selective WHERE condition that is indexed that is not a join condition. ie. WHERE t1.KEYCOL = 'VALUE'. Without an indexed selection on the driving table, Oracle will perform a Full Table Scan. NL may still be the best join method, but Sort-Merge and Hash should be considered.
All tables to be joined by Nested Loops are medium-large (>500 rows). Small tables are usually best joined with a Hash join.

Nested Loops is Oracle's fall-back position for joins. Other join types can be very efficient in special circumstances, but they all have special conditions that must be met. If these conditions are not met, Oracle can always use a Nested Loops join, even if it is chronically inefficient.

Cluster

Cluster joins are only available when you join two tables that reside in a multi-table cluster, and the join uses equals predicates on the cluster key.

If the cluster is set up well, then Cluster Joins are the absolute fastest join method available. They work a bit like Nested Loops: for each row selected from Table A, Oracle looks up rows in Table B. The difference with Cluster Joins is that the Table B rows live in the same block as the Table A rows for that key, so Oracle does not actually have to do any more IO - it got the Table B rows for free as soon as it selected from Table A.

Clusters are difficult to set up and maintain, so you would only set up a multi-table cluster when you have high volume joins, or high frequency low volume joins. If you have infrequent low volume joins then it is hardly worth clustering; an indexed NL join would work fine.

Hash

Hash joins were introduced in V7 of the RDBMS, and became stable and reliable in 7.3. Some found that the Cost Based Optimizer was so keen on hash joins that it used them even when it was inappropriate. This behaviour has improved markedly in V8, but some DBAs may still have Hash joins disabled. If this is the case on your database, you can still request a hash join using the USE_HASH hint, or allow hash joins for the entire session with ALTER SESSION SET HASH_JOIN_ENABLED=TRUE. The algorithm for performing a hash join is particularly clever, and worth reading in the Oracle online doco (Concepts manual). Like Sort-Merge joins, Hash joins are useful for large joins, but only where the join condition is an equi-join.

When to use Hash joins

At least one side of the join is returning many rows
Low cardinality indexes are not available on the join keys.
The join predicates use only equals (=) conditions.

Sort-Merge

Sort-merge is the traditional alternative to Nested Loops, but is now largerly defunct due to the superior performance of the Hash Join. The two tables are sorted by the columns in join predicates, and then processed in a single pass on each table. A happy side-effect of the sort-merge join is that the results are returned sorted by the columns in the join predicates; this feature can be used to dispence with an ORDER BY clause and therefore a subsequent sort.

When to use Sort-Merge joins

You are using a version of Oracle prior to 7.3 when Hash joins were either unstable or non-existent.
The conditions for using a Hash join are met, but you want the results sorted by the join key. In this case, try a hash join anyway because it might be faster.
You are performing a large range-join (ie. joining with greater-than, less-than, or BETWEEN), making a hash-join impossible. Whilst sort-merge will give the best performance for a high volumn Range Join, you should re-visit your physical design and see if there is another way to design your tables to use equi-joins.
You have both equals and range predicates in your join, but the equals predicate(s) alone do not have very good join cardinality. eg.

SELECT a.*
    FROM table_a a, table_b b
    WHERE a.id = b.id
    AND a.txn_date BETWEEN b.effective_date AND b.expiry_date

Say table_b had an average of 5 rows per ID, but only one row is valid for any given date (using effective_date and expiry_date). Here, a sort-merge join would probably out-perform a hash join because the hash join must filter the results by txn_date after the join is complete, whereas sort-merge can use the TXN_DATE predicate along with ID in the join.

This covers the main join methods. The remaining join methods are special forms of the above methods utilised in sub-queries. The subject of sub-queries covers the following constructs:

WHERE EXISTS ( sub_query )
WHERE NOT EXISTS ( sub_query )
WHERE col IN ( sub_query )
WHERE col NOT IN ( sub_query )
WHERE col oper ( sub_query ) {oper: =, <, >, <=, >=, !=}
WHERE col oper ALL ( sub_query )
WHERE col oper ANY ( sub_query )

There are two ways for Oracle to evaluate sub-queries:

Nested sub-queries
Merged sub-queries

Nested Sub-Queries

The default method for evaluating sub-queries is to use a Nested Sub-Query. Like a Nested Loops Join, Oracle retrieves all rows from the outer / surrounding query, and then for each one it will execute the entire sub-query. Nested sub-queries are good for low-volume SQLs: both the outer query and the sub-query should return very few rows.

You can tell if Oracle is performing a Nested Sub-Query by looking at your Explain Plan. Nested Sub-Queries show up as a FILTER step. The first child of the FILTER step will be the outer query, and the second step will be the sub-query. Note that FILTER steps will occasionally show up with just one child step - these are not nested sub-queries.

Merged Sub-Queries

The Cost Based Optimizer has a number of tricky methods of internally restructuring you sub-query and turning it into a table join instead. Once it is a join, Oracle is then able use Hash or Sort-Merge techniques to execute it.

Not every sub-query can be merged, and Oracle will not do it automatically. The fastest way to tune a high volume sub-query is to add the UNNEST hint. If your Explain Plan is showing a FILTER step, then Oracle is performing a Nested Sub-Query. If it is showing HASH JOIN or MERGE JOIN, optionally with the keyword ANTI or SEMI (see below), then it is performing a Merged Sub-Query.

If the UNNEST hint is unable to convert your sub-query, then learn more about Anti-Joins and Hash Joins below.

Anti-Join

Prior to Oracle V7.3, NOT IN sub-queries could be tragically slow. The reason was that Oracle could only resolve a NOT IN subquery using a FILTER - effectively a Nested Loops join. For each row in the outer query Oracle would run the entire sub-query. A large query selecting 100,000 rows from the outer query would run the sub-query 100,000 times. Clearly it would be more efficent in many cases to simply read all rows from the sub-query table once, compare them to the rows returned from the outer query and discard the rows that match. This is how an ANTI-JOIN works. Anti-Joins may use Hash or Sort-Merge joins.

An anti-join must be specifically requested using the UNNEST, HASH_AJ or MERGE_AJ hint.

When to use an Anti-Join

Anti-Joins are only available when performing a NOT IN sub-query
The sub-query must not be correlated. ie. The sub-query cannot contain any expressions in the SELECT or WHERE clauses that refer to columns in the outer query.
Use anti-joins when a lot of rows are being selected in the outer query.

There are other conditions that must be satisfied for Oracle to perform an anti-join. Check the online doco Concepts manual. Always use Explain Plan to ensure that an anti-join is being used; Oracle may ignore the HASH_AJ or MERGE_AJ hist if any condition is not satisfied.

Semi-Join

Introduced in Oracle 8.0, the semi-join provides an efficient method of performing a WHERE EXISTS sub-query using Sort-Merge and Hash joins. Previously Oracle would always use a nested sub-query. Semi-joins may be requested by specifying any of the hints UNNEST, HASH_SJ, or MERGE_SJ.