Advanced Structures

It is possible to build an entire application in Oracle using only tables. Granted - it would not be very efficient unless the tables were very small. For this reason, most applications will include one or more indexes on each table. Occasionally an application will make use of Views, but usually only to centralise the code for a common table join or aggregation.

Oracle provides a number of other structures that can be used make the database more efficient and improve performance.

Partitioned tables

A partitioned table is a regular table that is broken up into chunks (partitions) that can be individually managed. Rows are distributed amongst the partitions based on the value of one or more columns - the partition key. There are three types of partitions:

List Partitions
Each partition contains rows for one or more distinct values of the partition key. For example, a list partitioned table on the STATUS column might have a separate partition for each value of STATUS - 'OPEN', 'CLOSED', 'CANCELLED', 'PENDING'.
Range Partitions
Each partition contains a contiguous range of values of the partition key. For example, a range partitioned table on the TXN_DATE column might break the table into monthly partitions - 'Jan 2003', 'Fed 2003', etc...
Hash Partitions
Each partition contains rows whose partition key hashes to a given value for the partition. Hash partitioning can break a table up into equal size partitions, even when the partition key does not have equal distribution. Oracle generates its own hash algorithm based on the number of partitions desired. For example, for a hash partitioned table with 20 partitions, Oracle will generate a hash algorithm that returns a number from 1 to 20.

When a SQL contains a WHERE clause on the partition key, eg. WHERE txn_date BETWEEN :b1 and :b1+10, that is not very selective (ie. returns lots of rows), then an index scan could be very inefficient. However a Full Table Scan on a very large table may be even more inefficient. By partitioning the table, Oracle may only need to Full Scan on one or two partitions depending on the WHERE caluse.

Although partitioning can be a developers tool for improving performance, it is even more practical for DBAs who need to reorganise and archive large tables - partitioning allows them to take a parallel or piecemeal approach.

Partitioning should really only be used on large tables (>50,000 rows), and only with the cooperation of the DBA.

Bitmap Indexes

Normal Oracle indexes are known as b-tree indexes. b-tree indexes are virtually useless on low-cardinality columns (columns with only a few distinct values). By the time 4% or more of a table is read via an index, it would be faster to perform a full table scan. Bitmap Indexes do not solve this problem in general, but they do solve a specific case. If a table contains 2 or more low-cardinality columns and they are used in combination in a SQL statement to reduce the number of rows returned, then bitmap indexes can perform faster than a full table scan.

Use bitmap indexes when:

Bitmap Join Indexes

This section not yet developed.

Clusters

The smallest unit of data that Oracle can read from disk is the block. Depending on the database, a block is usually 8K or 16K, but can be smaller (especially pre-v8 databases) or larger still. Most tables can store many rows in a single block; obviously this will depend on the number, size and content of columns, but 3-10 rows per kilobyte is not uncommon - ie. 24-80 rows per 8K block. So, whenever a single row is read, a futher 40 or more rows will be read at the same time; these rows are frequently discarded. Clusters help us exploit this feature.

In some tables, we will often read collections of rows together. If the rows are stored in a cluster, then we can arrange it so that all rows with the same value of a particular column or columns (the Cluster Key) are stored together in the same block. For example, if 20 rows with the same Cluster Key are accessed from a cluster, they will be spread across only one or two blocks. Without a cluster - worst case scenario - they will be spread across 20 blocks. This worst case is often the likely case unless the rows were all inserted at the same time.

Important: Clusters require skill to set up correctly. They are more difficult to manage, and take up more space than a regular table. Always get approval and assistance from the DBA and/or Database Architect before creating a Cluster.

There are two different types of clusters in Oracle

Index Clusters

In an Index Cluster, rows with the same cluster key are stored together in the same block (where possible), and an accompanying index stores pointers to the cluster blocks for each cluster key. Access speed on an index cluster is the same as that of a table for single row access, but is much faster when multiple rows are processed with the same cluster key.

When a row is inserted into the table, Oracle will check whether any existing rows have the same cluster key. If so, the new row will be inserted into the same block if there is room. Otherwise a new block will be allocated. Oracle will not use any old block like it would for a non-clustered (heap organised) table.

Hash Clusters

In a Hash Cluster, rows with the same cluster key are stored together in the same block (where possible), but an index is not necessary. To access a Hash Cluster, Oracle hashes the cluster key, and uses this hash key to identify a block. If set up correctly, this can be much faster than using an index. Hash clusters are preferred to indexed clusters for this reason for queries with equals (=) or IN list predicates. A hash cluster cannot process range predicates (<, >, LIKE, BETWEEN), so if these are common (or even likely) then an index cluster will be better.

Unlike an Index Cluster, Oracle must reserve space for a Hash Cluster up front. If a hash cluster is declared too large, you can get problems with Full Table Scans. If it is too small, then not all rows with cluster keys that hash to the same value will fit in a single block, so rows will overflow into chained blocks. Once a hash key is chained, any access on that hash key must read 2 (or more) blocks instead of just one. A chained hash table should be re-organised by the DBA until the USER_CLUSTERS.AVE_BLOCKS_PER_KEY statistic on the cluster is 1.

When set up correctly, hash clusters can provide super fast access to individual rows where the cluster key is referenced with equals predicates, because it can go straight to the block containing the matching rows without using an index. The trade off is space, maintenance, and slower full table scans.

A cluster can store the rows for a single table, or it can be configured to cluster rows with the same key for two or more tables.

Single table clusters

Single table clusters have two excellent uses:

Multi table clusters

Multi-table clusters are used when two or more tables share the same cluster key, and the tables are frequently joined on the cluster key. A multi-table cluster allows Oracle to perform a Cluster Join; the most efficient join method available. For low volume joins (ie. pick up a few rows from Table A and look up matching rows in Table B), a cluster join is hardly warranted as an indexed Nested Loops join will perform perfectly well. Consider a mutli-table cluster when high volume (many rows from both tables) joins are more common.

For Cluster Joins, it doesn't matter whether you use an Indexed Cluster or a Hash Cluster. That is only important for initially selecting the rows from the first table, or for single table selects. This is discussed in Single Table Clusters above.

Use multi-table clusters when:

Index Organised Tables

Regular tables in Oracle are otherwise known as Heap Organised. ie. They are not really organised at all; any row can reside in pretty much any block.

Sometimes a table will have no columns other than its primary key, and sometimes just a couple of columns other than the primary key. For tables such as these it seems a waste to create both the table and the primary key index as the same information is stored twice. An Index Organised Table allows you to avoid this situation. With an IOT, the index is the table.

An old technique used to avoid table accesses was to add non-key columns to an index. In this way Oracle could get all the data it requires from the index without looking up the table. IOTs allow us to do this even better because we don't compromise the uniqueness of our index. Of the non-key columns in an IOT, you can choose some or all of them to be stored in the same blocks as the index without affecting the uniqueness of the primary key. Other non-key columns are stored in an overflow segment. Queries that access the columns in the overflow segment may be slower because they have to read both the index blocks and the overflow blocks.

Use an IOT when:

Avoid IOTs with both Overflow Segments and non primary-key indexes. A non-pk index access could possibly read blocks from the non-pk index, the pk index, and the overflow segment. This would be very inefficient.

Externally Organised Tables

Introduced in Oracle v9, Externally organised tables allow SQL to process an external data file as if it were a table. The syntax for EOTs is based on the syntax for SQL*Loader. From v10g onwards, External Tables can use oracle Data Pump instead of SQL*Loader syntax.

The big advantage here is for Data Warehouses that typically load a file into a staging area, and then transform it using table joins and other SQL functions that could not be performed efficiently during the load. EOTs allow you to do both steps at once, saving on the cost of writing and reading the staging table.

Obviously EOTs cannot be indexed or partitioned, so should only be used where a Full Table Scan is desired.

Global Temporary Tables

Many complex processes - particularly those in data warehouses - use temporary tables. Prior to Global Temporary Tables, the application may have code embedded to CREATE TABLE temp_... AS SELECT ... or perhaps a table already existed into which the process inserted rows and deleted them afterwards.

Both of these processes suffer when multiple concurrent sessions require the same temporary table.

A Global Temporary Table solves this problem. A Global Temporary Table is like a regular table, except for the following:

Materialized Views

It is well known that Oracle views do not consume space; they simply offer an alternative way to access the underlying base tables. Materialized Views are views where the view query has been resolved into a table. ie. They do take up space. They can be configured to refresh upon commit of changes to the base tables, on demand, or periodically.

Depending on the nature of the query in the view, it is also possible for refreshes to act only on those rows that have been inserted, updated, or deleted since the last refresh.

One of the biggest features of Materialized Views is that when you submit a query against the base tables, and the query could be more efficiently resolved by the Materialized View, then Oracle can be made to automatically rewrite the query to use the MV.

This feature makes MVs valuable for aggregates. For instance, we can create daily, weekly, and monthly aggregate materialized views over a base table. When users perform aggregate queries on the base table, Oracle can go straight to the most efficient aggregate MV to resolve the query.

Partition Views

Partition Views enjoyed an all-too-brief popularity with Oracle v7 and were then largely ignored as redundant with the release of Partitioned Tables in Oracle v8. A Partition View is a view of the form:

If all of the tables in the partition view have the same column names, data types, indexes, and the component queries of each UNION ALL reference only a single table using SELECT * without sub-queries or CONNECT-BY (phew!) then Oracle can treat SQLs on the view much as it would treat SQLs on the individual tables. For instance, WHERE clauses against the view can be reliably pushed into the WHERE clause of each UNION ALL component and Oracle will choose the best execution path for each component.

Clearly, if component tables of the view have the same columns, data types, and indexes, then it would be preferable in a v8+ database to use a Partitioned Table. This is certainly Oracle's position, because although (as of 10g release 2) they are still supported by the optimizer, they are now no longer described at all in the documentation. To read more about Partitioned Views, you will need a copy of the v7.3 online Oracle documentation - miraculously still available on www.oracle.com!

The other reason that partition views are now largely irrelevant is that the optimizer is becoming more and more adept at merging predicates on queries into the text of the view itself; having the same columns and indexes on each table in a UNION ALL view no longer confers an advantage - except for one case: STAR_TRANSFORMATION. Oracle cannot perform a STAR_TRANSFORMATION on a UNION ALL view that does not meet the criteria of a Partition View.

So, having established that there is at least one reason to meet the stringent Partition View requirements over and above any other UNION ALL view, Partitioned Tables are still easier; why would you not use a Partitioned Table?

Pipeline / Table Functions

As a general rule, querying data from the database can be done more efficiently with pure SQL rather than PL/SQL. Of course, every rule has its exceptions; complex data retieval involving several steps, conditional logic, and/or exception handling, is often not only easier to write, read, and maintain in PL/SQL, but it can also run faster.

The problem with this approach is that only a small part of an overall requirement may be complex enough to warrant PL/SQL; is it possible to just write the complex part in PL/SQL, and then use the result set in pure SQL statements? As of v9i, the answer is Yes. Table Functions, and especially Pipelined Table Functions can be used to convert a Nested Table (populated by a custom PL/SQL function) into an in-memory table that can then be referenced in a SQL statement. When the SQL statement is executed, the PL/SQL function is called, rows are returned to the SQL engine, cast into a table, and then processed by the SQL statement.

In this way, it is possible to:

A normal table function that returns a Nested Table will run to completion, and return all rows in a single Nested Table. This technique is not really scalable, because of the memory required to store the Nested Table as it is being built; performance will suffer as the nested table starts paging to disk. By making the PL/SQL function Pipelined, rows are returned directly to the calling SQL statement rather than being stored in the Nested Table.

Parameterised Views

A Parameterized View is not an Oracle object, but a technique that is sometimes used to counteract problems with selecting from views.

This technique is a bit contentious, as it is usually possible to resolve the problem in a different way. For the sake of an example, consider the following:

Parameterised views also allow us to push predicates into views where it would otherwise be impossible; such as when the column in question is in a sub-query within the view, or when the view uses complex functionality such as CONNECT BY, ROWNUM, etc.


©Copyright 2003