
Re: Suggestions on how to efficiently search a large table

From: bdbafh <>
Date: Thu, 15 Nov 2007 09:38:15 -0800 (PST)
Message-ID: <>

On Nov 15, 9:35 am, wrote:
> On Nov 15, 7:42 am, bdbafh <> wrote:
> > On Nov 14, 8:55 pm, wrote:
> > > I have a billion+ row table that I need to do queries on.
> > > The table looks something like this:
> > > item
> > > id number
> > > process1_id number
> > > process2_id number
> > > process3_id char(35)
> > > I need to perform multiple processes on each item. When I process an
> > > item, I then set the corresponding column with the result of the
> > > process.
> > > I need to do queries like the following to determine which processes I
> > > haven't completed yet:
> > > select * from item where process1_id is null or process2_id is null
> > > or process3_id is null
> > > Any recommendations on how to index or query such a table to get good
> > > performance? The cardinality of all the process*_id columns is very
> > > high.
> > > Thanks.
> > You've provided no information regarding:
> > - version of Oracle software
> 10g
> > - edition of Oracle software
> > - how many client processes will be hammering the table
> 1
> > - how the table is being loaded
> loaded from a comma delimited file exported from a SQL Server
> database
> > - hardware, OS, clustering, RAC, server color
> LS41, SE Linux, OCSF, RAC
> > - desired response time
> < 2 seconds
> Also, I should have added that I will be limiting the query with a
> rownum < 1000 in the WHERE clause.

The subject of this post is overly constrained. Your issue is not just searching the table for rows of data to be processed; it also includes processing those rows and updating the rows that have been processed.
You mention that only one instance of a program will be processing the rows ... but how much do you want to bet that somewhere down the road, additional resources will be devoted to processing the rows of interest in order to reduce the overall processing time?
A design that works just fine for a single process will likely hit concurrency issues once multiple workers are run against it. Partitioning the data is probably the way to go (a sketch follows below).
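
A minimal sketch of that, assuming hash partitioning on id is acceptable (the partition names and partition count are illustrative, not a recommendation):

  create table item (
    id          number,
    process1_id number,
    process2_id number,
    process3_id char(35)
  )
  partition by hash (id)
  (partition item_p1, partition item_p2,
   partition item_p3, partition item_p4);

  -- each worker then confines itself to one slice:
  select * from item partition (item_p1)
   where process1_id is null
      or process2_id is null
      or process3_id is null;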

Back to your question - "how to efficiently search a large table?"

To borrow from others:

The fastest way to search a table is to not search it.

In other words, you want to perform the search against a smallish index to find the candidate rows.
Your problem definition states that the data is loaded with the columns of interest null.
Oracle b*tree indexes do not store entries in which every indexed column is null. Oracle has support for function-based indexes that can be used to provide an access path to the rows of interest. Rather than repeat content that is posted elsewhere, search for articles on the use of function-based indexes in Oracle 10g. You'll need to use the same expression in the function-based index as in the select statement.
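
A minimal sketch of the idea, using your column names (the index name and the exact CASE expression are mine, so treat this as illustrative rather than tested):

  -- rows still needing work map to 1; fully processed rows map
  -- to null and are therefore not stored in the index at all
  create index item_todo_ix on item (
    case when process1_id is null
           or process2_id is null
           or process3_id is null
         then 1
    end
  );

  -- the query must use the very same expression to hit the index
  select *
    from item
   where case when process1_id is null
                or process2_id is null
                or process3_id is null
              then 1
         end = 1
     and rownum < 1000;

Because finished rows never enter the index, the index stays small no matter how large the table grows, and your rownum < 1000 probe only has to walk a tiny structure.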

Such indexes will add overhead to the loading operation. Depending upon the load type, rate and frequency, you might see large amounts of undo being read in order to provide a consistent read of the index(es).
Is the same session going to be performing both the inserts and the selects/updates?
If so, concurrency won't be an issue. If not, you might want to schedule things so that loads and processing aren't running at the same time, in order to minimize response time.
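
If a single session is doing the processing, the write-back step is nothing exotic - select a batch via the indexed expression, compute, then update. A minimal sketch with placeholder bind names:

  -- record the result of process1 for one item; once all three
  -- process columns are non-null the row drops out of the
  -- function-based index sketched above
  update item
     set process1_id = :result
   where id = :item_id;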

Follow-up questions:

is the Oracle Partitioning option available to you? (if so, use it)
how many rows per day are inserted?
what size batch are they inserted as?
how frequently are the batches inserted?
will Oracle sql loader be used to perform the inserts in bulk? (a control file sketch follows below)
is bulk (array) processing being used if sql loader is not being used?
how many blocks can be read by the existing database server per second?
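
If sql loader is in play, something like the following control file would handle the comma delimited export. This is a hypothetical sketch - the file name and column order are assumptions, and you should check whether a direct path load plays well with any function-based index before reaching for direct=true:

  -- item.ctl: load the comma delimited SQL Server export
  load data
  infile 'item.csv'
  append
  into table item
  fields terminated by ','
  trailing nullcols
  (id, process1_id, process2_id, process3_id)

Invoked as, e.g., sqlldr userid=... control=item.ctl (credentials elided).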

This won't be just a matter of keeping a static index in the keep pool for fast access, as the (function-based) index will be undergoing maintenance with each row inserted, updated or deleted ... which will generate undo and redo ... and changed blocks that will be written out at checkpoints (and at other times as well).
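
That said, if you do want to nudge the index toward staying cached, assigning it to the keep pool is a one-liner (the index name is carried over from the sketch above, so adjust to taste):

  alter index item_todo_ix storage (buffer_pool keep);

Just remember that a cached index is not a static index - the maintenance cost described above doesn't go away.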

The part that will be funny is that after you have all of this put together and are meeting your service level agreement for response time ... the reporting requirements against the data will change and blow your resource usage sky high.

good luck. have fun.

-bdbafh