Designing a process flow, was RE: Limit on number of columns in an index

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Thu, 20 Mar 2008 10:07:53 -0400
Message-ID: <015a01c88a93$cb6b6da0$1100a8c0@rsiz.com>


Someone else has described Oracle's current technical limits on this: about 32 or 30 columns, depending, and a bit more than two times the longest possible concatenated key plus rowid needs to fit in the block size remaining after overhead. The 32/30 limit is arbitrary, while the length limit is a logical requirement unless Oracle decides to break through the huge technical barrier [unlikely to be of real benefit] of splicing blocks together. (Far simpler to just use a larger-blocksize tablespace for that index if you ever really needed it, and Oracle could de-deprecate multiple block sizes by removing the artificially imposed penalty of refusing to implement the option to automanage the buffer cache sizes when you have multiple block sizes. A sketch of the larger-blocksize option follows.)
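For that escape hatch the moving parts look roughly like this; all names, paths, and sizes here are hypothetical, and the nonstandard-blocksize buffer cache has to be configured before the tablespace can be created:

    -- A 32K buffer cache must exist before a 32K tablespace can be used.
    ALTER SYSTEM SET db_32k_cache_size = 64M;

    CREATE TABLESPACE idx32k
      DATAFILE '/u01/oradata/idx32k01.dbf' SIZE 1G
      BLOCKSIZE 32K;

    -- The wide index then lives where its key comfortably fits.
    CREATE INDEX wide_key_ix ON stg (c01, c02, c03)  -- more columns as needed
      TABLESPACE idx32k;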

But you have asked "before it is considered a bad design."  

I consider a design to be bad if it is ill suited to the problem at hand or error prone.  

In your case you are populating a temporary table and then updating the table based on various criteria.  

The first design question is whether, with a single row in hand at the beginning of this process, you could make all the updates. If that is the case, you almost certainly should grab a buffer of rows of a convenient size (small enough to avoid memory problems and accumulated required undo, large enough to avoid generating significant commit overhead; without knowing anything about your data I'd hazard only a very wide guess, such as more than 500 and fewer than 100,000 rows per chunk processed, circa 2008 AD). Then you pass over the inbound data once without a monolith problem, and you can carve the inbound data into as many parallel threads as is appropriate to the available hardware. A crude means of buffering could be to write many temporary tables in parallel and simply FTS each of them in parallel to perform the desired manipulations. A sketch of the chunked single pass follows.
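Here is a minimal PL/SQL sketch of that chunked single pass; the table STG, the AMOUNT column, and the transform are hypothetical stand-ins, and the LIMIT of 10,000 is just a value inside the wide guess above:

    DECLARE
      TYPE t_rid IS TABLE OF ROWID;
      TYPE t_amt IS TABLE OF stg.amount%TYPE;
      l_rids t_rid;
      l_amts t_amt;
      CURSOR c IS SELECT rowid, amount FROM stg;
    BEGIN
      OPEN c;
      LOOP
        FETCH c BULK COLLECT INTO l_rids, l_amts LIMIT 10000;
        EXIT WHEN l_rids.COUNT = 0;
        FORALL i IN 1 .. l_rids.COUNT
          UPDATE stg
             SET amount = l_amts(i) * 1.1  -- all per-row transforms at once
           WHERE rowid = l_rids(i);
        COMMIT;                            -- bounded undo per chunk
      END LOOP;
      CLOSE c;
    END;
    /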

If, on the other hand, there is a set of ordered updates made on subsets of the rows, such that each ultimate tuple cannot be determined until other phased updates are made to other rows, you may have an interesting problem. For example, let's say you determine some column's category value from other column values and you treat categories differently based on rank. Then you would need all the rows in hand at once for the rank analysis (a sketch follows).
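For that rank case, an analytic function can do the whole-set analysis in one statement. This is only an illustrative sketch; STG, SCORE, and CATEGORY are hypothetical names:

    -- Rank every row across the whole set, then update in one pass.
    MERGE INTO stg s
    USING (SELECT rowid AS rid,
                  RANK() OVER (ORDER BY score DESC) AS rnk
             FROM stg) r
       ON (s.rowid = r.rid)
     WHEN MATCHED THEN UPDATE
      SET s.category = CASE WHEN r.rnk <= 10 THEN 'TOP' ELSE 'OTHER' END;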

At that point you need to play manual cost-based optimizer, based on your long-term knowledge and/or likely assumptions about the inbound data, combined with the transforms you need to apply to the data.

If a column is used as a filter/value factor only once, it makes no sense to index it. (Think that through. If someone thinks I'm wrong, please educate me.)

On the other hand, a single column reference *might* provide an effective intermediate partitioning key, if quite different process groups follow from such segregation, even if you have to union-ize the result sets for subsequent processing (see the sketch below).
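As a hedged illustration of that segregation, assuming a hypothetical REGION column whose values call for quite different transforms:

    -- Segregate on REGION; each subset gets its own processing,
    -- possibly in parallel sessions.
    CREATE TABLE stg_east AS SELECT * FROM stg WHERE region = 'EAST';
    CREATE TABLE stg_west AS SELECT * FROM stg WHERE region = 'WEST';

    -- ...apply the different transforms to each table...

    -- Union-ize the results for the subsequent common processing.
    INSERT INTO stg_final
    SELECT * FROM stg_east
    UNION ALL
    SELECT * FROM stg_west;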

Barring some correction, at some value greater than one filter/value-affecting reference, it becomes worthwhile to index a column. Whether there are pairwise or setwise concatenated indexes that provide net utility is another interesting question. During your temporary processing, it might be the case that some column you indexed for efficiency in an early step is later updated after it is no longer needed for any more decisions. In that case it is likely productive to drop the index before that column starts getting updated (a sketch follows).
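A sketch of that index lifecycle, again with hypothetical names; dropping before the heavy update means the update no longer pays index maintenance:

    -- Early step: the index supports filtered updates that need it.
    CREATE INDEX stg_cat_ix ON stg (category);

    -- ...steps that filter on CATEGORY...

    -- CATEGORY is no longer a decision input; drop the index before
    -- the step that rewrites the column.
    DROP INDEX stg_cat_ix;

    UPDATE stg SET category = 'DONE';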

Pruning the sets on which you must operate makes sense when it is a good bet that the pruning costs less than performing a null operation on the rows that would have been pruned.  
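A trivial illustration of that bet, with a hypothetical STATUS column; the WHERE clause is the prune, and it wins whenever finding the rows costs less than rewriting everything:

    -- Without pruning: every row is visited and rewritten, even when
    -- the new value equals the old one (undo/redo is still generated).
    UPDATE stg SET status = 'PROCESSED';

    -- With pruning: only rows that actually change are touched.
    UPDATE stg SET status = 'PROCESSED' WHERE status = 'NEW';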

Usually, if you lay out the ordered transform steps (and consider whether any of the steps are arbitrarily ordered and therefore offer flexibility with respect to the transform), you can easily discern a pretty good transform-centered design, and it will become obvious when in the overall transform you need particular indexes.

Good luck!  

mwf    


From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Ram Raman
Sent: Wednesday, March 19, 2008 4:06 PM
To: Bobak, Mark
Cc: oracle-l
Subject: Re: Limit on number of columns in an index  

The table has 60 columns. The problem is that this is a temporary table which gets used only when the process is run and gets truncated afterwards. While running, it currently gets populated with 1.4 million rows. The rows are inserted and then updated based on several criteria. Towards the end of the process, the values from this table are inserted into another permanent table. The process is now taking more than twice as long (2+ hours) as before because of the increased volume of data.

The insertion was slow, and I tuned it. Many update statements are issued against the temp table. I am trying to make things faster. I tried creating a few indexes on the temp table(!). I hope they don't slow down the insertions. I did not try parallelizing the queries.

Env: PeopleSoft, 10g

<snip>

  • original question:

Is there any limit on the number of columns a BTree index can have before it is considered a bad design? I see a need to build an index with almost 30 columns.

<snip>
 

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Mar 20 2008 - 09:07:53 CDT
