FW: reduce table fragmentation by design

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Thu, 27 Jan 2011 12:04:32 -0500
Message-ID: <33257B2F35204D89AE18FBAA1AB2DD6D_at_rsiz.com>

<snipped some more>


IOT means Index-Organized Table, not the only index you can put on a table.
<For those of you segregating tables from indexes for administrative
purposes, IOTs should live with your tables and clusters, because they are source data and not generally recreatable from something else. This is NOT that debate.>  

Since all your extra columns are numbers (not LONGs, LOBs, or huge character fields), there seems little impediment to creating them all inline. However, that seems unlikely to me to be the best solution.  

If you had control of the application enough to have one synonym for insert and a different synonym for queries, I'd use the union-all table partition simulation game suggested by Tim and others.

If you don't, and cannot easily get control of synonyms for insert versus query (by the way, it is an interesting practice to maintain distinct notions of <insert_name> and <readonly_name> and a few more as a development standard, but that topic is far too complex for an e-mail), there is still useful intervention you can do if you have a maintenance window.  
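A minimal sketch of the insert-name/read-name idea combined with the union-all partition simulation; every object and column name below is hypothetical, and your real column list would replace the example columns:

```sql
-- Inserts land in a small "current" segment; readers see the union of
-- current and history through a view. All names are made up for illustration.
create table orders_current (id number, ts date, amount number);
create table orders_history (id number, ts date, amount number);

create or replace view orders_read as
  select id, ts, amount from orders_current
  union all
  select id, ts, amount from orders_history;

create synonym orders_ins for orders_current;  -- the <insert_name>
create synonym orders     for orders_read;     -- the <readonly_name>
```

Because the application only ever sees the synonyms, you can age rows from the current segment into history (or swap segments) without touching application code.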

Now perhaps your application is performing the periodic delete. If that is the case, but the deletion rule is as simple as keeping 45 days AND you have a maintenance window, then you can simply employ one of Tim's favorite observations (and mine): that inserts are cheaper than deletes.  

So that goes like this:  

suspend application use

create <new_table> as select * from <current_table> where <row_insert_update_rule> <= <cutoff>

rename <current_table> to <dated_research_table_to_export_and_drop>

rename <new_table> to <current_table>

run the application purge [which will find no work to do] if it is a required step in your application processing

resume application use  
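Assuming a TS column holding the insert/update timestamp and the table and index names below (all hypothetical), the maintenance window might look like:

```sql
-- All names hypothetical; assumes TS holds the insert/update timestamp.
create table my_table_new as
  select * from my_table
  where ts >= sysdate - 45;            -- keep only the last 45 days

rename my_table to my_table_20110127;  -- park the old copy for research/export
rename my_table_new to my_table;

-- CTAS does not carry over indexes, constraints, grants, or synonyms,
-- so re-create them before resuming the application.
create index my_table_ix on my_table (surrogate_key, ts);
```

The old segment can then be exported and dropped at leisure, outside the window.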

Now, as for physically ordering your inserts, it will be far more productive to pick the most common access path. Since you report that your only index is on a surrogate key and a timestamp, it seems strange that you would be in a situation where you access thousands of rows at a time through that key, but that is what you've reported. Ordering the create table as select (CTAS) query in that manner would minimize the clustering factor on that index, making it relatively more attractive to the CBO and tending to minimize the number of data blocks that must be scanned to retrieve all the rows in a range on that key (assuming non-indexed columns are required).  
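You can see the effect by comparing the clustering factor before and after the ordered rebuild; the table name is hypothetical, but CLUSTERING_FACTOR and NUM_ROWS are standard columns of the USER_INDEXES dictionary view, and a fresh statistics gather is assumed:

```sql
-- Gather fresh stats, then compare CLUSTERING_FACTOR against the table's
-- block count and NUM_ROWS: close to the block count means well clustered,
-- close to NUM_ROWS means rows for each key are scattered across blocks.
exec dbms_stats.gather_table_stats(user, 'MY_TABLE', cascade => true)

select index_name, clustering_factor, num_rows
from   user_indexes
where  table_name = 'MY_TABLE';
```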

If you discover that a different ordering of rows is the predominant access pattern for research queries, it is an open question whether the net load and responsiveness of the system would be better served by adding an index and doing the CTAS in that order.  

We didn't discuss the fraction of rows retained by the keep-45-days rule, but if your purge runs every 7 days or so, you're keeping about 45/52 of the table. That might sit somewhere around the ever-changing, case-by-case boundary where CTAS actually becomes cheaper than delete (you're deleting only about 13% of the rows), so it would require a test to see which is actually cheaper. But CTAS instead of delete does solve your problem of being left with a honeycomb at the end of the purge.  
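The arithmetic behind those fractions, assuming a weekly purge so the table holds up to 45 + 7 = 52 days of rows just before a purge runs:

```python
DAYS_KEPT = 45      # retention rule: keep 45 days
PURGE_EVERY = 7     # purge runs roughly weekly
cycle = DAYS_KEPT + PURGE_EVERY   # 52 days of data right before a purge

kept_fraction = DAYS_KEPT / cycle       # ~45/52
deleted_fraction = PURGE_EVERY / cycle  # ~7/52

print(f"kept ~{kept_fraction:.0%}, deleted ~{deleted_fraction:.0%}")
# prints: kept ~87%, deleted ~13%
```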

Now, if you cannot suspend application use and you really have to use the application's purge routine, deleting row by row, there is perhaps some utility in ordering your inserts by timestamp if they are direct-path: rows loaded together will then fill blocks together and expire together, tending to put whole free blocks on the freelist (or leave blocks nearly empty when they reappear on the ASSM bitmap as insert candidates). That won't clean up the existing honeycomb scatter from non-direct inserts, so a one-time rebuild would still be required.  
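A hypothetical direct-path load ordered by timestamp (table and column names are assumptions, not from the original thread):

```sql
-- Direct-path insert: rows are packed into fresh blocks above the
-- high-water mark, so rows with adjacent timestamps share blocks
-- and will expire together under a time-based purge.
insert /*+ append */ into my_table
select *
from   staging_rows
order  by ts;

commit;  -- blocks loaded direct-path cannot be read until commit
```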




Received on Thu Jan 27 2011 - 11:04:32 CST
