RE: Estimations are as good as it can be, what else can be done here?

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Wed, 22 Nov 2023 12:50:46 -0500
Message-ID: <0f0b01da1d6c$6cbefba0$463cf2e0$_at_rsiz.com>



Mark,  

I understood, as Jonathan pointed out, that the best strategy might be to flatten the required data into one table, then fetch the PK or rowid with minimal sorting overhead (using an index that fully covers the ordering columns in the same sort order), and then retrieve the table data via those selected rowids.  
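Just to check my understanding, something like this rough sketch (table, column, and bind names are all made up by me):

    -- assuming the required data has been flattened into one table big_flat,
    -- with an index on (order_col1 desc, order_col2 desc) so the rowids of the
    -- top :n rows can be picked up with little or no sorting
    select f.*
    from   ( select rid
             from   ( select rowid as rid        -- only rowids come out of the sort
                      from   big_flat
                      order  by order_col1 desc, order_col2 desc )
             where  rownum <= :n ) top_n,
           big_flat f
    where  f.rowid = top_n.rid;                  -- fetch the wide rows last, by rowid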

The strategy you pointed out below went over my head, though. Do you mean adding some additional predicate to the query to minimize the amount of sorting? What are 'x' and 'y' here? Could you please explain a bit more?    

If a small set of columns (sometimes as few as one column) can distinctly provide the top n ordering, it may be useful to provide that additional predicate, as an AND x >= y, when gathering up the data to present.    

For slowly changing data and some "n", it may even be useful to maintain a table containing the top "m" rows, where "m" is at least as big as the biggest "n" your users will use, and update that table whenever a new combination should be in it.

Two situations:

Case 1: IF your users are running top-n reports repeatedly with selections on less slowly changing data.

In addition to what JL and others have already suggested, it MAY be cheaper to first yank out just the ordering columns, then take the lowest value of each column from those top n rows, and filter on greater than or equal to that lowest value at the source. Notice you can't simply use the values of the lowest (nth) row if you're filtering on all the columns: with two ordering columns, a first row of Z A would get filtered out by the second-column predicate if the last row of the top n was Y Y. You CAN safely take greater than or equal to the value of the first column of the ORDER BY alone. (But I will repeat: this is not valid for hierarchical queries.)  
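A rough sketch of that two-step approach (all table, column, and bind names here are invented):

    -- step 1: cheap pass that yanks out only the first ordering column and finds
    --         the lowest value it takes among the current top :n rows
    select min(order_col1) as gate_value
    from   ( select order_col1
             from   detail_table
             where  region = :region              -- same selections the report uses
             order  by order_col1 desc )
    where  rownum <= :n;

    -- step 2: the expensive report, with that value (bound here as :gate_value)
    --         added as an extra predicate so rows that cannot possibly reach
    --         the top n are discarded at the source
    select *
    from   ( select d.order_col1, d.order_col2, x.description
             from   detail_table d
                    join expensive_dimension x on x.id = d.dim_id
             where  d.region = :region
             and    d.order_col1 >= :gate_value   -- first ORDER BY column only
             order  by d.order_col1 desc, d.order_col2 desc )
    where  rownum <= :n;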

IF the top n is ordered by the value of a GROUP BY aggregation, this also won't help, because you can't toss away individual rows whose value is less than the threshold when they contribute to the aggregate being ranked.  

Case 2: IF your users are running top-n reports repeatedly with selections on slowly changing data, where n may vary but will never exceed m.  

With the same caveats as case 1, it may actually be useful in some cases to keep a single-row table holding the lowest value that should be included, and to update that table whenever a transaction changes the low value to be included.  
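A sketch of that idea, again with invented names (the maintenance side is only hinted at in comments):

    -- one-row table holding the lowest first-ordering-column value that can
    -- still appear in any top-m report
    create table top_m_gate ( low_order_col1 number not null );

    insert into top_m_gate ( low_order_col1 )
    select min(order_col1)
    from   ( select order_col1
             from   detail_table
             order  by order_col1 desc )
    where  rownum <= :m;

    -- the report reads the stored gate instead of recomputing it each time;
    -- any n <= m stays correct
    select *
    from   ( select d.*
             from   detail_table d
             where  d.order_col1 >= ( select low_order_col1 from top_m_gate )
             order  by d.order_col1 desc )
    where  rownum <= :n;

    -- whatever transaction inserts or updates detail_table must also refresh
    -- top_m_gate when it changes which rows make the top m (e.g. re-run the
    -- MIN-of-top-m query above, or adjust the stored value in the same transaction)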

Lots of top-n reports are either hierarchical or aggregations and are unsuited to these approaches. But quite a few top-n determinations are also quite simple, where as few as one ordering column could serve as the gating predicate, yet huge relational assemblies are tacked together only to be pruned down at the end. IF that gating value (or set of values) can be cheaply looked up or kept updated, then expensive reports can become quick.  

IF you are operating on a frozen standby clone (such as when a daily, weekly, monthly, quarterly, or annual trigger event is used to keep analysis on a static "as was" basis), aggregations and filtering aggregations are much more frequently worth instantiating and using. Denormalizations are quite often perishable, but they are not when you are operating on a frozen set of underlying data.  
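For instance (a hypothetical sketch, names invented), on a clone frozen at the trigger event, a plain summary table built once stays valid for the whole analysis period:

    -- built once right after the freeze
    create table period_sales_summary as
    select customer_id,
           sum(sale_amount)  as total_sales,
           count(*)          as sale_count,
           max(sale_date)    as latest_sale
    from   sales_detail
    group  by customer_id;

    -- top-n analysis for the rest of the period reads the small summary,
    -- not the big detail table
    select *
    from   ( select customer_id, total_sales
             from   period_sales_summary
             order  by total_sales desc )
    where  rownum <= :n;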

One of my favorite customers froze a copy of a standby clone weekly, after the "generate receivables" process completed on Friday night. Many aggregations were then built on that frozen set of detail data over the rest of the weekend and used for analysis the following week.  

Good luck,  

mwf  

<snip>    

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Nov 22 2023 - 18:50:46 CET
