Re: Tuning large aggregation query

From: Rumpi Gravenstein <rgravens_at_gmail.com>
Date: Fri, 4 Apr 2014 09:17:14 -0400
Message-ID: <CAEpg1wAXeSkNdBb7Z0QGxh0R9DoO4gT_yoaODmD3=8d7NefNgw_at_mail.gmail.com>



Looks like a good candidate to run in parallel.
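
If you aren't already doing it, a parallel merge is roughly where I would start.
This is only a rough sketch, not your actual statement: the degree of 8, the
target_table name and the measure columns (sales_amount, first_sale_dt) are
placeholders, since the real merge wasn't posted.

alter session enable parallel dml;

merge /*+ parallel(tgt 8) */ into target_table tgt
using (
        select /*+ parallel(b 8) parallel(sd 8) */
               b.pk_col_1,
               b.pk_col_2,
               b.pk_col_3,
               sum(sd.sales_amount)  as sales_amount,
               min(sd.first_sale_dt) as first_sale_dt
          from base_table b
          join sales_data sd
            on sd.pk_col_1 = b.pk_col_1
           and sd.pk_col_2 = b.pk_col_2
           and sd.pk_col_3 = b.pk_col_3
         group by b.pk_col_1, b.pk_col_2, b.pk_col_3
      ) src
on (    tgt.pk_col_1 = src.pk_col_1
    and tgt.pk_col_2 = src.pk_col_2
    and tgt.pk_col_3 = src.pk_col_3)
when matched then update set
     tgt.sales_amount  = src.sales_amount,
     tgt.first_sale_dt = src.first_sale_dt
when not matched then insert
     (pk_col_1, pk_col_2, pk_col_3, sales_amount, first_sale_dt)
     values
     (src.pk_col_1, src.pk_col_2, src.pk_col_3,
      src.sales_amount, src.first_sale_dt);

Keep in mind that without "alter session enable parallel dml" only the query
side runs in parallel; the insert/update of the ~4 million target rows stays
serial.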

On Fri, Apr 4, 2014 at 9:12 AM, Thomas Kellerer <thomas.kellerer_at_mgm-tp.com> wrote:

> Hi,
>
> for reporting purposes we are maintaining some denormalized,
> pre-aggregated tables in our OLTP database.
>
> These queries are aggregating sales and other related data from several
> source tables into one aggregation table.
>
> Our problem is that the aggregation of some 70 million rows into approx. 4
> million rows is quite slow.
>
> As far as I can tell, most of the time is lost when doing the actual
> aggregation.
> The join itself is quite fast, but aggregating 70 million rows needs a
> huge amount of memory (or temp space).
>
> I uploaded the execution plan to pastebin to preserve formatting:
> http://pastebin.com/XfEvB3z9
> (You need to disable word-wrapping though)
>
> The basic structure of the statement looks like this:
>
> select base_table.pk_col_1,
>        base_table.pk_col_2,
>        base_table.pk_col_3,
>        sum(...),
>        sum(...),
>        min(...),
>        ... and another approx. 100 columns ...
>   from base_table
>   join sales_data on ...
>  group by base_table.pk_col_1,
>           base_table.pk_col_2,
>           base_table.pk_col_3
>
> There is a 1:N relation between base_table and sales_data.
>
> sales_data essentially stores sales information for products at a per-day,
> per-sale and per-store level.
> The goal of the aggregation is to sum up that daily information to a
> product/store level.
>
> sales_data contains about 70 million rows, base_table contains about 4
> million rows, so the result of the aggregation is again 4 million rows.
> sales_data is partitioned by the region in which the stores are located
> (and that partitioning cannot be changed).
>
> We applied the same partitioning scheme to base_table and target_table in
> the hope that this would make the parallel execution more efficient, but
> changing from unpartitioned to partitioned didn't improve the queries (and
> didn't make them worse either). We tried various indexes (in addition to
> the PK index that is already used for the join), but apparently there isn't
> much an index can do for this type of query.
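>
> For illustration, think of the partitioning as something along these lines
> (heavily simplified: the real tables have far more columns and regions, and
> the names below are just placeholders):
>
> create table base_table (
>     pk_col_1 number,
>     pk_col_2 number,
>     pk_col_3 number,
>     region   varchar2(10) not null
> )
> partition by list (region) (
>     partition p_north values ('NORTH'),
>     partition p_south values ('SOUTH'),
>     partition p_east  values ('EAST'),
>     partition p_west  values ('WEST')
> );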
>
> The main time is spent doing the aggregation.
> I confirmed that by running only the select statement that's inside the
> merge; it takes nearly as long as the merge itself.
>
> We were able to improve the whole process a bit by using parallel
> execution, but the runtimes are still way too high for the "window" we have
> at night to prepare the data. The target would be for a single merge like
> that to run in about 10 minutes; currently it's around 60 minutes.
>
> The test server runs Oracle 11.2.0.4 on a Windows 2008R2 server with 64 GB
> RAM, an 8-spindle RAID 10 attached and 6 cores (12 CPUs).
>
> I can provide more Oracle configuration parameters if needed.
>
> The production server also runs Oracle 11.2.0.4 on Windows Server 2008R2
> and has 32 cores (64 CPUs) and 192 GB RAM.
> We will not be able to test this on the real production hardware, and the
> number of rows in production will probably be 4-5 times bigger.
>
>
> I am looking for ideas on how to improve the runtime of those statements,
> or just some pointers on what I could try out to either optimize the query
> itself or tune the Oracle configuration.
>
> Thanks
> Thomas
>
>
>
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>

-- 
Rumpi Gravenstein

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Apr 04 2014 - 15:17:14 CEST
