Re: Tuning large aggregation query

From: Thomas Kellerer <thomas.kellerer_at_mgm-tp.com>
Date: Mon, 07 Apr 2014 10:24:00 +0200
Message-ID: <534260A0.2060108_at_mgm-tp.com>



Jonathan,

> How many regions, and how even is the distribution of sales data by region.

There are about 200 regions. The distribution is somewhat skewed. About 20 regions make up 40% of the data. The rest is more or less evenly distributed

> Is the region part of the primary key of base_table

Yes and no. We don't have a primary key constraint defined, only a unique key with the region id as the leading column and the real PK as the other columns. We found that this kind of index is in general better utilized by Oracle than a "real" PK.

Playing around with indexing and partitioning it seems that if I do _not_ partition BASE_TABLE things seem to be faster. But I have yet to confirm that.

> and is there a foreign key defined between the base_table and the sales data

No, there is no FK between those tables. Each column of the PK references a different table (products, stores and the marketing campaign to which that product belongs)

> does it make sense to think in terms of a (partial) partition-wise join on region, or does data-skew or
> the effect of a primary key value relating to sales data that crosses regions make this inappropriate ?

The sales data will never cross a region. Because essentially it is reported on store level and a store will always be in exactly one region - think of it as a kind of hierarchy: country -> region -> store

> Are all of the "other 100 columns" aggregates, or are some of them non-aggregated columns that also appear in the group by clause ?

All of them are aggregates. Most of them aren't "straight" sums but the sum is done conditionally based on the sales date and other parameters that appear in base_table and the target table of the merge, using sum(case when ... end)

> Can you show us an execution plan that includes the distribution columns for the parallel query;
> and a report of v$pq_tqstat after running the query.

I have uploaded the output of dbms_sqltune.report_sql_monitor and the v$pq_tqstat to pastebin:

The plan from report_sql_monitor: http://pastebin.com/2UXhusyk The contents of pg_tqstat: http://pastebin.com/Uwiu9QCC

Regards
Thomas

Jonathan Lewis, 04.04.2014 15:42:
>
>
> How many regions, and how even is the distribution of sales data by region.
> Is the region part of the primary key of base_table, and is there a foreign key defined between the base_table and the sales data;
> does it make sense to think in terms of a (partial) partition-wise join on region, or does data-skew or the effect of a primary key value
> relating to sales data that crosses regions make this inappropriate ?
> Are all of the "other 100 columns" aggregates, or are some of them non-aggregated columns that also appear in the group by clause ?
> Can you show us an execution plan that includes the distribution columns for the parallel query; and a report of v$pq_tqstat after running the query.
>

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Apr 07 2014 - 10:24:00 CEST

Original text of this message