Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: Optimal GROUP BY

Re: Optimal GROUP BY

From: Billy Verreynne <vslabs_at_onwe.co.za>
Date: 1998/03/22
Message-ID: <6f4v3h$28d$1@hermes.is.co.za>#1/1

Wilhelm Thieme wrote in message <6eud1m$c92$1_at_news.worldonline.nl>...
>I want to set up my database for the best performance possible for group by
>operations.
>The tables to be aggregated are huge (> 100 000 000 rows).
>We are using Oracle Parallel server 7.3.3. on a 14 node MPP.
>Any suggestions about database parameters, (TEMP) tablespace configuration
>etc welcome

You'll need -lots- of temp space! But IMHO when you're required to group that many rows then something is not kosher with the data warehouse design. I have seen a single SQL query doing a GROUP BY on 120+ million rows that ran out of space on a 15GB temp tablespace. The answer is to divide and aggregate.

When dealing with large volumes of data, I think that the performance enhancements should be concentrated on overall thruput, i.e. how many "bytes per second" can be processed by Oracle. You'll need to look at the best setting for instances and degrees on the tables, make sure that for small volumes of data indexes are used, and for larger volumes of data full tablescans are used. You also need to make sure that the tables are analysed, or else you'll get a situation where one parallel query runs 50 PQ slaves, each with a rownum range, but the ranges are stuffed and only the 1st 2 PQ slaves actually process any data. Other things to look at is the join methods - nested loop joins can kill the machine when joining a large table with a smaller table (hash joins work much better). When Oracle has to use a nested loop join, you need to make sure that the joining table is correctly index or else it will full tablescan the 2nd table for every single row read from the 1st table. Etc. Etc.

Tuning a VLDB OPS database is not trivial and includes all aspects of the MPP platform (hardware and operating systems) and Oracle. IMO concentrating on optimising a single aspect (such as GROUP BY) will not really address the performance issues of the database.

regards,
Billy Received on Sun Mar 22 1998 - 00:00:00 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US