
Re: What is Parallel DML?

From: Billy Verreynne <vslabs_at_onwe.co.za>
Date: Sat, 21 Jul 2001 22:17:15 +0200
Message-ID: <9jcnuh$o4b$1@ctb-nnrp1.saix.net>

"Dino Hsu" <dino1_at_ms1.hinet.net> wrote

> I don't fully understand the purpose and significance of using
> parallel DML yet, can anyone further explain this? What is difference
> between 'with' and 'without' parallel DML? Thanks in advance.

What is the slowest operation on a computer? Disk I/O. Never mind how big the pipe between the disk and the CPU is, there is still latency in getting the read/write heads into position and reading the data off the disk.

Let's say you want to query 50,000,000 rows with a SELECT statement. The query cannot make good use of an index because you need to process most (or all) of the rows in the table. Nothing unusual in data warehousing.

You do something along these lines (col2 standing in for whatever column you are aggregating):

SELECT col1, SUM(col2), COUNT(*)
FROM   big_table
GROUP BY col1

Oracle uses a single process to do it. It takes a VERY long time for it to go and read 50 million rows from disk - never mind doing the aggregation and grouping.

How do you speed this up? Using parallel processing. Let's say you have a big machine that can handle at least 50 I/O processes (and NO, having more than one CPU is NOT compulsory!!).

Okay, you now change the SQL to the following:

SELECT /*+ FULL(big_table) PARALLEL(big_table,50) */
       col1, SUM(col2), COUNT(*)
FROM   big_table
GROUP BY col1

(Note that if you are running on a cluster, you can specify PARALLEL(tablename, degree, instances) to say how many parallel processes to use on how many nodes in the cluster.)
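For example (the numbers are purely for illustration, and the exact semantics of the third argument depend on your Oracle release, so check the docs for yours):

SELECT /*+ FULL(big_table) PARALLEL(big_table,10,2) */
       col1, SUM(col2), COUNT(*)
FROM   big_table
GROUP BY col1

asks for the scan to be spread across two nodes as well as across parallel slaves.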

Oracle now uses 50 processes for the query. The big table is broken up into 50 chunks - Oracle modifies your SQL query slightly and adds a WHERE clause to it:

WHERE rowid BETWEEN :a AND :b

Each of the 50 processes (parallel query slaves) now gets a range of rows to process. In ideal circumstances each will process 1 million rows. We now have 50 processes each doing a million rows, versus the previous scenario where ONE process had to scan 50 million rows.
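Conceptually each slave ends up running something along these lines against its own slice of the table (the rowid range values are supplied by Oracle; the bind names here are just for illustration):

SELECT col1, SUM(col2), COUNT(*)
FROM   big_table
WHERE  rowid BETWEEN :low_rowid AND :high_rowid
GROUP BY col1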

That is PQ (parallel query). Parallel DML works in a very similar fashion, except that you are now writing data in parallel and not just querying it in parallel.
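As a minimal sketch of the difference between 'with' and 'without' parallel DML (assuming the results table already exists with matching columns), the key is that the session has to enable it explicitly and the statement has to ask for it:

ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(results,50) */ INTO results
SELECT /*+ FULL(big_table) PARALLEL(big_table,50) */
       col1, SUM(col2), COUNT(*)
FROM   big_table
GROUP BY col1;

COMMIT;

Without the ALTER SESSION, the insert part runs serially even though the SELECT feeding it can still be parallelised.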

Using the above sample query again, let's say we want to store the results of the query in a table:

CREATE TABLE results
UNRECOVERABLE AS
SELECT /*+ FULL(big_table) PARALLEL(big_table,50) */
       col1, SUM(col2), COUNT(*)
FROM   big_table
GROUP BY col1

A couple of things to note here. You will now have 100 parallel processes and not 50. Each PQ reader also gets a writer process: one process reads the data, and a companion process writes that data into the results table. Also be careful of extent sizes, as each of the 50 create-table processes (the writers) will grab its own extent.
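A minimal sketch of what I mean by watching extent sizes (the tablespace name and storage figures are purely for illustration - size them for your own volumes):

CREATE TABLE results
TABLESPACE dw_data
STORAGE (INITIAL 10M NEXT 10M)
UNRECOVERABLE AS
SELECT /*+ FULL(big_table) PARALLEL(big_table,50) */
       col1, SUM(col2), COUNT(*)
FROM   big_table
GROUP BY col1

With 50 writers each allocating its own space, extents that are too small mean a lot of allocation overhead, and extents that are too big mean a lot of space grabbed up front.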

You can do something similar when creating indexes, and the same idea applies to other DML statements.
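For example (the index name and the update are purely for illustration):

CREATE INDEX big_table_col1_ix ON big_table (col1)
PARALLEL 50 UNRECOVERABLE;

ALTER SESSION ENABLE PARALLEL DML;

UPDATE /*+ PARALLEL(big_table,50) */ big_table
SET    col2 = col2 * 2;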

The most PQ's I have run (effectively) was 80 for a single SQL query. On that specific platform I managed to process 170 million rows in a table (with joins to other tables, doing aggregations) in less than 90 minutes. Show me SQL Server doing that.. :-)

The 50 PQ example I use above is likely overkill on many platforms. You need to monitor the operating system using its performance tools and push the number of PQ's up until you run at just over 90% CPU time. Remember that idle CPU time is wasted time. You can only determine the maximum number of PQ's you can effectively use on a platform by performance monitoring.
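On the Oracle side you can also keep an eye on the parallel query server statistics while you do this (assuming you have access to the v$ views):

SELECT * FROM v$pq_sysstat;

Watch how many servers are busy versus idle as you push the degree of parallelism up, together with the CPU and I/O figures from the OS tools.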

Correctly using parallel processing is IMO the single biggest performance booster that you can use. And not just in Oracle.

--
Billy
Received on Sat Jul 21 2001 - 15:17:15 CDT
