Oracle-L: RE: RE: Advice needed please

From: Stephane Faroult <sfaroult_at_oriolecorp.com>
Date: Fri, 31 May 2002 03:08:21 -0800
Message-ID: <F001.004709DD.20020531030821@fatcity.com>

('binary' encoding is not supported, stored as-is)
Lee,

I am always reluctant to post something which may look even remotely commercial but Oriole markets this kind of tool. It's, we believe, reasonably priced and you can try it for free, so perhaps it's worth for you to have a look before going into a full-blown development.

As far as I have understood what your developer tries to do, he tries to multithread the SELECTs - running several SELECTs at once. I think that it is a bad idea, because you are trying to reinvent the wheel (a wheel also known as Parallel Query). As some have pointed out, the main bottleneck here is likely to be data writing - but also, I should add, waiting for Oracle to return the data and formatting. Our tool, pdqout, is also multithreaded - but instead of having multiple threads querying the database, one thread queries, one thread formats and one thread writes to disk. As a result, CPU utilization is pretty high ...

HTH Stephane Faroult

>how about index organizing the table ? or ..
>creating an index on all the
>columns of the table..?
>this way the select will read only the index
>blocks..!!
>
>> ----------
>> From: Robertson Lee -
>lerobe[SMTP:lerobe_at_acxiom.co.uk]
>> Reply To: ORACLE-L_at_fatcity.com
>> Sent: Thursday, May 30, 2002 9:43 PM
>> To: Multiple recipients of list ORACLE-L
>> Subject: RE: Advice needed please
>>
>> Thanks Dennis.
>>
>> Anything whether it be a "reaction or an answer",
>is welcome.
>>
>> Regards
>>
>> Lee
>>
>>
>> -----Original Message-----
>> Sent: 30 May 2002 15:08
>> To: Multiple recipients of list ORACLE-L
>>
>>
>> Lee - Just some reactions, few answers.
>> - Generally a process like this will be
>disk-bound, not CPU-bound, so
>> idle
>> CPU time is to be expected unless your disk is
>REALLY fast.
>> - Multiple simultaneous full-table scans may
>not be any faster because
>> the
>> disk heads may need to flit to and fro in order
>to satisfy each process'
>> request. Sometimes a single full table scan is as
>fast is it gets for a
>> mechanical device like a disk. RAID will be
>faster, of course, but
>> ultimately the RAID is composed of disks.
>> - Trying for something faster than select * is
>a real challenge. To
>> perform a full table scan, Oracle must read each
>data block. The
>> alternative
>> is index scanning, but this means reading an
>index block, fetching a data
>> block, etc. Not faster if you're going to
>eventually read all data blocks
>> anyway.
>> - If select * isn't fast enough, you should
>consider using table
>> partitioning. That way each process can
>separately scan a separate
>> partition
>> and separately write to your output files.
>> Hopefully someone else will think of a bright
>idea I've missed.
>> Dennis Williams
>> DBA
>> Lifetouch, Inc.
>> dwilliams_at_lifetouch.com
>>
>>
>> -----Original Message-----
>> Sent: Thursday, May 30, 2002 6:48 AM
>> To: Multiple recipients of list ORACLE-L
>>
>>
>> Oracle 8.0.5
>>
>> Tru64 4.0f
>>
>> One of our developers here is writing a utility
>to provide fast unloads of
>> tables (to replace fastunloader as it happens)
>>
>> His problem is as follows. Start from the bottom
>and work your way up. I
>> would be really grateful if anyone can offer up
>some alternatives for us.
>>
>> Regards
>>
>> Lee
>>
>>
>> > -----Original Message-----
>> > From: Dudley Dave - ddudle
>> > Sent: 29 May 2002 16:04
>> > To: Robertson Lee - lerobe
>> > Subject: RE: Do you still have that
>SQL Expert?
>> >
>> >
>> > No, you miss the point. I'm explicitly NOT
>using PQ (or at least not
>> > explicitly using it).
>> >
>> > Using a parallel hint on huge table unloads
>- with the
>> > single-threaded version of the code (i.e.
>pipdynsql.v2.0.0) didn't seem
>> to
>> > make much difference at all. I didn't do the
>tests directly though,
>> poeple
>> > on the account did. So it may be that the
>tables already had a degree of
>> > parallelism built in, in which case I'd guess
>the hint would be
>> redundant.
>> >
>> > What I mean is that even if you use PQ for
>the server to extract the
>> > data in parallel you still have the bottle neck
>of a single client to
>> send
>> > it all back to. That's what I was trying to get
>around. Assuming that
>> > we're not generally using the full network
>bandwidth, I'd assume that
>> > multiple clients ought to be able to dump out
>separate sections of a
>> table
>> > at the same time, at roughly the same speed at
>a single client would
>> > unload a single table - i.e double the
>throughput.
>> >
>> > But I can't find anything on the web to
>tell me the best / most
>> > efficient way to actually do this. (By the way,
>I've tried the
>> NO_PARALLEL
>> > hint too, to stop the server setting off too
>many conflicting slaves on
>> > its side. Again no better as far as I could
>tell.)
>> >
>> > N.B. Not sure if you'd suggest it, but
>before you do: most of the
>> > tables we'd really want to use this for are
>massive, and so are already
>> > partitioned. So where I say "table" I mean
>either that or a partition
>> > thereof. Besides, need a generic solution that
>doesn't rely on having to
>> > partition your table to unload it quickly.
>> >
>> > By the way, I'm specifically testing speed
>of my original code (e.g.
>> > pipdynsql.v2.0.0 user/pass "select * from
>table") against the new
>> > multithreaded development code - i.e.
>regardless of the machine load at
>> > the time, I want to see if multiple
>simultaneous unloads can be quicker
>> > than a single unload client (at the expense of
>using more machine /
>> Oracle
>> > resource obviously).
>> >
>> > Is this making sense?
>> >
>> > Dave
>> >
>> > -----Original Message-----
>> > From: Robertson Lee - lerobe
>> > Sent: 29 May 2002 15:14
>> > To: Dudley Dave - ddudle
>> > Subject: RE: Do you still have that
>SQL Expert?
>> >
>> >
>> > How are you using PQ, is it just a hint ??.
>Which tables are you
>> > testing against.
>> >
>> >
>> > -----Original Message-----
>> > From: Dudley Dave - ddudle
>> > Sent: 29 May 2002 14:27
>> > To: Robertson Lee - lerobe
>> > Subject: RE: Do you still
>have that SQL Expert?
>> >
>> > OK Clever-Trousers,
>> >
>> > As you're so hot on table/index
>disk striping...
>> >
>> > I've written the program
>"pipdynsql", which as you may have
>> > heard (this lunchtime if not before), people
>want to use to replace
>> > FastUnloader.
>> >
>> > I've been playing about with a new
>multithreaded version to
>> > try to download a table in sections to multiple
>client threads which
>> then
>> > write back out to a single file (either
>ordered, or for max speed in
>> > random/undefined order).
>> >
>> > Can you think of any quick ways to
>do this, or "tricks" to
>> > try?
>> >
>> > I've tried ranges of rowids (as I'm
>told that's hold
>> > parrallel query works) but the ROWID (tab) hint
>does not seem to go
>> > through the table in rowid order. And it's a
>massive overhead to order
>> by
>> > rowid to work out non-overlapping ranges. And
>even if you do, you have
>> to
>> > say WHERE rowid >= xxx AND rowid <= xxx (as I
>say, can't force it to go
>> in
>> > ROWID "sort" order) so this tends to be slower
>than nect opt...
>> >
>> > tried assuming there's a unique
>index and giving start
>> > points to each slave thread, which then selects
>a set number of rows.
>> This
>> > is prety quick, but even this seems slower than
>a simple SELECT * FROM
>> > table (for the same number of records).
>> >
>> > tried loading temp "rowid" tables
>with sets of rowids and
>> > each slave does a full table scan of its rowid
>set table, with where
>> > clause connection to the data table.
>> >
>> >
>> > Can't find anything better than my
>original method which
>> > selects a unique key from an index with a
>"master" thread, for every
>> x'th
>> > rownum. Then hands these out to the slave
>threads to select * from table
>> > where unique key >= given key for specified
>number of records. e.g.
>> master
>> > pulls out every 100,000th key with a modulus
>and each slave dumps out
>> > 100,000 rows at a time, starting at the key its
>given.
>> >
>> > The above uses a temp view for the
>rownum bit to work. Also
>> > tried without a master slave where the unique
>key is numeric, as you can
>> > do the modulud directly on this. Tends to be
>slower - I guess due to MOD
>> > func overhead? Tried adding CACHE(table) to
>this to try and make sure
>> all
>> > threads will get the majority of selected data
>from cache, rather than
>> > each going to disk. But no better.
>> >
>> > Tried many, many other combinations
>of hints, but can't seem
>> > to get more than "almost" double speed of a
>SELECT * from TABLE single
>> > process approach - using about 5 threads. Even
>when there's loads of
>> free
>> > oracle/CPU resource available.
>> >
>> > Any ideas?!?!
>> >
>> > Dave
>> >
>>

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.com
-- 
Author: Stephane Faroul
  INET: sfaroult_at_oriolecorp.com

Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051
San Diego, California        -- Public Internet access / Mailing Lists
--------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).

Received on Fri May 31 2002 - 06:08:21 CDT