Re: Performance of Multi Stream Analyze Commands over large Schema with partitioned objects

Home -> Community -> Usenet -> c.d.o.server -> Re: Performance of Multi Stream Analyze Commands over large Schema with partitioned objects

From: Connor McDonald <connor_mcdonald_at_yahoo.com>
Date: Fri, 07 Jun 2002 20:54:43 +0100
Message-ID: <3D010F83.3905@yahoo.com>

John O'Brien wrote:
>
> Thanks everybody for the replies:
>
> So I have a script that I execute like:
>
> analyze_db -n 24 ( howver many streams I want + other parameters too)
>
> Which will divide all the objects into 24 streams to be analyzed in
> parallel.
>
> Somebody else mentioned I should try dbms_stats and but we've tried
> and that does not cut it. I do not want to run parallel servers and
> even if I did the parallelism is fixed at whatever the init.ora
> parameter is set to. I also don't think it is as scalable as my
> solution which right now untuned is as good as anything dbms_stats can
> do, better in tests actually.
>
> I'm going to crank up sort area size and see what affect that has.
>
> I will got to faking stats if it is not possible to tune this Oracle
> administrative task but I really want to try an tune this analyze. I
> will have to build a configuration table around the utility to hold
> the fake info.
>
> My waitio is at 50% and I have a high end disk system with GBs of
> cache and fibre. Anyone got the algm for analyze. How much has to be
> in memory , how much is read.
>
> Maybe I got to go get my filelayout an analyze objects in alternating
> tablespaces so the io load is not going to the same tablespaces and
> the same disk. dbms_stats has to have this issue and you can't do
> this. That will probably help and it is probably a good thing for
> installations where I won't have the layout. BTW my db is tuned in
> terms of extents per segment etc.
>
> About using statistics from older partitions to populate , my data can
> vary and I could get burnt.
>
> > - Split out the segments by object_id and schedule multiple dbms_job
> > requests to schedule them. Then it just a case of playing with
> > job_queue_processes until you get the right mix of throughput versus
> > server load
>
> Doing that, and the server load is very high, thats what I want to
> reduce. It really is hard on the i/o. What is the algorithm analyze
> uses ?
>
> >
> > - omit column level stats (or variant thereof - eg only indexed
> > columns), typically these do the most work due to the massive sorting
> > involved
>
> You recomend dbms_stats over analyze even if I use it single stream
> from my perl script?
>
> Others must have this beef with Oracle.
>
> Best regards,
> John
>
> Connor McDonald <connor_mcdonald_at_yahoo.com> wrote in message news:<3CFFD37A.2B51_at_yahoo.com>...
> > John O'Brien wrote:
> > >
> > > Question:
> > > To sum up, given a high processor count SMP or NUMA box how can an
> > > Oracle database be analyzed fast ?
> > >
> > > Explanation and results:
> > >
> > > I have to analyze a large database weekly and sometimes daily. There
> > > are partitioned objects.
> > >
> > > Running analyze in a single script takes way too long, no matter what
> > > the sample rate. On 8.1.7 dbms_stats can do it in parallel but with
> > > parallel servers, which I do not want to switch on as they can have an
> > > adverse effect in my scenario. Anyway even that aside when a colleague
> > > tested it he did not see big gains. The number of parallel query
> > > servers can only be changed by bouncing the DB which I cannot do
> > > either - not a solution outside of R&D.
> > >
> > > I want a solution that works for 4 to 32 cpus at least and I have the
> > > hardware to test it.
> > >
> > > I wrote a perl script that extracts all the objects by partition out
> > > of the database and forks off a number of streams as specified in a
> > > parameter passed by the user. This means I can get all the paralelism
> > > I need with out a database reboot, unlike parallel query servers.
> > >
> > > The results were not impressive. Massive waitio and the buffer cache
> > > is maxed at over 2GB ( AIX ). This is 32 bit Oracle.
> > >
> > > I changed the db_file_multiblock_read_count to 2048 so it would assume
> > > the max value and saw the waitio drop off somewhat.
> > >
> > > Now aix also uses asynchronous io natively on JFS and I have 10 db
> > > writers running.
> > >
> > > Seems to me like the buffer cache and i/o is the throttle here but I
> > > would like opinions,advice and discussion.
> > >
> > > For instance should I make the script smarter so I have some logic
> > > that determines if an object needs to be analyzed because it has
> > > changed significantly. This could reduce i/o.
> > >
> > > Regards,
> > > John
> >
> > Some ideas
> >
> > - Split out the segments by object_id and schedule multiple dbms_job
> > requests to schedule them. Then it just a case of playing with
> > job_queue_processes until you get the right mix of throughput versus
> > server load
> >
> > - omit column level stats (or variant thereof - eg only indexed
> > columns), typically these do the most work due to the massive sorting
> > involved
> >
> > - Crank up sort size, see above
> >
> > - use the object size to determine an estimate percentage for each
> > object , so you compute for small stuff, estimate x% for large objects,
> > where (say) 1<x<30 depends on segment size
> >
> > - lie - if the stats are predictable for some objects eg 'we add 10000
> > rows per day', then use dbms_stats to directly modify the stats
> >
> > hth
> > connor

I prefer dbms_stats since the statistics are reportedly better - but one caveat is on some versions of Oracle an estimate on a partition results in a full table scan (all partitions) which obviously is hardly ideal.

If you whack a sql_trace on dbms_stats you can see what is being done, its fairly likely that analyze does something very similar. This is how I deduced on our server that column level sorting took the most load.

hth
connor

-- 
==============================
Connor McDonald

http://www.oracledba.co.uk

"Some days you're the pigeon, some days you're the statue..."

Received on Fri Jun 07 2002 - 14:54:43 CDT