
Re: Performance of Multi Stream Analyze Commands over large Schema with partitioned objects

From: John O'Brien <jobrien_at_ieee.org>
Date: 7 Jun 2002 08:56:01 -0700
Message-ID: <24bad735.0206070756.7a09ef8d@posting.google.com>


Thanks everybody for the replies:

So I have a script that I execute like:

analyze_db -n 24   (however many streams I want, plus other parameters)

Which will divide all the objects into 24 streams to be analyzed in parallel.
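
(In case it helps the discussion: the bucketing is nothing clever. Conceptually it is just a round-robin over the partitions, something like the query below, with each stream taking the rows whose stream_no matches its id. The actual driver is perl, and the schema name here is made up.)

  SELECT table_owner, table_name, partition_name,
         MOD(ROWNUM, 24) AS stream_no
  FROM   dba_tab_partitions
  WHERE  table_owner = 'MYSCHEMA';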

Somebody else mentioned I should try dbms_stats, but we've tried it and it does not cut it. I do not want to run parallel query servers, and even if I did, the parallelism is fixed at whatever the init.ora parameter is set to. I also don't think it is as scalable as my solution, which even untuned is as good as anything dbms_stats can do (better, in our tests).
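
(What we tested was the straightforward parallel gather, roughly the call below; the schema name and parameter values are illustrative only.)

  BEGIN
    DBMS_STATS.GATHER_SCHEMA_STATS(
      ownname          => 'MYSCHEMA',
      estimate_percent => 10,
      degree           => 8);
  END;
  /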

I'm going to crank up sort area size and see what effect that has.
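
Since each stream is its own session, I should be able to bump it per stream rather than touching the init.ora, something like (the value is just a guess):

  ALTER SESSION SET sort_area_size = 52428800;            -- 50MB
  ALTER SESSION SET sort_area_retained_size = 52428800;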

I will resort to faking stats if it turns out this Oracle administrative task cannot be tuned, but I really want to try and tune the analyze first. I would have to build a configuration table around the utility to hold the fake figures.
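
The faking itself looks straightforward; a rough sketch of what each row of that config table would drive (all names and figures invented):

  BEGIN
    DBMS_STATS.SET_TABLE_STATS(
      ownname  => 'MYSCHEMA',
      tabname  => 'BIG_FACT',
      partname => 'P_200206',
      numrows  => 5000000,
      numblks  => 80000,
      avgrlen  => 120);
  END;
  /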

My wait I/O is at 50%, and I have a high-end disk subsystem with GBs of cache and fibre. Has anyone got the algorithm for analyze? How much has to be in memory, and how much is read?

Maybe I need to look at my file layout and analyze objects in alternating tablespaces, so the I/O load is not hammering the same tablespaces and the same disks. dbms_stats must have this same issue, and with it you can't do anything about it. That will probably help, and it is probably worth doing even for installations where I won't know the layout. BTW, my database is well tuned in terms of extents per segment etc.
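
One way to build that interleaving would be to rank segments within each tablespace by size and deal them out in rank order, so consecutive jobs land on different tablespaces. A sketch (schema name invented):

  SELECT owner, segment_name, partition_name, tablespace_name,
         ROW_NUMBER() OVER (PARTITION BY tablespace_name
                            ORDER BY bytes DESC) AS ts_rank
  FROM   dba_segments
  WHERE  owner = 'MYSCHEMA'
  ORDER  BY ts_rank, tablespace_name;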

As for using statistics from older partitions to populate the new ones: my data can vary, and I could get burnt.

> - Split out the segments by object_id and schedule multiple dbms_job
> requests to schedule them. Then it is just a case of playing with
> job_queue_processes until you get the right mix of throughput versus
> server load

I am doing that, and the server load is very high; that's what I want to reduce. It really is hard on the I/O. What is the algorithm analyze uses?
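
In case anyone wants the dbms_job flavour of this, the per-object submission would look roughly like the block below (object names invented; with no interval given the job runs once and is removed):

  DECLARE
    l_job BINARY_INTEGER;
  BEGIN
    DBMS_JOB.SUBMIT(
      job  => l_job,
      what => 'execute immediate ''analyze table myschema.big_fact ' ||
              'partition (p_200206) estimate statistics sample 10 percent'';');
    COMMIT;
  END;
  /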

>
> - omit column level stats (or variant thereof - eg only indexed
> columns), typically these do the most work due to the massive sorting
> involved

Do you recommend dbms_stats over analyze even if I call it single-stream from my perl script?
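
If I do stay with analyze, skipping the column histograms per partition would presumably be something like this (syntax sketch only, names invented):

  ANALYZE TABLE myschema.big_fact PARTITION (p_200206)
    ESTIMATE STATISTICS SAMPLE 10 PERCENT
    FOR TABLE FOR ALL INDEXES;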

Others must have this beef with Oracle.

Best regards,
John

Connor McDonald <connor_mcdonald_at_yahoo.com> wrote in message news:<3CFFD37A.2B51_at_yahoo.com>...
> John O'Brien wrote:
> >
> > Question:
> > To sum up, given a high processor count SMP or NUMA box how can an
> > Oracle database be analyzed fast ?
> >
> > Explanation and results:
> >
> > I have to analyze a large database weekly and sometimes daily. There
> > are partitioned objects.
> >
> > Running analyze in a single script takes way too long, no matter what
> > the sample rate. On 8.1.7 dbms_stats can do it in parallel but with
> > parallel servers, which I do not want to switch on as they can have an
> > adverse effect in my scenario. Anyway even that aside when a colleague
> > tested it he did not see big gains. The number of parallel query
> > servers can only be changed by bouncing the DB which I cannot do
> > either - not a solution outside of R&D.
> >
> > I want a solution that works for 4 to 32 cpus at least and I have the
> > hardware to test it.
> >
> > I wrote a perl script that extracts all the objects by partition out
> > of the database and forks off a number of streams as specified in a
> > parameter passed by the user. This means I can get all the parallelism
> > I need without a database reboot, unlike parallel query servers.
> >
> > The results were not impressive. Massive waitio and the buffer cache
> > is maxed at over 2GB (AIX). This is 32-bit Oracle.
> >
> > I changed the db_file_multiblock_read_count to 2048 so it would assume
> > the max value and saw the waitio drop off somewhat.
> >
> > Now AIX also uses asynchronous I/O natively on JFS, and I have 10 db
> > writers running.
> >
> > Seems to me like the buffer cache and i/o is the throttle here but I
> > would like opinions, advice and discussion.
> >
> > For instance should I make the script smarter so I have some logic
> > that determines if an object needs to be analyzed because it has
> > changed significantly. This could reduce i/o.
> >
> > Regards,
> > John
>
> Some ideas
>
> - Split out the segments by object_id and schedule multiple dbms_job
> requests to schedule them. Then it is just a case of playing with
> job_queue_processes until you get the right mix of throughput versus
> server load
>
> - omit column level stats (or variant thereof - eg only indexed
> columns), typically these do the most work due to the massive sorting
> involved
>
> - Crank up sort size, see above
>
> - use the object size to determine an estimate percentage for each
> object, so you compute for small stuff, estimate x% for large objects,
> where (say) 1<x<30 depends on segment size
>
> - lie - if the stats are predictable for some objects eg 'we add 10000
> rows per day', then use dbms_stats to directly modify the stats
>
> hth
> connor
Received on Fri Jun 07 2002 - 10:56:01 CDT
