Re: switching to 'direct path read' should help prevent thrashing buffer cache ?

From: <p.santos000_at_gmail.com>
Date: 21 Dec 2006 10:32:49 -0800
Message-ID: <1166725969.230627.255110@a3g2000cwd.googlegroups.com>

Jonathan Lewis wrote:
> <p.santos000_at_gmail.com> wrote in message
> news:1166653299.575962.3830_at_48g2000cwx.googlegroups.com...
> > Folks,
> > Our environment is neither an OLTP nor a DSS system. We have
> > transactions that cross boundaries.
> >
> > In particular, we have these one-off jobs that many of our customers
> > run at any time during the day, in which they choose to clear a table
> > column. Most clients clear anywhere from 1 million to 11 million rows
> > per job.
> >
> > The pseudo-code and SQL look like this:
> >
> > LOOP
> >   UPDATE DEMO_<CLIENT_ID>
> >      SET col1 = NULL,
> >          col2 = NULL,
> >          col3 = NULL
> >    WHERE (col1 IS NOT NULL OR
> >           col2 IS NOT NULL OR
> >           col3 IS NOT NULL)
> >      AND ROWNUM <= 500000;
> >   EXIT WHEN SQL%ROWCOUNT < 500000;
> >   COMMIT;
> > END LOOP;
> > COMMIT;
> >
> > We use a ROWNUM limit to avoid blocking rows for other processes that
> > might be running single-row DML against the table.
> >
> > We have increased the efficiency of these processes by making I/O
> > faster, and now customers are just running them more often, which
> > obviously thrashes my buffer cache.
> >
> > Nearly all updates spend most of their time waiting on 'db file
> > scattered read'.
> > We have db_file_multiblock_read_count = 128.
> >
> > I should also mention that this process connects via a shared server
> > (MTS) connection, although we can change that if needed.
> >
> > I'm thinking about having just this process run in parallel in order
> > to bypass the buffer cache, because I don't believe this process
> > benefits from caching, and it causes blocks to age out faster for
> > other clients that are doing other things and do benefit from caching.
> >
> > My thought is that if I switch this to a dedicated connection and add
> > a PARALLEL hint (even if it's just 2 parallel servers per job), the
> > job will complete faster and my cache will no longer be thrashed, at
> > the cost of only a little more PGA memory and a bit more I/O.
> >
> > I'm looking for the cons of doing something like this.
> >
>
> How big is the whole table that's being processed?
> How tightly co-located is the data set to be updated?
>
> If you can run into locking issues doing this in a single
> update, what does it mean for the business logic that
> one process can (apparently) update rows that another
> process is apparently trying to put out of bounds?
>
> Parallel probably won't help - large tablescans (and
> I assume this table is larger than 2% of the size of the
> db_cache - but you could try setting it to NOCACHE)
> use only the tail-end of the buffer, so tend not to do much
> damage to the cache. Moreover, a parallel update will
> lock the table - and you are committing regularly to try
> and avoid locking issues.
>
> If the few million rows represent a small fraction of the
> data, with a good clustering effect, then creating a function
> based index to identify them, and updating through the index
> might help.
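A minimal sketch of that function-based-index idea, using made-up table
and column names: the CASE expression is NULL for rows with no flag set,
so those rows get no index entry and the index stays small.

  CREATE INDEX demo_flags_fbi ON demo_client
    (CASE WHEN col1 IS NOT NULL
            OR col2 IS NOT NULL
            OR col3 IS NOT NULL THEN 1 END);

  -- The batched update can then drive off the small index
  -- instead of tablescanning:
  UPDATE demo_client
     SET col1 = NULL, col2 = NULL, col3 = NULL
   WHERE CASE WHEN col1 IS NOT NULL
                OR col2 IS NOT NULL
                OR col3 IS NOT NULL THEN 1 END = 1
     AND ROWNUM <= 500000;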
>
> Otherwise, take out the commit - you are repeatedly re-reading
> the blocks that you have just changed, introducing extra costs
> of cleanout, and excessive scattered reads (possibly assisted by
> a file-system or disk cache, but still wasting resources.)
>
> --
> Regards
>
> Jonathan Lewis
> http://jonathanlewis.wordpress.com
>
> Author: Cost Based Oracle: Fundamentals
> http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html
>
> The Co-operative Oracle Users' FAQ
> http://www.jlcomp.demon.co.uk/faq/ind_faq.html

Jonathan,
 I'm not sure why a parallel update of the table wouldn't be faster. If I
 have to update 500K records on a table, I can do it much faster if I do
 it in parallel (degree 2). We don't have any parallel operations in our
 system at this time, but when a customer wants a "rush" job, this is how
 I do it for them.
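 Roughly, a sketch of what I mean (demo_client stands in for the
 per-client DEMO_<CLIENT_ID> table; parallel DML has to be enabled at the
 session level, or the hint only parallelizes the scan, not the update):

   -- Enable parallel DML for this session (it is off by default).
   ALTER SESSION ENABLE PARALLEL DML;

   UPDATE /*+ PARALLEL(t, 2) */ demo_client t
      SET col1 = NULL, col2 = NULL, col3 = NULL
    WHERE col1 IS NOT NULL
       OR col2 IS NOT NULL
       OR col3 IS NOT NULL;

   -- A parallel DML transaction must be committed (or rolled back)
   -- before the session can touch the table again (ORA-12838).
   COMMIT;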

 The particular customer I'm investigating has a 10GB table with about
 150 columns. 50 of those columns are VARCHAR2(1) columns that the
 customer "resets" or clears every week... don't ask. I've tried to get
 them to increment the flag by 1 each week, but they don't want to; they
 would rather reset the columns to NULL.

 I'm afraid that NOCACHE would impair other processes that do benefit
 from caching. Also, removing the commits between updates would keep
 rows locked for too long, and that causes real problems in our system.
 Currently this customer clears 9 million records in about 8 hours, so
 having rows locked that long is not possible.

 Does a parallel update cause more locking than the same update without
 parallelization? I don't think so.

 This is the big picture:

 Some customers will clear, i.e. set to NULL, a number of "flag" columns.
 These are all VARCHAR2(1). Then they will upload their customer lists
 and have our "load" process set one or more columns to "1" to flag the
 user. This load process means looking up a single record and updating
 it by ROWID (a sketch below). A load job can cover 5-10 million users.
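 The lookup-and-update is roughly this (key and flag column names are
 made up):

   DECLARE
     v_rid ROWID;
   BEGIN
     -- Find the target row once...
     SELECT rowid INTO v_rid
       FROM demo_client
      WHERE customer_key = :cust_key;   -- hypothetical lookup key

     -- ...then update it directly by its ROWID.
     UPDATE demo_client
        SET flag01 = '1'                -- one of the VARCHAR2(1) flags
      WHERE rowid = v_rid;
   END;
   /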

 Then they will run about 50-200 queries during the rest of the week,
 selecting users based on any WHERE clause combination, including the
 "flag" they just set.

 Finally, they clear/reset the columns to NULL and repeat the update/select/clear
 process on a weekly basis.

  It's a tough workload to manage, but I have no choice but to make this
  system work until 2008, when we redesign our application not to do
  these kinds of transactions all on the same system.

-peter

Received on Thu Dec 21 2006 - 12:32:49 CST
