Re: Checkpoints - a bottleneck. Asking for tuning advise

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: 2000/03/12
Message-ID: <952896445.27966.0.nnrp-03.9e984b29@news.demon.co.uk>

Why are you forcing a checkpoint every 20 seconds, it seems a bit extreme.

Why only two rollback segments when you have 7 CPUs - are there only a couple of session ever connected to the database doing all this work (given your waitstats though, this doesn't seem to have caused problems).

The average wait time on events doesn't really tell us a lot without number of waits.

Your init.ora parameters suggest that there are other features of the system that you have not told us about - what side-effects might they have.

A couple of suggestions though:

Increase the checkpoint timout, and use the DB_BLOCK_MAX_DIRTY_TARGET to keep
a stream of dirty blocks running out to the database rather than hitting everything in one go.

Have a look at your v$filestats and figure out which files are getting most writes. If it is the rollback files, then think about making the rollback segments as small
as possible.

--

Jonathan Lewis
Yet another Oracle-related web site:  http://www.jlcomp.demon.co.uk

Brian Rasmusson wrote in message <38CC014C.1ACD90E_at_bellesystems.com>...


>Dear all,

>

>we're at the moment doing some benchmarking of a real-time application

>making heavy amounts of inserts and updates. We are talking hundreds of

>each per second, and we would do quite good if it wasn't for our

>problems

>with checkpoints taking too long.

>

>

>Brief application overview

>--------------------------

>A session is considered to contain a "state query" and a "session

>complete" event. There is no way to combine the two, as they are

>actually

>two transactions in the system. The system is a real-time system,

>meaning

>that requests must be handled instantly.

>

>The "state query" event looks up data from a table consisting of 4

>columns,

>to simplify lets call it the STATE table in the rest of the mail.

>

>The "session complete" event updates data in the STATE table, and

>inserts

>data in the COMPLETE table and in the LOG table. The COMPLETE table

>contains approx. 15 columns, and the LOG table approx. 8 columns.

>

>On a Sun Enterprise 4500 with 6 400Mhz CPUs, Solaris 2.7, Oracle 8.0.5,

>26 18Gb disks and 3Gb RAM we have reached about 220 sessions per second

>(that is, 220*"state query" + 220*"session complete" events).

>

>STATE table contains 5.000.000 rows.

>

>COMPLETE and LOG tables contain 1 row per session. They cannot be

>combined

>to 1 table.

>

>

>So what's the problem?

>----------------------

>"220 sessions sounds quite good!" I hear you say. Yes, it's pretty good,

>but

>not good enough. Our application can handle much more, I/O wait on data

>disks

>is low. The bottleneck seems to be checkpoints. I/O goes up heavily, and

>ofcourse this stalls the rest of the system for a few seconds. Not much,

>but

>way too much when you are running a real-time system.

>

>We are forced to stick with Oracle 8.0.5 for now.

>

>

>What have we done so far?

>-------------------------

>Quite a lot. First we ofcourse optimised our application to use the

>database

>as little as possible, meaning implementing our own caches etc. What we

>have

>done more includes:

>

>1) The STATE table is kept in buffer pool (which btw. is 800Mb)

>

>2) Checkpoints are set to take place every 20 seconds

>

>3) STATE table is partitioned across 5 disks (1.000.000 rows per

>partition)

>

>4) STATE table has no primary key, but a UNIQUE LOCAL index partitioned

>   across 5 disks.

>

>5) COMPLETE table is partitioned across 5 disks

>

>6) COMPLETE table has no primary key, but a UNIQUE LOCAL index

>partitioned

>   across 5 disks

>

>7) LOG table stores data in partitions that each contains 1 months data.

>Must

>   be kept like this. Partitioning of this table only benefits the

>database

>   maintenance, not very much the performance.

>

>8) Each partition has its own tablespace

>

>9) The two rollback segments each have their own tablespace and own

>disk.

>

>10) The redo logs are placed on two disks each with the forcedirectio

>setting

>    enabled in Solaris.

>

>11) One db_writer is enabled per CPU

>

>

>Dumps from the system

>---------------------

>

>initsid.ora settings:

>

>rollback_segments=(r1,r2)

>db_files        = 100

>open_cursors    = 256

>db_file_multiblock_read_count = 8

>db_block_buffers = 200000 ## shared mem 2 GIG

>buffer_pool_keep = 100000 ## 800M

>db_writer_processes  = 7

>db_block_lru_latches = 7

>shared_pool_size = 400M

>log_checkpoint_interval = 200000 ## every 50M do checkpoint

>log_checkpoint_timeout = 20 ## every 20sec

>processes = 300

>dml_locks = 500

>log_buffer = 262144

>log_small_entry_max_size=32 ##

>log_simultaneous_copies=7 ##

>sequence_cache_entries = 120

>sequence_cache_hash_buckets = 120

>max_dump_file_size = 204800

>timed_statistics = TRUE

>optimizer_mode = FIRST_ROWS

>global_names = false

># Sort

>sort_area_size = 1048576 # 1M

>sort_area_retained_size = 131072 # 128k

>sort_direct_writes = true

>sort_write_buffer_size = 32768

>sort_write_buffers = 1

># SNP background processes (job executers)

>job_queue_processes     = 2

>job_queue_interval      = 30

>

>

>v$system_event:

>

>EVENT

>AVERAGE_WAIT

>------------------------------

>------------

>Null event

>401

>SQL*Net break/reset to client

>.038461538

>SQL*Net message from client

>.444446547

>SQL*Net message to client

>.000453161

>SQL*Net more data from client

>.003768747

>SQL*Net more data to client

>0

>buffer busy waits

>.285714286

>control file parallel write

>.775605806

>control file sequential read

>.077059827

>db file parallel write

>12.0203044

>db file scattered read

>.332503556

>db file sequential read

>1.95008266

>direct path read

>.437058446

>direct path write

>.005569007

>enqueue

>3

>file identify

>.012211669

>file open

>.007784855

>instance state change

>0

>latch free

>1

>library cache pin

>5

>log buffer space

>13

>log file parallel write

>.273877557

>log file sequential read

>.682857143

>log file single write

>.178932179

>log file switch completion

>6.97971014

>log file sync

>.188444236

>pmon timer

>301.000819

>process startup

>11.7857143

>rdbms ipc message

>98.3071415

>rdbms ipc reply

>28.5929025

>refresh controlfile command

>.044117647

>smon timer

>28708.3673

>sort segment request

>101

>write complete waits

>12.1792846

>

>

>v$waitstat:

>

>CLASS                   COUNT

>TIME

>------------------ ----------

>----------

>data block                  4

>1

>sort block                  0

>0

>save undo block             0

>0

>segment header              0

>0

>save undo header            0

>0

>free list                   0

>0

>extent map                  0

>0

>bitmap block                0

>0

>bitmap index block          0

>0

>unused                      0

>0

>system undo header          0

>0

>system undo block           0

>0

>undo header                 3

>1

>undo block                  0

>0

>

>

>We also tried the hint about letting another process keep the logfiles

>open, but it seemed to have no effect at all?

>

>Any input on how to improve the checkpoint performance will be greatly

>appreciated. More info can be provided if required.

>

>Thanks in advance,

>Brian

Received on Sun Mar 12 2000 - 00:00:00 CST