Re: High Number of undo writes

From: Purav Chovatia <puravc_at_gmail.com>
Date: Thu, 17 May 2012 16:05:38 +0530
Message-ID: <CADrzpjH5FGhDc97Z52tKwu-==Fn=E+i5PDKjRzmjbv5_T0W_Fg_at_mail.gmail.com>



So we installed 10.2.0.4 on the same hardware as setup B (because we could not achieve the same memory params as setup A, mainly log_buffer; please let me know if there is a way to get exactly the same params on 2 setups). Let's refer to this instance as B1.
Then we set the memory params (the entire pfile, including the double underscore params) exactly the same as A. But when the instance came up, none of the buffer cache, shared pool or log buffer was exactly the same as A:
buffer cache: A=3536M, B=3488M, B1=3488M
shared pool: A=496M, B=560M, B1=560M
log buffer: A=6336k, B=14328k, B1=6536k
So, as we see, the log buffer was now very close to A.
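For reference, this is roughly how I am comparing the post-startup component sizes (and the granule size, which I suspect explains why identical targets round to different final sizes):

```sql
-- Actual current sizes (in bytes) of the auto-tuned SGA components,
-- plus the granule size ASMM rounds each component to.
SELECT component, current_size, user_specified_size, granule_size
FROM   v$sga_dynamic_components
ORDER BY component;
```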

With the above, when we tested, the data writes reduced drastically, coming very close to A: from 10-15 times the writes in A down to only about 30% more than A.

Since it was still 30% higher, we turned off ASMM, set the buffer cache and shared pool to exactly the same values as A, and tweaked the log buffer a few times until it was eventually 6332k (a difference of only 4k). With this setting we ran another test and the writes were almost identical to A. In fact, surprisingly, the writes dropped to 90% of those in A (disk %busy was higher, but that could be because B has only 10 disks whereas A has 12).
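For what it's worth, I am cross-checking where the writes actually land with a per-tablespace breakdown from v$filestat (snapshotted before and after each run, then diffed, since the counters are cumulative from startup):

```sql
-- Write requests and blocks written per tablespace since instance startup.
SELECT ts.name            AS tablespace,
       SUM(fs.phywrts)    AS write_requests,
       SUM(fs.phyblkwrt)  AS blocks_written
FROM   v$filestat   fs
JOIN   v$datafile   df ON df.file# = fs.file#
JOIN   v$tablespace ts ON ts.ts#   = df.ts#
GROUP BY ts.name
ORDER BY write_requests DESC;
```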

So can you please help me understand a few things:

1. Why should the memory params change so much for the same SGA values with ASMM on 2 very similar setups? Is that a difference between 10.2.0.4 and 10.2.0.5?

2. What could have caused the DBWR to be so aggressive? The impact, on storage, was huge because the writes were 10-15 fold causing the storage subsystem to be comparatively much more saturated.

- Was it the small difference of 48M in the buffer cache?
- Was it the big difference of 7992k in the log buffer?
- Was it 10.2.0.4 vs 10.2.0.5?
- Was it something else?

3. Redo writes are still half of those in A, and the total redo written (for the same workload) is still higher by approx. 35%. How do I troubleshoot this?
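In case it helps, these are the redo statistics I am capturing from v$sysstat on both setups (as deltas over equal-length runs):

```sql
-- Redo-related system statistics; compare the deltas between two
-- snapshots taken at the start and end of the same workload on A and B.
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('redo size',
                'redo wastage',
                'redo writes',
                'redo blocks written',
                'redo synch writes');
```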

Thanks

On Wed, May 16, 2012 at 6:31 PM, Purav Chovatia <puravc_at_gmail.com> wrote:

> Interestingly, MOS note *604351.1* explains the log buffer calculation
> and states: Applies to 10.2.0.1 to 10.2.0.4. There are, however, other
> notes explaining the same concept that do NOT restrict it to 10.2.0.4
> and instead state: Applies to 10.2.0.1 and later.
>
> Regards
>
> On Wed, May 16, 2012 at 6:28 PM, Purav Chovatia <puravc_at_gmail.com> wrote:
>
>> Thanks Subodh.
>>
>> But I am unable to do so - mainly I am not able to reduce log_buffer to
>> make it the same as A. I tried various values for the SGA components to
>> see if Oracle, while calculating log_buffer internally, would arrive at
>> a value the same as or close to the one in A (which is ~6MB), but
>> nothing worked. So I am in the process of installing 10.2.0.4. This
>> will also set the compatible parameter to 10.2.0.3, making it the same
>> as A (currently it is 10.2.0.3 in A and 10.2.0.5 in B).
>>
>> Regards
>>
>>
>> On Wed, May 16, 2012 at 2:27 PM, Subodh Deshpande <
>> deshpande.subodh_at_gmail.com> wrote:
>>
>>> Make all the SGA components exactly the same size, or in proportion,
>>> and let us know the findings.
>>>
>>> On 16 May 2012 13:44, Purav Chovatia <puravc_at_gmail.com> wrote:
>>>
>>>> Hi,
>>>> More details about the setup and observations:
>>>> SetupA / SetupB:
>>>> DB server: Intel 2.80GHz / 3.33 GHz
>>>> Oracle: 10.2.0.4 / 10.2.0.5
>>>> Storage: HP P2000 G3 / Sun 2540 M2
>>>> Volume data: 12x146 RAID10 / 10x300 RAID 10
>>>> Volume redo: 2x146 RAID1 / 2x300 RAID1
>>>>
>>>> SGA and all other memory params are the same or very close, except
>>>> the log buffer (because it is a derived value), which is 6M in A and
>>>> 14M in B.
>>>>
>>>> The observation is that the data volume experiences 10-15 times more
>>>> IOPS in B as compared to A. The total data written in A and B is the
>>>> same, and hence the avg IO size in B is 1/10th of that in A. So I
>>>> looked at the AWR report to check whether Oracle indeed did more IOPS
>>>> or whether it was something else, and that is when I found that the
>>>> undo writes are far more in B. This behaviour has been consistent in
>>>> the 5 runs of 1 hour each done till now. So I don't understand why
>>>> Oracle should do more writes for the same amount of changes.
>>>> In this context, I observed the following but can't make sense of it
>>>> (A / B):
>>>> physical write IO requests: 464,606 / 3,315,359
>>>> physical write total IO requests: 14,271,435 / 10,307,912
>>>> physical write total multi block requests: 13,844,347 / 7,311,634
>>>> physical writes: 5,050,624 / 4,993,546
>>>> So, as we see, the major difference is in physical write IO requests,
>>>> but I can't understand why. Going by the next 2 stats, the IOPS in B
>>>> should have been less than in A.
>>>>
>>>> Another observation is that the redo volume experiences half the IOPS
>>>> in B as compared to A, while the redo per sec in B is 35% more than in
>>>> A, which is very surprising. For a moment I thought it could be
>>>> because of redo wastage, but that is not so. And rightly so, because
>>>> the IOPS (i.e. the redo writes on disk) have actually halved, and redo
>>>> wastage would increase only if there were more writes. [The "redo
>>>> writes" in this sentence is not the "redo writes" statistic of the AWR
>>>> report; I am referring to the writes on the redo volume.]
>>>>
>>>> I was expecting similar performance on both setups because the
>>>> characteristics of the storage that matter are largely the same: both
>>>> arrays have dual controllers, and both volumes have write cache
>>>> enabled. The RAID configuration is exactly the same, and the disks are
>>>> of the same speed and type (the make is different).
>>>>
>>>> Thanks.
>>>>
>>>> On Tue, May 15, 2012 at 9:10 PM, Purav Chovatia <puravc_at_gmail.com>
>>>> wrote:
>>>>
>>>> > Hi,
>>>> >
>>>> > We are running a workload on 2 different setups to compare
>>>> > performance, and we have some very unusual observations for exactly
>>>> > the same workload. There are a few differences between the 2 setups,
>>>> > but none of them can justify the difference.
>>>> >
>>>> > I will provide more details later (sorry, but I have got to leave),
>>>> > but one major difference is that setup A runs 10.2.0.4 whereas setup
>>>> > B runs 10.2.0.5. And the observation is that in B the number of
>>>> > writes to the undo tablespace is comparatively very high (avg
>>>> > writes/s is 57 in A and 854 in B). Has anybody observed this before?
>>>> > There are other unusual differences, but those seem to follow from
>>>> > these high undo writes.
>>>> >
>>>> > Thanks.
>>>> >
>>>>
>>>>
>>>> --
>>>> http://www.freelists.org/webpage/oracle-l
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> =============================================
>>> Love me or Hate me both are in my Favour.
>>> Love me, I am in your Heart. Hate me, I am in your Mind.
>>> =============================================
>>>
>>>
>>
>

Received on Thu May 17 2012 - 05:35:38 CDT
