Re: exadata write performance problems

From: Ls Cheng <exriscer_at_gmail.com>
Date: Sun, 17 Feb 2019 16:48:24 +0100
Message-ID: <CAJ2-Qb-w6Bou03ujLq9zTTH1UHY3KHqyt4WV=MJeaFqoMxH29A_at_mail.gmail.com>



Hi

Since you have referred to Tanel Poder's blog, I think our problem is similar to what he describes in
https://blog.tanelpoder.com/posts/log-file-switch-checkpoint-incomplete-and-lgwr-waiting-for-checkpoint-progress/ because when we suffer write problems we also observe many "checkpoint incomplete" events.
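In case it helps anyone reproduce the check, a quick way to see whether the
online redo logs are part of that picture (only a sketch; the one-day window
and hourly bucketing are arbitrary) is to look at the log sizes and the
switch rate:

  select thread#, group#, bytes/1024/1024 as mb, members, status
  from   v$log
  order  by thread#, group#;

  -- log switches per hour over the last day; a high rate combined with
  -- "checkpoint incomplete" messages in the alert log usually points at
  -- online redo logs that are too small or too few
  select trunc(first_time, 'HH24') as hour, count(*) as switches
  from   v$log_history
  where  first_time > sysdate - 1
  group  by trunc(first_time, 'HH24')
  order  by 1;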

On Fri, Feb 15, 2019 at 1:29 PM Kuba Szepietowski < kuba.szepietowski_at_gmail.com> wrote:

> Hi,
>
> "I also obseve something strange, a 128 KB table with 50 rows is being
> Smart Scanned in the cells when the buffer cache is 25GB. This also
> increase checkpoint activity as well."
>
> Is there any undocumented parameter set at the instance level, such as
> _serial_direct_read? More on that here:
> https://blog.tanelpoder.com/2013/05/29/forcing-smart-scans-on-exadata-is-_serial_direct_read-parameter-safe-to-use-in-production/
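(A sketch of that check, for completeness: the hidden parameter can be read
directly from the X$ tables, which needs a SYSDBA connection.)

  -- current setting of _serial_direct_read on this instance (run as SYS)
  select x.ksppinm  as name,
         y.ksppstvl as value,
         y.ksppstdf as is_default
  from   x$ksppi x, x$ksppcv y
  where  x.indx = y.indx
  and    x.ksppinm = '_serial_direct_read';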
>
> best regards
> Jakub
>
>
> On Thu, Feb 14, 2019 at 8:53 PM Andy Wattenhofer <watt0012_at_umn.edu> wrote:
>
>> Maybe it's worth looking further into that control file parallel write
>> wait event. Run an AWR report for a time range when the problem occurs,
>> and look at "IOStat by File Type." You can see control file reads and
>> writes there. If the writes are significantly high, you could cut that
>> number in half by simply dropping to a single control file. I know that
>> sounds like a blasphemous thing to do, but it's actually Oracle's best
>> practice recommendation for your configuration to run just one control file
>> on the +DATA disk group (reference MOS doc id 2062068.1
>> <https://support.oracle.com/epmos/faces/DocContentDisplay?id=2062068.1>).
>> With heavy DML periods you've probably got a lot of redo I/O on the +FRA
>> disk group already. Maybe as an experiment you can temporarily drop the
>> +FRA control file and see if it alleviates the wait problem.
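(For reference, the same control file I/O figures can be pulled from SQL as
well; a sketch, bearing in mind that V$IOSTAT_FILE is cumulative since
instance startup, so sample it twice and take the difference.)

  select filetype_name,
         small_read_reqs, small_write_reqs,
         large_read_reqs, large_write_reqs
  from   v$iostat_file
  where  filetype_name = 'Control File';

  -- if you do drop to a single control file, the change itself is just the
  -- control_files parameter plus a restart; the path below is illustrative
  -- only, take the real one from v$controlfile
  -- alter system set
  --   control_files = '+DATA/MYDB/CONTROLFILE/current.256.987654321'
  --   scope = spfile sid = '*';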
>>
>> Andy
>>
>> On Tue, Feb 12, 2019 at 5:16 PM Ls Cheng <exriscer_at_gmail.com> wrote:
>>
>>> Hi
>>>
>>> Running Exadata 18.1.5.0.0, Grid 12.2.0.1, RDBMS 11.2.0.4 and 12.2.0.1.
>>> IORM is at its default with no custom configuration. Control files are in
>>> the DATA and FRA disk groups, and ASM redundancy is high.
>>>
>>>
>>> Thanks
>>>
>>>
>>> On Tue, Feb 12, 2019 at 11:21 PM Andy Wattenhofer <watt0012_at_umn.edu>
>>> wrote:
>>>
>>>> Which Exadata software version are you running? Which grid and database
>>>> versions? Are you using IORM? What is your control_files parameter set to
>>>> (i.e., where are your control files)? And what are your ASM redundancy
>>>> levels for each of the disk groups?
>>>>
>>>> On Tue, Feb 12, 2019 at 3:16 PM Ls Cheng <exriscer_at_gmail.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I have a customer with a 1/8 rack Exadata X6-2 with high capacity disks
>>>>> who has heavy performance problems whenever a massive DML operation
>>>>> kicks in. Since this is a 1/8 configuration the write IOPS capacity is
>>>>> not high, roughly 1200 IOPS, and I am seeing as high as 4000 physical
>>>>> writes per second at peak times. When this happens user sessions start
>>>>> suffering because they are blocked on "enq: KO - fast object checkpoint",
>>>>> which in turn is blocked by "control file parallel write" in CKPT. So
>>>>> the idea is to relieve CKPT. This is from the ASH history (a query
>>>>> sketch for pulling this out of DBA_HIST follows the listing):
>>>>>
>>>>> INST  SAMPLE_TIME                EVENT                        TIME_WAITED  STATE     P1   P2  P3
>>>>> ----  -------------------------  ---------------------------  -----------  -------  ---  ---  --
>>>>>    2  12-FEB-19 12.11.24.540 AM  control file parallel write      1110465  WAITING    2   41   2
>>>>>    2  12-FEB-19 12.16.34.754 AM  Disk file Mirror Read            1279827  WAITING    0    1   1
>>>>>    1  12-FEB-19 12.16.44.012 AM  control file parallel write      1820977  WAITING    2   39   2
>>>>>    2  12-FEB-19 12.20.34.927 AM  control file parallel write      1031042  WAITING    2  856   2
>>>>>    1  12-FEB-19 12.21.14.256 AM  control file parallel write      1905266  WAITING    2    3   2
>>>>>    2  12-FEB-19 12.21.14.977 AM  control file parallel write      1175924  WAITING    2   42   2
>>>>>    1  12-FEB-19 12.21.54.301 AM  control file parallel write      2164743  WAITING    2  855   2
>>>>>    2  12-FEB-19 12.22.35.036 AM  control file parallel write      1581684  WAITING    2    4   2
>>>>>    1  12-FEB-19 12.23.44.381 AM  control file parallel write      1117994  WAITING    2    3   2
>>>>>    1  12-FEB-19 12.23.54.404 AM  control file parallel write      4718841  WAITING    2    3   2
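(For reference, a sketch of the kind of query that produces a listing like
the one above; the time window and event list are just the ones relevant to
this incident.)

  select ash.instance_number, ash.sample_time, ash.session_id, ash.event,
         ash.time_waited, ash.blocking_inst_id, ash.blocking_session,
         ash.p1, ash.p2, ash.p3
  from   dba_hist_active_sess_history ash
  where  ash.sample_time between timestamp '2019-02-12 00:00:00'
                             and timestamp '2019-02-12 01:00:00'
  and    ash.event in ('enq: KO - fast object checkpoint',
                       'control file parallel write')
  order  by ash.sample_time;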
>>>>>
>>>>> When this happens we observe these cell metrics:
>>>>>
>>>>> CELL METRICS SUMMARY
>>>>>
>>>>> Cell Total Flash Cache: IOPS=13712.233 Space
>>>>> allocated=6083152MB
>>>>> == Flash Device ==
>>>>> Cell Total Utilization: Small=27.8% Large=14.2%
>>>>> Cell Total Throughput: MBPS=471.205
>>>>> Cell Total Small I/Os: IOPS=9960
>>>>> Cell Total Large I/Os: IOPS=6005
>>>>>
>>>>> == Hard Disk ==
>>>>> Cell Total Utilization: Small=69.5% Large=18.7%
>>>>> Cell Total Throughput: MBPS=161.05
>>>>> Cell Total Small I/Os: IOPS=5413.618
>>>>> Cell Total Large I/Os: IOPS=166.2
>>>>> Cell Avg small read latency: 245.67 ms
>>>>> Cell Avg small write latency: 62.64 ms
>>>>> Cell Avg large read latency: 308.99 ms
>>>>> Cell Avg large write latency: 24.65 ms
>>>>>
>>>>>
>>>>> We cannot enable write-back flash cache right now because that may cause
>>>>> other problems, and although we are in the process of upgrading the 1/8
>>>>> cells to 1/4 cells, that is going to take some months. I know it is not
>>>>> a best practice, but I was thinking that in the meantime we could carve
>>>>> out some flash space, create grid disks on it, and store the control
>>>>> files in flash. Does anyone have experience with such a setup?
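For what it is worth, a rough sketch of what that could look like on the ASM
side, assuming the flash grid disks have already been carved out with CellCLI
on each cell (all names, sizes and paths below are made up for illustration):

  -- small high-redundancy disk group on the flash grid disks; with three
  -- cells in a 1/8 rack each cell forms its own failure group automatically
  create diskgroup CTRLFLASH high redundancy
    disk 'o/*/CTRLFLASH_FD_*'
    attribute 'compatible.asm'   = '12.2.0.1',
              'compatible.rdbms' = '11.2.0.4',
              'au_size'          = '4M';

  -- then add a control file copy in the new disk group, e.g. by updating
  -- control_files in the spfile and restoring the copy with RMAN
  -- ("restore controlfile to '+CTRLFLASH' from '<existing copy>';") while
  -- the instance is in NOMOUNT
  -- alter system set
  --   control_files = '+DATA/MYDB/CONTROLFILE/current.256.987654321',
  --                   '+CTRLFLASH/MYDB/CONTROLFILE/current.260.987654322'
  --   scope = spfile sid = '*';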
>>>>>
>>>>> TIA
>>>>>
>>>>>
>>>>>
>
> --
> Mit freundlichen Grüße/Best Regards/Pozdrawiam
> Jakub Szepietowski
>
> **************************************
> Jakub Szepietowski
> Berlinerstr. 4, 65824 Schwalbach am Taunus
> Tel: +49 (0)152 31070846
> <http://www.xing.com/profile/Jakub_Szepietowski>
> <https://de.linkedin.com/pub/jakub-szepietowski/68/a47/11a>
>
>
>

--
http://www.freelists.org/webpage/oracle-l