Re: exadata write performance problems

From: Andy Wattenhofer <watt0012_at_umn.edu>
Date: Thu, 14 Feb 2019 13:52:16 -0600
Message-ID: <CAFU3ey77yOKhvz=H0OHTe13WAsJCVG3rv0d8hMsVEgV8dQUOoA_at_mail.gmail.com>



Maybe it's worth looking further into that control file parallel write wait event. Run an AWR report for a time range when the problem occurs and look at "IOStat by File Type"; you can see control file reads and writes there. If the writes are significantly high, you could cut that number in half by simply dropping to a single control file. I know that sounds like a blasphemous thing to do, but it is actually Oracle's best-practice recommendation for your configuration to run just one control file on the +DATA disk group (see MOS Doc ID 2062068.1 <https://support.oracle.com/epmos/faces/DocContentDisplay?id=2062068.1>). With heavy DML periods you have probably got a lot of redo I/O on the +FRA disk group already. As an experiment, you could temporarily drop the +FRA control file and see if it alleviates the wait problem.
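As a rough sketch, the check and the change would look something like this (the control file path below is made up; take the real one from SHOW PARAMETER control_files, and note the counters in V$IOSTAT_FILE are cumulative since instance startup):

```sql
-- How much I/O is going to the control files since startup?
SELECT filetype_name, small_read_reqs, small_write_reqs
FROM   v$iostat_file
WHERE  filetype_name = 'Control File';

-- Drop to a single control file on +DATA: keep only one path in the
-- parameter, then restart the instances. The path here is hypothetical.
ALTER SYSTEM SET control_files = '+DATA/MYDB/CONTROLFILE/current.256.998877665'
  SCOPE = SPFILE SID = '*';
```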

Andy

On Tue, Feb 12, 2019 at 5:16 PM Ls Cheng <exriscer_at_gmail.com> wrote:

> Hi
>
> Running Exadata 18.1.5.0.0, Grid 12.2.0.1, RDBMS 11.2.0.4 and 12.2.0.1.
> IORM is default, no custom configuration. Control files are on the DATA
> and FRA disk groups, and ASM is High Redundancy.
>
>
> Thanks
>
>
> On Tue, Feb 12, 2019 at 11:21 PM Andy Wattenhofer <watt0012_at_umn.edu>
> wrote:
>
>> Which Exadata software version are you running? Which grid and database
>> versions? Are you using IORM? What is your control_files parameter set to
>> (i.e., where are your control files)? And what are your ASM redundancy
>> levels for each of the disk groups?
>>
>> On Tue, Feb 12, 2019 at 3:16 PM Ls Cheng <exriscer_at_gmail.com> wrote:
>>
>>> Hi
>>>
>>> IHAC with a 1/8 Exadata X6-2 with High Capacity disks who is having heavy
>>> performance problems whenever a massive DML operation kicks in. Since
>>> this is a 1/8 configuration, the write IOPS capacity is not high, roughly
>>> 1200 IOPS, but I am seeing as many as 4000 physical writes per second at
>>> peak times. When this happens, user sessions start suffering because they
>>> are blocked by "enq: KO - fast object checkpoint", which in turn is
>>> blocked by "control file parallel write" from CKPT. So the idea is to
>>> alleviate CKPT. This is from historical ASH:
>>>
>>> INSTANCE_NUMBER SAMPLE_TIME                EVENT                        TIME_WAITED SESSION         P1         P2         P3
>>> --------------- -------------------------- ---------------------------- ----------- ------- ---------- ---------- ----------
>>>               2 12-FEB-19 12.11.24.540 AM  control file parallel write      1110465 WAITING          2         41          2
>>>               2 12-FEB-19 12.16.34.754 AM  Disk file Mirror Read            1279827 WAITING          0          1          1
>>>               1 12-FEB-19 12.16.44.012 AM  control file parallel write      1820977 WAITING          2         39          2
>>>               2 12-FEB-19 12.20.34.927 AM  control file parallel write      1031042 WAITING          2        856          2
>>>               1 12-FEB-19 12.21.14.256 AM  control file parallel write      1905266 WAITING          2          3          2
>>>               2 12-FEB-19 12.21.14.977 AM  control file parallel write      1175924 WAITING          2         42          2
>>>               1 12-FEB-19 12.21.54.301 AM  control file parallel write      2164743 WAITING          2        855          2
>>>               2 12-FEB-19 12.22.35.036 AM  control file parallel write      1581684 WAITING          2          4          2
>>>               1 12-FEB-19 12.23.44.381 AM  control file parallel write      1117994 WAITING          2          3          2
>>>               1 12-FEB-19 12.23.54.404 AM  control file parallel write      4718841 WAITING          2          3          2
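>>>
>>> (A query roughly like the following, against DBA_HIST_ACTIVE_SESS_HISTORY,
>>> pulls that output; the time window and the 1-second filter are assumptions.)
>>>
>>> ```sql
>>> SELECT instance_number, sample_time, event, time_waited,
>>>        session_state, p1, p2, p3
>>> FROM   dba_hist_active_sess_history
>>> WHERE  sample_time BETWEEN TIMESTAMP '2019-02-12 00:00:00'
>>>                        AND TIMESTAMP '2019-02-12 01:00:00'
>>> AND    time_waited > 1000000   -- TIME_WAITED is in microseconds
>>> ORDER  BY sample_time;
>>> ```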
>>>
>>> When this happens we observe these cell metrics:
>>>
>>> CELL METRICS SUMMARY
>>>
>>> Cell Total Flash Cache: IOPS=13712.233 Space allocated=6083152MB
>>> == Flash Device ==
>>> Cell Total Utilization: Small=27.8% Large=14.2%
>>> Cell Total Throughput: MBPS=471.205
>>> Cell Total Small I/Os: IOPS=9960
>>> Cell Total Large I/Os: IOPS=6005
>>>
>>> == Hard Disk ==
>>> Cell Total Utilization: Small=69.5% Large=18.7%
>>> Cell Total Throughput: MBPS=161.05
>>> Cell Total Small I/Os: IOPS=5413.618
>>> Cell Total Large I/Os: IOPS=166.2
>>> Cell Avg small read latency: 245.67 ms
>>> Cell Avg small write latency: 62.64 ms
>>> Cell Avg large read latency: 308.99 ms
>>> Cell Avg large write latency: 24.65 ms
>>>
>>>
>>> We cannot enable write-back flash cache right now because that may cause
>>> other problems, and although we are in the process of upgrading the 1/8
>>> cells to 1/4 cells, that is going to take some months. I know it is not a
>>> best practice, but in the meantime I was thinking of carving out some
>>> flash space, creating grid disks on it, and storing the control files in
>>> flash. Does anyone have experience with such a setup?
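>>>
>>> (Roughly what I have in mind, sketched in CellCLI and SQL; the names and
>>> sizes are made up, and the flash cache would have to be recreated smaller
>>> first to free the space on each cell.)
>>>
>>> ```
>>> -- On each storage cell: shrink the flash cache, then carve flash grid disks
>>> CellCLI> DROP FLASHCACHE
>>> CellCLI> CREATE FLASHCACHE ALL SIZE=5800G
>>> CellCLI> CREATE GRIDDISK ALL FLASHDISK PREFIX=CTLFLASH, SIZE=1G
>>>
>>> -- On the database side: build a disk group on those grid disks,
>>> -- then point control_files there and restore the control file into it
>>> SQL> CREATE DISKGROUP CTLFLASH NORMAL REDUNDANCY
>>>        DISK 'o/*/CTLFLASH*'
>>>        ATTRIBUTE 'compatible.asm' = '12.2.0.1';
>>> ```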
>>>
>>> TIA
>>>
>>>
>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Feb 14 2019 - 20:52:16 CET