Re: Question on Exadata X8 - IO

From: Rajesh Aialavajjala <r.aialavajjala_at_gmail.com>
Date: Mon, 15 Feb 2021 08:14:23 -0500
Message-ID: <CAGvtKv6g7f0n2Vyy5wd0HnBBVXH1HWBk1hrWFNkZ0Z3ckC+ZOQ_at_mail.gmail.com>



Lok,
To answer your question about the split between flash and hard disk in the Exadata X8-2/X8M-2:

HC Storage Cells are packaged with:

12x 14 TB 7,200 RPM disks           = 168 TB raw   (hard disk)
4x 6.4 TB NVMe PCIe 3.0 flash cards = 25.6 TB raw  (flash)

So the ~168 TB that Andy references is purely spinning drives/hard disks - you have another ~25.6 TB of flash on top of that.
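
Once the machine is live, you can sanity-check the per-cell raw capacity
from ASM. A minimal sketch, assuming the Exadata default of one ASM
failure group per storage cell (the totals reflect the grid disk
carve-out presented to ASM, not the physical drives):

    SELECT failgroup,
           COUNT(*)                          AS disks,
           ROUND(SUM(total_mb)/1024/1024, 1) AS raw_tb
      FROM v$asm_disk
     WHERE group_number > 0
     GROUP BY failgroup
     ORDER BY failgroup;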

If your organization is acquiring the X8M-2 hardware, you will add 1.5 TB of PMEM (Persistent Memory) per cell on top of this.

You can certainly add more storage cells to your environment - elastic configurations are common in the Exadata world, both as initial deployments and as later expansions. You might want to evaluate your new X8 configuration first and expand storage afterward if needed.

Your X5-2 hardware has/had (drive sizes changed midway through the X5-2 generation - they started with 12x 4 TB drives, which were later doubled to 8 TB):

12x 8 TB 7,200 RPM High Capacity disks = 96 TB raw  (hard disk)
4x 1.6 TB PCI flash cards              = 6.4 TB raw (Exadata Smart Flash Cache)

So you are looking at quite an increase in raw storage capacity and Flash.

+1 to Shane's point about not using NORMAL redundancy in a production configuration. FLEX disk groups are permitted on Exadata but, to the best of my understanding, not widely used. Oracle best practices recommend HIGH redundancy - in fact, I do not think you can configure FLEX redundancy within OEDA, as it falls outside best practices - so you would have to tear down and recreate the disk groups manually, or customize the initial install before moving your database(s).
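
If you do go the manual route, a minimal sketch of creating a FLEX disk
group (the disk string and attribute values here are hypothetical -
adjust to your grid disk naming and software version):

    -- FLEX redundancy requires compatible.asm/compatible.rdbms >= 12.2
    CREATE DISKGROUP data FLEX REDUNDANCY
      DISK 'o/*/DATA_CD_*'
      ATTRIBUTE 'compatible.asm'          = '19.0.0.0.0',
                'compatible.rdbms'        = '19.0.0.0.0',
                'cell.smart_scan_capable' = 'TRUE',
                'au_size'                 = '4M';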

Thanks,

--Rajesh

On Mon, Feb 15, 2021 at 8:09 AM Shane Borden <dmarc-noreply_at_freelists.org> wrote:

> I think it's a mistake to go with NORMAL redundancy in a production
> system. I could probably understand the argument for a test system, but
> not production. How do you think a regular storage array is configured?
> Likely not with a normal redundancy scheme. Aside from all of the other
> things mentioned already, you are also bound to offline patching; if you
> try to patch rolling, you run the risk of being able to tolerate only one
> disk failure or one storage server failure. HIGH redundancy protects not
> only against technical failures but also against the human factor during
> patching.
>
> If you must consider normal redundancy, I would go with FLEX disk groups
> rather than configuring the entire rack as normal redundancy. That way,
> if you must, you can specify the redundancy at the file group level
> (effectively per database) rather than at the disk group level. Should
> you change your mind later, it's a simple ALTER command to change the
> redundancy, rather than tearing down the entire rack and rebuilding it.
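>
> As a minimal sketch (assuming a FLEX disk group in which each database
> gets its own file group, which is the default behavior; the disk group
> and file group names here are hypothetical):
>
>     -- Redundancy is a file group property, settable per file type,
>     -- and can be changed online with a single statement:
>     ALTER DISKGROUP data MODIFY FILEGROUP mydb
>       SET 'datafile.redundancy' = 'high';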
>
>
> ---
>
> Thanks,
>
>
> Shane Borden
> sborden76_at_yahoo.com
>
> On Feb 15, 2021, at 1:55 AM, Lok P <loknath.73_at_gmail.com> wrote:
>
> Thanks much Andy.
>
> Yes, we have an existing machine that is an X5-2 half rack. (It's
> basically a full rack logically split into two half racks, and we have
> only this database hosted on this half rack.) We are currently at ~150TB
> and keep growing, so we're planning for NORMAL redundancy.
>
> On the current X5 I am seeing ~80TB hard disk + ~6TB flash per storage
> cell. When you said *"The Exadata X8 and X8M storage cells have 14TB
> disks. With 12 per cell, that's 168TB *per cell*,"* does that mean the
> sum of flash + hard disk is ~168TB per cell? What is the
> percentage/share of flash disk and hard disk in that?
>
> Apart from the current storage saturation, regarding the IOPS issue on
> our current X5 system: in OEM I am seeing flash IOPS reaching ~2000K for
> large reads, while the max limit shows somewhere near ~1.3 million.
> Overall IO utilization for the flash disks is around ~75%. The hard disk
> IO limit shows as ~20K, and most of the time both small reads and large
> reads stay below that limit; overall IO utilization stays below ~30% for
> the hard disks.
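>
> As a rough cross-check of OEM's numbers from the database side (a
> sketch; these are system-wide request rates, not per-cell or per-disk
> limits):
>
>     SELECT metric_name, ROUND(MAX(value)) AS peak_per_sec
>       FROM gv$sysmetric_history
>      WHERE metric_name IN ('Physical Read Total IO Requests Per Sec',
>                            'Physical Write Total IO Requests Per Sec')
>      GROUP BY metric_name;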
>
> I just got to know from the infra team that the X8 we are planning to
> move to is not Extreme Flash but High Capacity disk only (similar to
> what we have on the current X5). But given the larger flash and hard
> disk storage in each of the 7 storage cells, we expect the new X8 to
> resolve the current capacity crunch both with respect to space and IOPS.
>
> And as you mentioned in your explanation, adding more storage cells will
> also help bump up capacity. So should we consider adding a few more
> storage cells on top of the half rack, to make 8 or 9 storage cells in
> total? And is this standard practice in the Exadata world?
>
> Regards
> Lok
>
> On Sat, Feb 13, 2021 at 9:00 PM Andy Wattenhofer <watt0012_at_umn.edu> wrote:
>
>> The Exadata X8 and X8M storage cells have 14TB disks. With 12 per cell,
>> that's 168TB *per cell*. You haven't mentioned which rack size your X5
>> machine is, but from the numbers you're showing it looks like maybe a half
>> rack. A half rack of X8M will come with 1PB of total disk, giving you over
>> 300TB of usable space to divide between your RECO and DATA disk groups if
>> you are using HIGH redundancy. That seems plenty for your 150TB database.
>> But if you need more, add another storage cell.
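>>
>> Once the rack is installed, a quick way to see what HIGH redundancy
>> actually leaves you is to ask ASM directly (USABLE_FILE_MB already nets
>> out mirroring and the rebalance reserve):
>>
>>     SELECT name, type,
>>            ROUND(total_mb/1024/1024, 1)       AS raw_tb,
>>            ROUND(usable_file_mb/1024/1024, 1) AS usable_tb
>>       FROM v$asm_diskgroup;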
>>
>> As for performance degradation from using HIGH redundancy, you need to
>> consider that the additional work of that extra write is being taken on by
>> the storage cells. By definition the redundant block copies must go to
>> separate cells. NORMAL redundancy writes to two cells and HIGH goes to
>> three. In aggregate, each write will be as fast as your slowest cell. So
>> any difference in write performance is more a function of the total number
>> of cells you have to share the workload. That difference would be
>> diminished as you increase the number of cells in the cluster.
>>
>> And of course that difference would be mitigated by the write back cache
>> too because writes to the flash cache are faster than writes to disk.
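>>
>> You can confirm the cells are actually in write-back mode from the
>> database side - a sketch; the XML path inside CONFVAL is an assumption
>> and may vary by storage software version:
>>
>>     SELECT cellname,
>>            EXTRACTVALUE(XMLTYPE(confval),
>>                         '/cli-output/cell/flashCacheMode') AS fc_mode
>>       FROM v$cell_config
>>      WHERE conftype = 'CELL';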
>>
>> Honestly, I can't imagine that Oracle would sell you an Exadata machine
>> where any of this would be a problem for you. It would be so undersized
>> from the beginning that your problems with it would be much greater than
>> any marginal difference in write performance from using high redundancy.
>>
>> Andy
>>
>>
>> On Fri, Feb 12, 2021 at 10:31 AM Lok P <loknath.73_at_gmail.com> wrote:
>>
>>> Thanks Much.
>>>
>>> I had found some docs but missed sharing them. The URL below also
>>> points to HIGH redundancy as a requirement, but maybe it's not
>>> compulsory, as you stated.
>>>
>>> Given the size of our database (~150TB), we were thinking of saving
>>> some space by using double mirroring rather than triple mirroring. But
>>> I was not aware that the disk size itself is a lot bigger on the X8;
>>> as you stated, with bigger disks the re-mirroring will take a lot of
>>> time in case of a crash/failure, and thus HIGH redundancy is
>>> recommended. I think we have to take another look at this. Note: on
>>> the current X5 machine we have ~6TB flash and ~80TB hard disk per
>>> storage server. Not sure what that is on the X8, though.
>>>
>>> Another doubt I had: is it also true that IOPS will degrade by some
>>> percentage with triple mirroring compared to double mirroring, because
>>> one additional copy of each data block has to be written to
>>> flash/disk?
>>>
>>>
>>> https://stefanpanek.wordpress.com/2017/10/20/exadata-flash-cache-enabled-for-write-back/
>>>
>>> On Fri, Feb 12, 2021 at 8:52 PM Ghassan Salem <salem.ghassan_at_gmail.com>
>>> wrote:
>>>
>>>> Please, can you point to where you saw that write-back is only
>>>> possible with HIGH redundancy?
>>>> HIGH redundancy is very much recommended on the X8 due to the size of
>>>> the disks and the time it takes to re-mirror after a disk loss: with
>>>> NORMAL redundancy, if you lose a disk, then while re-mirroring is in
>>>> progress you have no second copy of that data, so if you lose yet
>>>> another disk you're in big trouble. With lower-capacity disks the
>>>> re-mirroring takes much less time, so the risk is lower.
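>>>>
>>>> While that re-mirroring (rebalance) is running, its progress and the
>>>> estimated minutes remaining are visible in ASM:
>>>>
>>>>     SELECT group_number, operation, state, power, est_minutes
>>>>       FROM gv$asm_operation;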
>>>>
>>>> regards
>>>>
>>>> On Fri, Feb 12, 2021 at 3:57 PM Lok P <loknath.73_at_gmail.com> wrote:
>>>>
>>>>> Basically, I am seeing many docs stating that triple mirroring is
>>>>> recommended with write-back flash cache, and some others stating that
>>>>> write-back flash cache is not possible without HIGH redundancy/triple
>>>>> mirroring. There is a real difference between these two statements:
>>>>> we would like to go with NORMAL redundancy to save some space and
>>>>> gain some IO benefit (by not writing one additional copy of each data
>>>>> block), but we also want to use the write-back flash cache option to
>>>>> improve write IOPS. If a restriction to HIGH redundancy is in place,
>>>>> we won't be able to do that. Please correct me if my understanding is
>>>>> wrong.
>>>>>
>>>>> On Fri, Feb 12, 2021 at 12:53 PM Lok P <loknath.73_at_gmail.com> wrote:
>>>>>
>>>>>> The doc below states that it's recommended to use a HIGH redundancy
>>>>>> ASM disk group (i.e. triple mirroring) when using write-back flash
>>>>>> cache, because data is first written to the flash cache and flushed
>>>>>> to disk at a later stage, and in case of a failure it has to be
>>>>>> recovered from a mirror copy. But I am wondering: is this not
>>>>>> possible with double mirroring - will it not survive data loss in
>>>>>> case of a failure? I want to understand the suggested setup that
>>>>>> gives optimal space usage without compromising IOPS or risking data
>>>>>> loss.
>>>>>>
>>>>>>
>>>>>> https://docs.oracle.com/en/engineered-systems/exadata-database-machine/sagug/exadata-storage-server-software-introduction.html#GUID-E10F7A58-2B07-472D-BF31-28D6D0201D53
>>>>>>
>>>>>> Regards
>>>>>> Lok
>>>>>>
>>>>>> On Fri, Feb 12, 2021 at 10:42 AM Lok P <loknath.73_at_gmail.com> wrote:
>>>>>>
>>>>>>> Hello listers, we are moving from Exadata X5 to X8, for multiple
>>>>>>> reasons. A few of them: 1) we are about to saturate the existing
>>>>>>> storage capacity on the current X5 (DB size reaching ~150TB);
>>>>>>> 2) IOPS on the current X5 is also reaching its max while the system
>>>>>>> works during peak load.
>>>>>>>
>>>>>>> We currently have HIGH redundancy (triple mirroring) for the DATA
>>>>>>> and RECO disk groups on our existing X5 machines, while DBFS is
>>>>>>> kept at NORMAL redundancy (double mirroring). A few folks have
>>>>>>> raised questions about the impact on IOPS and storage space
>>>>>>> consumption if we use double mirroring (NORMAL redundancy) vs.
>>>>>>> triple mirroring (HIGH redundancy) on the new X8 machine. I can see
>>>>>>> the benefit of double mirroring in storage space saved (around
>>>>>>> 1/3rd in terms of DATA and RECO copies), but then what is the risk
>>>>>>> with respect to data loss - is it acceptable in a production
>>>>>>> system? (Note: we use ZDLRA for database backups, and for disaster
>>>>>>> recovery we have an Active Data Guard physical standby in place,
>>>>>>> running in read-only mode.)
>>>>>>>
>>>>>>> With regard to IOPS, we are going with the default of write-back
>>>>>>> flash cache enabled. Is it correct that with double mirroring each
>>>>>>> write goes to two places vs. three places with triple mirroring, so
>>>>>>> there will also be some IOPS degradation with triple mirroring/HIGH
>>>>>>> redundancy compared to double mirroring? If so, by what percentage?
>>>>>>> And is it okay to go for double mirroring, given that it would
>>>>>>> benefit IOPS and save a good amount of storage space?
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Lok
>>>>>>>
>>>>>>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Feb 15 2021 - 14:14:23 CET
