Re: Question on Exadata X8 - IO

From: Rajesh Aialavajjala <r.aialavajjala_at_gmail.com>
Date: Tue, 16 Feb 2021 15:22:38 -0500
Message-ID: <CAGvtKv4ezWCzu0+P2L4+GBKkuiNNpgUTebyV5x15AfVx1LW3Dw_at_mail.gmail.com>



Lok,
 Here are the numbers as relates to "usable" storage when it comes to X8-2 / X8M-2 Exadata

A 1/2 rack X8-2 (4 compute nodes + 7 storage cells) offers approximately:

Data Capacity (Usable) – Normal Redundancy = 477 TB
Data Capacity (Usable) – High Redundancy = 349 TB
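
If it helps, once the disk groups are created you can sanity-check these
figures from ASM itself. A minimal sketch, run against the ASM instance
(USABLE_FILE_MB already nets out the mirroring and the reserve ASM keeps so
it can re-mirror after a failure, so it is the number to compare with the
datasheet):

    SELECT name,
           type AS redundancy,
           ROUND(total_mb / POWER(1024, 2), 1)       AS total_tb,
           ROUND(usable_file_mb / POWER(1024, 2), 1) AS usable_tb
    FROM   v$asm_diskgroup;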

What Grid Infrastructure version are you planning to use? You are correct that FLEX disk groups/redundancy were introduced with GI/ASM version 12.2.0.1 and higher.
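
If you want to double-check the version, a minimal sketch (connect to the
ASM instance, e.g. sqlplus / as sysasm):

    SELECT instance_name, version FROM v$instance;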

Thanks,

--Rajesh

On Mon, Feb 15, 2021 at 10:45 AM Lok P <loknath.73_at_gmail.com> wrote:

> Thanks a lot, Rajesh and Shane. That is really valuable info about the X8
> configuration.
>
> If I sum up the data files (v$datafile) + temp files (v$tempfile) + log
> files (v$log), it comes to ~150TB, of which ~45TB shows as free in
> dba_free_space. The breakdown of the ~150TB is: ~144TB data files, ~1.2TB
> temp files, and ~0.5TB log files. So it seems this ~150TB figure is the
> size of a single copy, not the sum of the three copies maintained for
> HIGH redundancy, and with HIGH redundancy we must be occupying ~150*3 =
> ~450TB of space in ASM, i.e. across the 7 storage cells in total. Please
> correct me if I am wrong.
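>
> For reference, a minimal sketch of that single-copy arithmetic; the v$
> views report one copy each and know nothing about ASM mirroring:
>
>     SELECT ROUND(((SELECT SUM(bytes) FROM v$datafile)
>                 + (SELECT SUM(bytes) FROM v$tempfile)
>                 + (SELECT SUM(bytes) FROM v$log)) / POWER(1024, 4), 1)
>            AS single_copy_tb
>     FROM   dual;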
>
> And per the X8 configuration you mentioned, it looks like with the new X8
> half rack we will get ~25.6*7 = ~180TB of flash (against ~41TB in the
> current X5) and ~168*7 = ~1PB of hard disk (as opposed to ~614TB of hard
> disk in the current X5). For large reads we are touching a max flash IOPS
> of ~2 million during peak load (opposed to the ~1.3 million IOPS limit on
> the current X5), but the new X8 will have ~4.5 times more flash overall,
> so we hope both flash IOPS and storage will be a lot better and well
> under the limit/capacity on the new X8.
>
> We will try to push for making the ASM redundancy HIGH. But I was
> wondering what the maximum amount of data is that we can hold on this X8
> half rack with HIGH redundancy. The new X8 will have ~1PB of raw storage,
> almost twice the current ~614TB on the X5, so what will be the maximum
> usable storage available for our database if we go for HIGH redundancy vs
> NORMAL redundancy on the new X8 machine?
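>
> As a rough sketch of the arithmetic I have in mind: 7 cells * 168TB =
> ~1,176TB raw; divided by 2 that is ~588TB for NORMAL, and divided by 3
> ~392TB for HIGH, before subtracting the free space ASM reserves so it can
> re-mirror after a disk or cell failure, so the real usable figures will
> come out lower than these.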
>
> I was also checking on flex disk groups, and a few docs say the RDBMS
> must be at minimum 12.2. We are currently on 11.2.0.4, so perhaps that is
> not an option at the moment, at least.
>
> Regards
> Lok
>
> On Mon, Feb 15, 2021 at 6:44 PM Rajesh Aialavajjala <
> r.aialavajjala_at_gmail.com> wrote:
>
>> Lok,
>> To try and answer your question about the percentage/share of flash vs
>> hard disk in the Exadata X8-2/X8M-2:
>>
>> HC Storage Cells are packaged with
>>
>> 12x 14 TB 7,200 RPM disks = 168 TB raw (hard disk)
>> 4x 6.4 TB NVMe PCIe 3.0 flash cards = 25.6 TB raw (flash)
>>
>> So the ~168TB that Andy references is purely spinning drives/hard disks -
>> you have another ~25TB of flash on top of that.
>>
>> If your organization is acquiring the X8M-2 hardware - you will add on
>> 1.5 TB of PMEM (Persistent Memory) to this.
>>
>> You may certainly add more storage cells to your environment - elastic
>> configurations are common in Exadata, both as expansions and as initial
>> deployments. You might want to consider expanding storage after you
>> evaluate your new X8 configuration.
>>
>> Your X5-2 hardware has/had (the drive sizes changed mid-way through the
>> X5-2 generation - they started out with 12x 4 TB drives, which was later
>> doubled to 8 TB drives):
>>
>> 4x PCI flash cards, each with 1.6 TB (raw) Exadata Smart Flash Cache
>> 12x 8 TB 7,200 RPM High Capacity disks
>>
>> So you are looking at quite an increase in raw storage capacity and Flash.
>>
>> +1 to Shane's point about not using NORMAL redundancy in a production
>> configuration. FLEX disk groups are permitted on Exadata, but to the
>> best of my understanding they are not widely used; the Oracle best
>> practices recommend HIGH redundancy. In fact, I do not think you can
>> configure FLEX redundancy within OEDA, as it falls outside best
>> practices, so you would have to tear down and recreate the disk groups
>> manually, or customize the initial install prior to moving your
>> database(s).
>>
>> Thanks,
>>
>> --Rajesh
>>
>>
>>
>>
>> On Mon, Feb 15, 2021 at 8:09 AM Shane Borden <dmarc-noreply_at_freelists.org>
>> wrote:
>>
>>> I think it's a mistake to go with NORMAL redundancy in a production
>>> system. I could probably understand the argument for a test system, but
>>> not production. How do you think a regular storage array is configured?
>>> Likely not with anything resembling normal redundancy. Aside from all
>>> of the other things mentioned already, you are also bound to offline
>>> patching, or if you try to do rolling patching you run the risk of
>>> being able to tolerate only one disk failure or one storage server
>>> failure. HIGH redundancy protects not only against technical failures
>>> but also against the human factor during patching.
>>>
>>> If you must consider normal redundancy, I would go with FLEX disk
>>> groups vs configuring the entire rack as normal redundancy. That way,
>>> if you must, you can specify the redundancy at the file group
>>> (per-database) level rather than the disk group level. Should you
>>> change your mind later, it's a simple ALTER command to change the
>>> redundancy rather than tearing down the entire rack and rebuilding it.
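>>>
>>> As a minimal sketch of that change, assuming a flex disk group named
>>> DATAC1 with a file group per database (both names hypothetical; needs
>>> GI/ASM 12.2 or higher):
>>>
>>>     ALTER DISKGROUP datac1 MODIFY FILEGROUP mydb
>>>         SET 'redundancy' = 'high';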
>>>
>>>
>>> ---
>>>
>>> Thanks,
>>>
>>>
>>> Shane Borden
>>> sborden76_at_yahoo.com
>>>
>>> On Feb 15, 2021, at 1:55 AM, Lok P <loknath.73_at_gmail.com> wrote:
>>>
>>> Thanks much Andy.
>>>
>>> Yes, we have an existing machine that is an X5-2 half rack. (It's
>>> basically a full rack machine logically split into two half racks, and
>>> we have only this database hosted on this half rack. We are currently
>>> at ~150TB and keep growing, which is why we were considering NORMAL
>>> redundancy.)
>>>
>>> On the current X5 I am seeing ~80TB of hard disk + ~6TB of flash per
>>> storage cell. When you said *"The Exadata X8 and X8M storage cells have
>>> 14TB disks. With 12 per cell, that's 168TB *per cell*,"* does that mean
>>> the sum of flash + hard disk is ~168TB per cell? What is the
>>> percentage/share of flash disk vs hard disk in that?
>>>
>>> Apart from the current storage saturation, with regard to the IOPS
>>> issue on our current X5 system, I am seeing in OEM that flash IOPS
>>> reaches ~2000K for large reads, while the max limit shows somewhere
>>> near ~1.3 million. Overall IO utilization for the flash disks shows
>>> ~75%. The hard disk IO limit shows as ~20K, and most of the time both
>>> small reads and large reads stay below that limit; overall IO
>>> utilization stays below ~30% for the hard disks.
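>>>
>>> For what it's worth, the same request rates can be read straight off
>>> the cells. A minimal sketch, assuming the standard CD_IO_RQ_* cell disk
>>> metrics and a dcli group file named cell_group (the file name is
>>> hypothetical):
>>>
>>>     dcli -g cell_group cellcli -e \
>>>         "list metriccurrent where name like 'CD_IO_RQ_.*_SEC'"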
>>>
>>> We just got to know from the infra team that the X8 we are planning to
>>> move to is not Extreme Flash but High Capacity disk only (similar to
>>> what we have on the current X5). But considering the larger flash and
>>> hard disk storage in each of the 7 storage cells, we expect the new X8
>>> to resolve the current capacity crunch both wrt space and IOPS.
>>>
>>> And as you just mentioned, adding more storage cells will also help
>>> bump up capacity. So should we consider adding a few more storage cells
>>> on top of the half rack, to make it 8 or 9 storage cells in total, and
>>> is that standard practice in the Exadata world?
>>>
>>> Regards
>>> Lok
>>>
>>> On Sat, Feb 13, 2021 at 9:00 PM Andy Wattenhofer <watt0012_at_umn.edu>
>>> wrote:
>>>
>>>> The Exadata X8 and X8M storage cells have 14TB disks. With 12 per cell,
>>>> that's 168TB *per cell*. You haven't mentioned which rack size your X5
>>>> machine is, but from the numbers you're showing it looks like maybe a half
>>>> rack. A half rack of X8M will come with 1PB of total disk, giving you over
>>>> 300TB of usable space to divide between your RECO and DATA disk groups if
>>>> you are using HIGH redundancy. That seems plenty for your 150TB database.
>>>> But if you need more, add another storage cell.
>>>>
>>>> As for performance degradation from using HIGH redundancy, you need to
>>>> consider that the additional work of that extra write is being taken on by
>>>> the storage cells. By definition the redundant block copies must go to
>>>> separate cells. NORMAL redundancy writes to two cells and HIGH goes to
>>>> three. In aggregate, each write will be as fast as your slowest cell. So
>>>> any difference in write performance is more a function of the total number
>>>> of cells you have to share the workload. That difference would be
>>>> diminished as you increase the number of cells in the cluster.
>>>>
>>>> And of course that difference would be mitigated by the write back
>>>> cache too because writes to the flash cache are faster than writes to disk.
>>>>
>>>> Honestly, I can't imagine that Oracle would sell you an Exadata machine
>>>> where any of this would be a problem for you. It would be so undersized
>>>> from the beginning that your problems with it would be much greater than
>>>> any marginal difference in write performance from using high redundancy.
>>>>
>>>> Andy
>>>>
>>>>
>>>> On Fri, Feb 12, 2021 at 10:31 AM Lok P <loknath.73_at_gmail.com> wrote:
>>>>
>>>>> Thanks Much.
>>>>>
>>>>> I had found some docs but missed this. The URL below also points to
>>>>> high redundancy as a requirement, but maybe it is not compulsory, as
>>>>> you stated.
>>>>>
>>>>> Given the size of our database (~150TB), we were thinking of getting
>>>>> some space reduction by using double mirroring rather than triple
>>>>> mirroring. But I was not aware that the disks themselves are a lot
>>>>> bigger in the X8; as you stated, with bigger disks the re-mirroring
>>>>> will take a lot of time (in case of crash/failure), and thus HIGH
>>>>> redundancy is recommended. I think we have to relook at this. Note:
>>>>> what I see on the current X5 machine is ~6TB of flash and ~80TB of
>>>>> hard disk per storage server. Not sure what that is on the X8,
>>>>> though.
>>>>>
>>>>> Another doubt I had: is it also true that IOPS will degrade by some
>>>>> percentage with triple mirroring compared to double mirroring,
>>>>> because one additional copy of each data block has to be written to
>>>>> flash/disk?
>>>>>
>>>>>
>>>>> https://stefanpanek.wordpress.com/2017/10/20/exadata-flash-cache-enabled-for-write-back/
>>>>>
>>>>> On Fri, Feb 12, 2021 at 8:52 PM Ghassan Salem <salem.ghassan_at_gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Please, can you point to where you saw that write-back is only
>>>>>> possible with high redundancy?
>>>>>> High redundancy is very much recommended with the X8 due to the size
>>>>>> of the disks and the time it takes to re-mirror in case of disk
>>>>>> loss: if you're on normal redundancy and you lose a disk, then while
>>>>>> re-mirroring is being done you don't have a second copy of that
>>>>>> data, so if you lose yet another disk, you're in big trouble. With
>>>>>> lower capacity disks the re-mirroring takes much less time, and so
>>>>>> the risk is lower.
>>>>>>
>>>>>> regards
>>>>>>
>>>>>> On Fri, Feb 12, 2021 at 3:57 PM Lok P <loknath.73_at_gmail.com> wrote:
>>>>>>
>>>>>>> Basically, I am seeing many docs stating that triple mirroring is
>>>>>>> recommended with "write back flash cache", and some others stating
>>>>>>> that write back flash cache is not possible without HIGH
>>>>>>> redundancy/triple mirroring. There is a real difference between
>>>>>>> those two statements: we would like to go for NORMAL redundancy to
>>>>>>> save some space and gain some IO benefit (by not writing one more
>>>>>>> additional copy of each data block), but we also want to use the
>>>>>>> "write back flash cache" option to improve write IOPS. If a hard
>>>>>>> restriction to HIGH redundancy is in place, we won't be able to do
>>>>>>> that. Please correct me if my understanding is wrong.
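>>>>>>>
>>>>>>> As a minimal sketch for checking where the new machine stands, the
>>>>>>> current mode can be read from the cells (cell_group is a
>>>>>>> hypothetical dcli group file listing the storage cell hostnames):
>>>>>>>
>>>>>>>     dcli -g cell_group cellcli -e "list cell attributes name,flashCacheMode"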
>>>>>>>
>>>>>>> On Fri, Feb 12, 2021 at 12:53 PM Lok P <loknath.73_at_gmail.com> wrote:
>>>>>>>
>>>>>>>> The doc below states that a high redundancy ASM disk group (i.e.
>>>>>>>> triple mirroring) is recommended when using write back flash
>>>>>>>> cache, because data is first written to (and stays in) the flash
>>>>>>>> cache, is flushed to disk at a later stage, and in case of failure
>>>>>>>> has to be recovered from a mirror copy. But I am wondering: is
>>>>>>>> this not possible with double mirroring? Will it not survive data
>>>>>>>> loss in case of failure? I want to understand the suggested setup
>>>>>>>> that gives optimal space usage without compromising IOPS or
>>>>>>>> risking data loss.
>>>>>>>>
>>>>>>>>
>>>>>>>> https://docs.oracle.com/en/engineered-systems/exadata-database-machine/sagug/exadata-storage-server-software-introduction.html#GUID-E10F7A58-2B07-472D-BF31-28D6D0201D53
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Lok
>>>>>>>>
>>>>>>>> On Fri, Feb 12, 2021 at 10:42 AM Lok P <loknath.73_at_gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hello Listers, we are moving from Exadata X5 to X8, and there are
>>>>>>>>> multiple reasons behind it. A few of them: 1) we are about to
>>>>>>>>> saturate the existing storage capacity (DB size reaching ~150TB)
>>>>>>>>> on the current X5; 2) the current IOPS on the X5 is also reaching
>>>>>>>>> its max while the system works through its peak load.
>>>>>>>>>
>>>>>>>>> We currently have HIGH redundancy (triple mirroring) on our
>>>>>>>>> existing X5 machines for the DATA and RECO disk groups, and DBFS
>>>>>>>>> is kept at NORMAL redundancy (double mirroring). Now a few folks
>>>>>>>>> have raised questions about the impact on IOPS and storage space
>>>>>>>>> consumption if we use double mirroring (NORMAL redundancy) vs
>>>>>>>>> triple mirroring (HIGH redundancy) on the new X8 machine. I can
>>>>>>>>> see the benefit of double mirroring (NORMAL redundancy) in saved
>>>>>>>>> storage space (around 1/3rd in terms of DATA and RECO copies),
>>>>>>>>> but then what is the risk wrt data loss? Is it okay in a
>>>>>>>>> production system? (Note: we do use ZDLRA for taking DB backups,
>>>>>>>>> and for disaster recovery we have an Active Data Guard physical
>>>>>>>>> standby in place, which runs in read-only mode.)
>>>>>>>>>
>>>>>>>>> With regard to IOPS, we are going with the default write back
>>>>>>>>> flash cache enabled here. Is it correct that with double
>>>>>>>>> mirroring each block is written to two places, vs three places
>>>>>>>>> with triple mirroring, so there will also be some degradation in
>>>>>>>>> write IOPS with triple mirroring/HIGH redundancy compared to
>>>>>>>>> double mirroring? If so, by what percentage will the IOPS
>>>>>>>>> degrade? And would it be okay to go for double mirroring, given
>>>>>>>>> that it would benefit us wrt IOPS and also save a good amount of
>>>>>>>>> storage space?
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>> Lok
>>>>>>>>>
>>>>>>>>
>>>

--
http://www.freelists.org/webpage/oracle-l