Re: Question on Exadata X8 - IO

From: Gabriel Hanauer <gabriel.hanauer_at_gmail.com>
Date: Wed, 17 Feb 2021 09:08:34 -0300
Message-ID: <CAOOrn=Gcd66Ktgg24n75kST8q_t6FTu3FriG+nU_Y2hYtXJBYQ_at_mail.gmail.com>



Hello Lok,

You could look at the X8M-2 Datasheet.

https://www.oracle.com/a/ocom/docs/engineered-systems/exadata/exadata-x8m-2-ds.pdf

There is plenty of useful information in there.

Regards,

On Wed, Feb 17, 2021 at 7:17 AM Lok P <loknath.73_at_gmail.com> wrote:

> Thank you so much Rajesh. It really helps.
>
> My thought was that ~168TB of raw storage per storage cell means 168/2 =
> ~84TB usable per cell with NORMAL redundancy and 168/3 = ~56TB usable per
> cell with HIGH redundancy. But it seems it's not that simple, and a fair
> amount of additional space is set aside on top of the mirroring itself.
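
For what it's worth, the gap is mostly space ASM keeps back so it can
re-mirror everything after a disk or cell failure. A minimal sketch (assuming
you can query the ASM instance) of what ASM itself reports as usable:

    SELECT name,
           type,                                               -- NORMAL or HIGH
           ROUND(total_mb / POWER(1024, 2), 1)                 AS raw_tb,
           ROUND(required_mirror_free_mb / POWER(1024, 2), 1)  AS reserved_for_remirror_tb,
           ROUND(usable_file_mb / POWER(1024, 2), 1)           AS usable_tb
    FROM   v$asm_diskgroup;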
>
> On the current X5-2 High Capacity half rack, it looks like we are capped at
> around ~1 to 1.5 million IOPS for flash and ~18K IOPS for hard disk. What are
> the flash and hard disk IOPS limits for an X8 High Capacity half rack?
>
> And I confirmed with the infra team that we are planning to use the
> 19.0.0.0.0 ASM grid on the new X8 while keeping the Oracle database version
> at 11.2.0.4 only, so hopefully that makes us eligible for flex disk groups.
>
> Regards
> Lok
>
> On Wed, Feb 17, 2021 at 1:52 AM Rajesh Aialavajjala <
> r.aialavajjala_at_gmail.com> wrote:
>
>> Lok,
>> Here are the numbers as they relate to "usable" storage on the X8-2 /
>> X8M-2 Exadata.
>>
>> A 1/2 rack X8-2 <this is 4 compute nodes + 7 storage cells> will offer
>> approximately:
>>
>> Data Capacity (Usable) – Normal Redundancy = 477 TB
>> Data Capacity (Usable) – High Redundancy = 349 TB
>>
>> What Grid Infrastructure version are you planning to use? You are correct
>> that FLEX disk groups/redundancy was introduced with GI/ASM version
>> 12.2.0.1 and higher.
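
A quick way to confirm the GI/ASM version is to query the ASM instance
directly (a minimal sketch, nothing Exadata-specific):

    -- connect to the ASM instance, e.g. as SYSASM
    SELECT instance_name, version, status
    FROM   v$instance;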
>>
>> Thanks,
>>
>> --Rajesh
>>
>>
>> On Mon, Feb 15, 2021 at 10:45 AM Lok P <loknath.73_at_gmail.com> wrote:
>>
>>> Thanks a lot Rajesh and Shane. That really gave valuable info about the X8
>>> configuration.
>>>
>>> If I sum up the data files (v$datafile) + temp files (v$tempfile) + log
>>> files (v$log), it comes to ~150TB, of which ~45TB shows as free in
>>> dba_free_space. The breakdown of that ~150TB is roughly ~144TB of data
>>> files, ~1.2TB of temp files and ~0.5TB of log files. So it seems this
>>> ~150TB figure is the size of a single copy, not the sum of the three copies
>>> maintained for HIGH redundancy, and with HIGH redundancy we must be
>>> occupying ~150*3 = ~450TB of space in ASM across the 7 storage cells.
>>> Please correct me if I am wrong.
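
As a cross-check of that ~150TB figure, a minimal sketch of the kind of query
that sums a single copy of the files (the mirror copies only show up on the
ASM/cell side):

    SELECT (SELECT SUM(bytes) FROM v$datafile)       / POWER(1024, 4) AS datafiles_tb,
           (SELECT SUM(bytes) FROM v$tempfile)       / POWER(1024, 4) AS tempfiles_tb,
           (SELECT SUM(bytes * members) FROM v$log)  / POWER(1024, 4) AS redo_tb,
           (SELECT SUM(bytes) FROM dba_free_space)   / POWER(1024, 4) AS free_in_datafiles_tb
    FROM   dual;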
>>>
>>> And as per the X8 configuration you mentioned, it looks like in the new X8
>>> half rack we will get ~25.6*7 = ~180TB of flash (against ~41TB in the
>>> current X5) and ~168*7 = ~1PB of hard disk (as opposed to ~614TB of hard
>>> disk in the current X5). For large reads we are touching a maximum flash
>>> IOPS of ~2 million during peak load (against the ~1.3 million IOPS limit in
>>> the current X5), but the new X8 will have about 4.5 times more flash
>>> overall, so hopefully both flash IOPS and storage will be a lot better and
>>> well under the limit/capacity on the new X8.
>>>
>>> We will try to push for making the ASM redundancy HIGH. But I was
>>> wondering, with HIGH redundancy, what is the maximum amount of data we can
>>> hold on this X8 half rack? The new X8 will have ~1PB of raw storage, almost
>>> twice the current ~614TB on the X5, so what will be the maximum usable
>>> storage available for our database if we go for HIGH redundancy vs NORMAL
>>> redundancy on the new X8 machine?
>>>
>>> I was also checking on flex disk groups, and a few docs say the RDBMS must
>>> be at least 12.2. We are currently on 11.2.0.4, so perhaps that is not an
>>> option at this moment, at least.
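
As I understand it, the limiting attribute is the disk group's
compatible.rdbms setting: a flex disk group needs it at 12.2.0.1 or higher,
which an 11.2.0.4 database cannot use. A quick sketch to see where the
current disk groups stand:

    SELECT name,
           compatibility          AS compatible_asm,
           database_compatibility AS compatible_rdbms
    FROM   v$asm_diskgroup;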
>>>
>>> Regards
>>> Lok
>>>
>>> On Mon, Feb 15, 2021 at 6:44 PM Rajesh Aialavajjala <
>>> r.aialavajjala_at_gmail.com> wrote:
>>>
>>>> Lok,
>>>> To try and answer your question about the percentage/share of flash vs
>>>> hard disk in the Exadata X8-2/X8M-2:
>>>>
>>>> HC storage cells are packaged with:
>>>>
>>>> 12x 14 TB 7,200 RPM disks = 168 TB raw (hard disk)
>>>> 4x 6.4 TB NVMe PCIe 3.0 flash cards = 25.6 TB raw (flash)
>>>>
>>>> So the ~168TB that Andy references is purely spinning drives/hard disks -
>>>> you have another ~25TB of flash on top of that.
>>>>
>>>> If your organization is acquiring the X8M-2 hardware - you will add on
>>>> 1.5 TB of PMEM (Persistent Memory) to this.
>>>>
>>>> You may certainly add more storage cells to your environment - elastic
>>>> configurations are common in Exadata, whether as expansions of an existing
>>>> rack or as the initial deployment. You might want to consider expanding
>>>> storage after you evaluate your new X8 configuration.
>>>>
>>>> Your X5-2 hardware has/had <the drive sizes changed mid-way through the
>>>> X5-2 generation - they started out with 12x 4 TB drives and that was later
>>>> doubled to 8 TB drives>:
>>>>
>>>> 4 PCI Flash cards each with 1.6 TB (raw) Exadata Smart Flash Cache
>>>> 12x 8 TB 7,200 RPM High Capacity disks
>>>>
>>>> So you are looking at quite an increase in raw storage capacity and
>>>> Flash.
>>>>
>>>> +1 to Shane's point about not using NORMAL redundancy in a production
>>>> configuration. FLEX disk groups are permitted on Exadata but, to the best
>>>> of my understanding, they are not widely used - the Oracle best practices
>>>> recommend HIGH redundancy. In fact I do not think you can configure FLEX
>>>> redundancy within OEDA, as it falls outside best practices, so you would
>>>> have to tear down/recreate the disk groups manually, or customize the
>>>> initial install prior to moving your database(s).
>>>>
>>>> Thanks,
>>>>
>>>> --Rajesh
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Feb 15, 2021 at 8:09 AM Shane Borden <
>>>> dmarc-noreply_at_freelists.org> wrote:
>>>>
>>>>> I think it's a mistake to go with NORMAL redundancy in a production
>>>>> system. I could probably understand the argument for a test system, but
>>>>> not production. How do you think a regular storage array is configured?
>>>>> Likely not with a normal-redundancy-style scheme. Aside from all of the
>>>>> other things mentioned already, you are also bound to offline patching,
>>>>> or if you try to do rolling patching you run the risk of being able to
>>>>> tolerate only one disk failure or one storage server failure. HIGH
>>>>> redundancy protects not only against technical failures, but also against
>>>>> the human factor during patching.
>>>>>
>>>>> If you must consider normal redundancy, I would go with FLEX disk groups
>>>>> rather than configuring the entire rack as normal redundancy. That way,
>>>>> if you must, you can specify the redundancy at the file group (per
>>>>> database) level rather than at the disk group level. Should you change
>>>>> your mind later, it's a simple ALTER command to change the redundancy
>>>>> rather than tearing down the entire rack and rebuilding it.
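
For reference, the flex route Shane describes would look roughly like this (a
sketch only, assuming 12.2+ GI; the disk group and file group names are
illustrative; in a flex disk group each database gets its own file group whose
redundancy can be changed online):

    -- convert an existing NORMAL redundancy disk group to FLEX (12.2+ ASM)
    ALTER DISKGROUP data CONVERT REDUNDANCY TO FLEX;

    -- then set redundancy per file group (per database), here for datafiles
    ALTER DISKGROUP data MODIFY FILEGROUP prod_db
      SET 'datafile.redundancy' = 'high';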
>>>>>
>>>>>
>>>>> ---
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>> Shane Borden
>>>>> sborden76_at_yahoo.com
>>>>>
>>>>> On Feb 15, 2021, at 1:55 AM, Lok P <loknath.73_at_gmail.com> wrote:
>>>>>
>>>>> Thanks much Andy.
>>>>>
>>>>> Yes, we have an existing machine that is an X5-2 half rack. (It's
>>>>> basically a full rack machine logically split into two half racks, and we
>>>>> have only this database hosted on this half rack.) (And we are currently
>>>>> at ~150TB and keep growing, so we're planning for NORMAL redundancy.)
>>>>>
>>>>> On the current X5 I am seeing ~80TB of hard disk + ~6TB of flash disk
>>>>> per storage cell. When you said "The Exadata X8 and X8M storage cells
>>>>> have 14TB disks. With 12 per cell, that's 168TB per cell", does that mean
>>>>> the sum of flash + hard disk is ~168TB per cell? What is the
>>>>> percentage/share of flash disk and hard disk in that?
>>>>>
>>>>> Apart from the current storage saturation, with regard to the IOPS issue
>>>>> on our current X5 system, I am seeing in OEM that flash IOPS reaches
>>>>> ~2000K (about 2 million) for large reads while the maximum limit shows
>>>>> somewhere near ~1.3 million. Overall IO utilization for the flash disks
>>>>> shows ~75%. The hard disk IO limit shows as ~20K, and most of the time
>>>>> both small reads and large reads stay below that limit. Overall IO
>>>>> utilization stays below ~30% for the hard disks.
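
Independent of OEM, one way to see how much of that read workload the flash
cache is actually absorbing is to look at the standard statistics (a minimal
sketch):

    SELECT name, value
    FROM   v$sysstat
    WHERE  name IN ('physical read total IO requests',
                    'physical write total IO requests',
                    'cell flash cache read hits');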
>>>>>
>>>>> I just got to know from the infra team that the X8 we are planning to
>>>>> move to is not Extreme Flash but High Capacity disk only (similar to what
>>>>> we have on the current X5). But considering the larger flash and hard
>>>>> disk storage in each of the 7 storage cells, we are expecting the new X8
>>>>> to address the current capacity crunch both in terms of space and IOPS.
>>>>>
>>>>> And as you mentioned in your explanation, adding more storage cells will
>>>>> also help bump up the capacity. So should we consider adding a few more
>>>>> storage cells on top of the half rack to make it 8 or 9 storage cells in
>>>>> total, and is that standard practice in the Exadata world?
>>>>>
>>>>> Regards
>>>>> Lok
>>>>>
>>>>> On Sat, Feb 13, 2021 at 9:00 PM Andy Wattenhofer <watt0012_at_umn.edu>
>>>>> wrote:
>>>>>
>>>>>> The Exadata X8 and X8M storage cells have 14TB disks. With 12 per
>>>>>> cell, that's 168TB *per cell*. You haven't mentioned which rack size your
>>>>>> X5 machine is, but from the numbers you're showing it looks like maybe a
>>>>>> half rack. A half rack of X8M will come with 1PB of total disk, giving you
>>>>>> over 300TB of usable space to divide between your RECO and DATA disk groups
>>>>>> if you are using HIGH redundancy. That seems plenty for your 150TB
>>>>>> database. But if you need more, add another storage cell.
>>>>>>
>>>>>> As for performance degradation from using HIGH redundancy, you need
>>>>>> to consider that the additional work of that extra write is being taken on
>>>>>> by the storage cells. By definition the redundant block copies must go to
>>>>>> separate cells. NORMAL redundancy writes to two cells and HIGH goes to
>>>>>> three. In aggregate, each write will be as fast as your slowest cell. So
>>>>>> any difference in write performance is more a function of the total number
>>>>>> of cells you have to share the workload. That difference would be
>>>>>> diminished as you increase the number of cells in the cluster.
>>>>>>
>>>>>> And of course that difference would be mitigated by the write back
>>>>>> cache too because writes to the flash cache are faster than writes to disk.
>>>>>>
>>>>>> Honestly, I can't imagine that Oracle would sell you an Exadata
>>>>>> machine where any of this would be a problem for you. It would be so
>>>>>> undersized from the beginning that your problems with it would be much
>>>>>> greater than any marginal difference in write performance from using high
>>>>>> redundancy.
>>>>>>
>>>>>> Andy
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 12, 2021 at 10:31 AM Lok P <loknath.73_at_gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Much.
>>>>>>>
>>>>>>> I had seen some docs on this but can't find them now. The URL below
>>>>>>> also points to high redundancy as a requirement, but maybe it is not
>>>>>>> compulsory, as you stated.
>>>>>>>
>>>>>>> Given the size of our database (~150TB), we were thinking of saving
>>>>>>> some space by using double mirroring rather than triple mirroring. But
>>>>>>> I was not aware that the disks themselves are a lot bigger in X8; as
>>>>>>> you stated, with bigger disks the re-mirroring will take a lot of time
>>>>>>> in case of a crash/failure, and thus HIGH redundancy is recommended. I
>>>>>>> think we have to look into this again. Note - what I see on the current
>>>>>>> X5 machine is ~6TB of flash and ~80TB of hard disk per storage server;
>>>>>>> not sure what those figures are on the X8 though.
>>>>>>>
>>>>>>> Another doubt I had: is it also true that IOPS will degrade by some
>>>>>>> percentage with triple mirroring compared to double mirroring, because
>>>>>>> one more copy of each data block has to be written to flash/disk?
>>>>>>>
>>>>>>>
>>>>>>> https://stefanpanek.wordpress.com/2017/10/20/exadata-flash-cache-enabled-for-write-back/
>>>>>>>
>>>>>>> On Fri, Feb 12, 2021 at 8:52 PM Ghassan Salem <
>>>>>>> salem.ghassan_at_gmail.com> wrote:
>>>>>>>
>>>>>>>> Please, can you point to where you saw that write-back is only
>>>>>>>> possible with high redundancy?
>>>>>>>> High redundancy is very much recommended on X8 due to the size of the
>>>>>>>> disks and the time it takes to re-mirror in case of disk loss: with
>>>>>>>> normal redundancy, if you lose a disk, then while re-mirroring is being
>>>>>>>> done you don't have a second copy of that data, and so if you lose yet
>>>>>>>> another disk you're in big trouble. With lower-capacity disks the
>>>>>>>> re-mirroring takes much less time, and so the risk is lower.
>>>>>>>>
>>>>>>>> regards
>>>>>>>>
>>>>>>>> On Fri, Feb 12, 2021 at 3:57 PM Lok P <loknath.73_at_gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Basically I am seeing many docs stating that triple mirroring is
>>>>>>>>> recommended with the write-back flash cache, and some others stating
>>>>>>>>> that write-back flash cache is not possible without HIGH
>>>>>>>>> redundancy/triple mirroring. There is a real difference between those
>>>>>>>>> two statements: we are considering NORMAL redundancy to save some
>>>>>>>>> space and gain some IO benefit (by not writing one more copy of each
>>>>>>>>> data block), but we also want to use the write-back flash cache to
>>>>>>>>> get the benefit on write IOPS. If HIGH redundancy is a hard
>>>>>>>>> requirement for write-back, we won't be able to do that. Please
>>>>>>>>> correct me if my understanding is wrong.
>>>>>>>>>
>>>>>>>>> On Fri, Feb 12, 2021 at 12:53 PM Lok P <loknath.73_at_gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> The doc below states that it is recommended to use a HIGH
>>>>>>>>>> redundancy ASM disk group (i.e. triple mirroring) when using the
>>>>>>>>>> write-back flash cache, because data is first written to the flash
>>>>>>>>>> cache and flushed to disk at a later stage, and in case of a failure
>>>>>>>>>> it has to be recovered from a mirror copy. But I am wondering: is
>>>>>>>>>> this not possible with double mirroring? Would it not survive data
>>>>>>>>>> loss in case of a failure? I want to understand the suggested setup
>>>>>>>>>> that gives optimal space usage without compromising IOPS or risking
>>>>>>>>>> data loss.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://docs.oracle.com/en/engineered-systems/exadata-database-machine/sagug/exadata-storage-server-software-introduction.html#GUID-E10F7A58-2B07-472D-BF31-28D6D0201D53
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Lok
>>>>>>>>>>
>>>>>>>>>> On Fri, Feb 12, 2021 at 10:42 AM Lok P <loknath.73_at_gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello Listers, we are moving from Exadata X5 to X8, for multiple
>>>>>>>>>>> reasons. A few of them: 1) we are about to saturate the existing
>>>>>>>>>>> storage capacity (DB size reaching ~150TB) on the current X5, and
>>>>>>>>>>> 2) the current IOPS on the X5 is also reaching its maximum while
>>>>>>>>>>> the system runs at peak load.
>>>>>>>>>>>
>>>>>>>>>>> We currently have HIGH redundancy (triple mirroring) on our
>>>>>>>>>>> existing X5 machines for the DATA and RECO disk groups, with DBFS
>>>>>>>>>>> kept at NORMAL redundancy (double mirroring). Now a few folks have
>>>>>>>>>>> raised questions about the impact on IOPS and storage space
>>>>>>>>>>> consumption if we use double mirroring (NORMAL redundancy) vs
>>>>>>>>>>> triple mirroring (HIGH redundancy) on the new X8 machine. I can see
>>>>>>>>>>> the benefit of double mirroring in storage space saved (around one
>>>>>>>>>>> third, in terms of DATA and RECO copies), but then what is the risk
>>>>>>>>>>> with respect to data loss, and is it okay for a production system?
>>>>>>>>>>> (Note - we do use ZDLRA for database backups, and for disaster
>>>>>>>>>>> recovery we have an Active Data Guard physical standby in place
>>>>>>>>>>> which runs in read-only mode.)
>>>>>>>>>>>
>>>>>>>>>>> With regard to IOPS, we are going with the default write-back
>>>>>>>>>>> flash cache enabled here. Is it correct that with double mirroring
>>>>>>>>>>> each write goes to two places, versus three places with triple
>>>>>>>>>>> mirroring, so there will also be some IOPS degradation with triple
>>>>>>>>>>> mirroring/HIGH redundancy compared to double mirroring? If that is
>>>>>>>>>>> true, by what percentage would IOPS degrade? And is it okay to go
>>>>>>>>>>> for double mirroring, given that it would benefit us with respect
>>>>>>>>>>> to IOPS and also save a good amount of storage space?
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>>
>>>>>>>>>>> Lok
>>>>>>>>>>>
>>>>>>>>>>
>>>>>

-- 
Gabriel Hanauer


--
http://www.freelists.org/webpage/oracle-l
Received on Wed Feb 17 2021 - 13:08:34 CET
