Re: Question on Exadata X8 - IO

From: Lok P <loknath.73_at_gmail.com>
Date: Wed, 17 Feb 2021 23:01:35 +0530
Message-ID: <CAKna9VYWXmdDXghw750sm_fV+Yp8bnCPL7uTvoLygW9=W556pw_at_mail.gmail.com>



Thank you, Gabriel. I was trying to compare the current X5-2 vs. the new X8 figures side by side. I got most of these for the X8M-2 from the doc you suggested, but I could not find a few IOPS figures for the X5-2, marked "?" below. Could you suggest a document that has this information?

IOPS for half rack:

                        Flash Read IOPS   Flash Write IOPS   Disk Read IOPS   Disk Write IOPS
X5-2 High Capacity      ?                 ?                  ?                ?
X8M-2 High Capacity     6 million         3.2 million        ?                ?

Storage capacity for half rack:

                        Raw Disk   Usable Disk (High Red.)   Usable Disk (Normal Red.)   Flash    Cores
X5-2 High Capacity      672 TB     210 TB                    280 TB                      42 TB    ?
X8M-2 High Capacity     1176 TB    349 TB                    477 TB                      179 TB   224

Regards
Lok

On Wed, Feb 17, 2021 at 5:38 PM Gabriel Hanauer <gabriel.hanauer_at_gmail.com> wrote:

> Hello Lok,
>
> You could look at the X8M-2 Datasheet.
>
>
> https://www.oracle.com/a/ocom/docs/engineered-systems/exadata/exadata-x8m-2-ds.pdf
>
> There is a good deal of information in there.
>
> Regards,
>
>
>
> On Wed, Feb 17, 2021 at 7:17 AM Lok P <loknath.73_at_gmail.com> wrote:
>
>> Thank you so much Rajesh. It really helps.
>>
>> My thought was that ~168TB of raw storage per server means 168/2 = ~84TB
>> usable storage per cell server with NORMAL redundancy and 168/3 = ~56TB
>> usable storage per cell server with HIGH redundancy. But it seems it's not
>> that simple, and some additional space is set aside beyond the plain
>> mirroring overhead.
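>>
>> For what it's worth, here is a minimal sketch of a query that should show
>> this from ASM itself (assuming access to the ASM or database instance;
>> these are the standard v$asm_diskgroup columns, and the MB-to-TB
>> arithmetic is approximate):
>>
>>   -- Usable space per disk group, net of redundancy and of the reserve
>>   -- ASM keeps so it can re-mirror after a disk or cell failure.
>>   SELECT name,
>>          type,                                     -- NORMAL / HIGH / FLEX
>>          total_mb                / POWER(1024, 2) AS raw_tb,
>>          required_mirror_free_mb / POWER(1024, 2) AS reserve_tb,
>>          usable_file_mb          / POWER(1024, 2) AS usable_tb
>>   FROM   v$asm_diskgroup;
>>
>> The gap between raw/2 (or raw/3) and usable_file_mb is largely that
>> required-mirror-free reserve, which would explain the space set aside.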
>>
>> In the current X5-2 High Capacity half rack, it looks like we are capped
>> at around ~1 to 1.5 million IOPS for flash and ~18K IOPS for hard disk.
>> What are the flash and hard disk IOPS limits for the X8 High Capacity
>> half rack?
>>
>> And I confirmed with the infra team: we are planning to use the
>> 19.0.0.0.0 ASM grid on the new X8 while keeping the Oracle database
>> version at 11.2.0.4, so it seems we would be eligible for flex disk
>> groups.
>>
>> Regards
>> Lok
>>
>> On Wed, Feb 17, 2021 at 1:52 AM Rajesh Aialavajjala <
>> r.aialavajjala_at_gmail.com> wrote:
>>
>>> Lok,
>>> Here are the numbers for "usable" storage on X8-2 / X8M-2 Exadata.
>>>
>>> A 1/2 rack X8-2 <4 compute nodes + 7 storage cells> will offer
>>> approximately:
>>>
>>> Data Capacity (Usable) – Normal Redundancy = 477 TB
>>> Data Capacity (Usable) – High Redundancy = 349 TB
>>>
>>> What Grid Infrastructure version are you planning to use? You are
>>> correct that FLEX disk groups/redundancy were introduced in GI/ASM
>>> version 12.2.0.1.
>>>
>>> Thanks,
>>>
>>> --Rajesh
>>>
>>>
>>> On Mon, Feb 15, 2021 at 10:45 AM Lok P <loknath.73_at_gmail.com> wrote:
>>>
>>>> Thanks a lot, Rajesh and Shane. That's really valuable info about the
>>>> X8 configuration.
>>>>
>>>> If I sum up the data files (v$datafile) + temp files (v$tempfile) + log
>>>> files (v$log), it comes to ~150TB, of which ~45TB shows as free in
>>>> dba_free_space. The breakup of the ~150TB is roughly ~144TB data files,
>>>> ~1.2TB temp files, and ~0.5TB log files. So this ~150TB figure appears
>>>> to be the size of a single copy, not the sum of the three copies
>>>> maintained for HIGH redundancy, which means we must be occupying
>>>> ~150*3 = ~450TB of ASM space across the 7 storage cells. Please correct
>>>> me if I am wrong.
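>>>>
>>>> For reference, a rough sketch of the kind of query behind these figures
>>>> (standard v$ and dba_ views; the TB arithmetic is approximate):
>>>>
>>>>   SELECT (SELECT SUM(bytes) FROM v$datafile)      / POWER(1024, 4) AS datafile_tb,
>>>>          (SELECT SUM(bytes) FROM v$tempfile)      / POWER(1024, 4) AS tempfile_tb,
>>>>          (SELECT SUM(bytes * members) FROM v$log) / POWER(1024, 4) AS redo_tb,
>>>>          (SELECT SUM(bytes) FROM dba_free_space)  / POWER(1024, 4) AS free_tb
>>>>   FROM   dual;
>>>>
>>>> Each figure is the size of one logical copy; ASM mirroring multiplies it
>>>> by two or three underneath.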
>>>>
>>>> And per the X8 configuration you mentioned, it looks like on the new X8
>>>> half rack we will get ~25.6*7 = ~180TB of flash (against ~41TB in the
>>>> current X5) and ~168*7 = ~1.2PB of hard disk (as opposed to ~614TB of
>>>> hard disk in the current X5). For large reads we are touching a max
>>>> flash IOPS of ~2 million during peak load (against the ~1.3 million
>>>> IOPS limit on the current X5), but the new X8 will have about 4.5 times
>>>> more flash overall, so hopefully both flash IOPS and storage will be a
>>>> lot better and well under the limit/capacity of the new X8.
>>>>
>>>> We will try to push for HIGH ASM redundancy. But I was wondering: with
>>>> HIGH redundancy, what is the maximum amount of data we can hold on this
>>>> X8 half rack? The new X8 will have ~1.2PB of raw storage, almost twice
>>>> the current ~614TB on the X5, so what will the maximum usable storage
>>>> for our database be if we go for HIGH vs. NORMAL redundancy on the new
>>>> X8 machine?
>>>>
>>>> I was also reading about flex disk groups, and a few docs say the RDBMS
>>>> must be at minimum 12.2; we are currently on 11.2.0.4, so perhaps that
>>>> is not an option at the moment.
>>>>
>>>> Regards
>>>> Lok
>>>>
>>>> On Mon, Feb 15, 2021 at 6:44 PM Rajesh Aialavajjala <
>>>> r.aialavajjala_at_gmail.com> wrote:
>>>>
>>>>> Lok,
>>>>> To try and answer your question about the percentage/share of flash
>>>>> vs. hard disk in the Exadata X8-2/X8M-2:
>>>>>
>>>>> HC storage cells are packaged with:
>>>>>
>>>>> 12x 14 TB 7,200 RPM disks = 168 TB raw (hard disk)
>>>>> 4x 6.4 TB NVMe PCIe 3.0 flash cards = 25.6 TB raw (flash)
>>>>>
>>>>> So the ~168TB that Andy references is purely spinning drives/hard
>>>>> disks - you have another ~25TB of flash on top of that.
>>>>>
>>>>> If your organization is acquiring the X8M-2 hardware - you will add on
>>>>> 1.5 TB of PMEM (Persistent Memory) to this.
>>>>>
>>>>> You may certainly add more storage cells to your environment - elastic
>>>>> configurations are common in the Exadata world, whether as the starting
>>>>> configuration or as a later expansion. You might want to consider
>>>>> expanding storage after you evaluate your new X8 configuration.
>>>>>
>>>>> Your X5-2 hardware has/had <drive sizes changed midway through the
>>>>> X5-2 generation - they started out with 12x 4 TB drives, which was
>>>>> later doubled to 8 TB drives>:
>>>>>
>>>>> 4x 1.6 TB (raw) PCI flash cards of Exadata Smart Flash Cache
>>>>> 12x 8 TB 7,200 RPM High Capacity disks
>>>>>
>>>>> So you are looking at quite an increase in raw storage capacity and
>>>>> Flash.
>>>>>
>>>>> +1 to Shane's point about not using NORMAL redundancy in a production
>>>>> configuration. FLEX disk groups are permitted on Exadata but, to the
>>>>> best of my understanding, not widely used. Oracle best practices
>>>>> recommend HIGH redundancy; in fact, I do not think you can configure
>>>>> FLEX redundancy within OEDA, as it falls outside best practices, so
>>>>> you would have to tear down and recreate the disk groups manually, or
>>>>> customize the initial install prior to moving your database(s).
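>>>>>
>>>>> For illustration only, a manual FLEX disk group creation would look
>>>>> roughly like this (a sketch - the disk group name, grid disk pattern,
>>>>> and attribute values are hypothetical, and OEDA normally generates
>>>>> these for you):
>>>>>
>>>>>   CREATE DISKGROUP DATAC1 FLEX REDUNDANCY
>>>>>     DISK 'o/*/DATAC1_*'                       -- Exadata grid disks
>>>>>     ATTRIBUTE 'compatible.asm'          = '19.0.0.0.0',
>>>>>               'compatible.rdbms'        = '12.2.0.1', -- FLEX needs >= 12.2
>>>>>               'cell.smart_scan_capable' = 'TRUE';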
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --Rajesh
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Feb 15, 2021 at 8:09 AM Shane Borden <
>>>>> dmarc-noreply_at_freelists.org> wrote:
>>>>>
>>>>>> I think it's a mistake to go with NORMAL redundancy in a production
>>>>>> system. I could probably understand the argument for a test system,
>>>>>> but not production. How do you think a regular storage array is
>>>>>> configured? Likely not with a normal-redundancy-style scheme. Aside
>>>>>> from all of the other things mentioned already, you are now also bound
>>>>>> to offline patching; if you try to patch rolling, you run the risk of
>>>>>> being able to tolerate only one disk failure or one storage server
>>>>>> failure. HIGH redundancy protects not only against technical failures
>>>>>> but also against the human factor during patching.
>>>>>>
>>>>>> If you must consider normal redundancy, I would go with FLEX disk
>>>>>> groups rather than configuring the entire rack as normal redundancy.
>>>>>> That way, you can specify the redundancy at the file group level
>>>>>> (effectively per database) rather than at the disk group level. Should
>>>>>> you change your mind later, it's a simple ALTER command to change the
>>>>>> redundancy, as sketched below, rather than tearing down the entire
>>>>>> rack and rebuilding it.
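>>>>>>
>>>>>> Roughly, that ALTER is a file-group-level command along these lines (a
>>>>>> sketch; the disk group and file group names are hypothetical):
>>>>>>
>>>>>>   ALTER DISKGROUP DATAC1
>>>>>>     MODIFY FILEGROUP MYDB
>>>>>>     SET 'redundancy' = 'high';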
>>>>>>
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>
>>>>>> Shane Borden
>>>>>> sborden76_at_yahoo.com
>>>>>>
>>>>>> On Feb 15, 2021, at 1:55 AM, Lok P <loknath.73_at_gmail.com> wrote:
>>>>>>
>>>>>> Thanks much Andy.
>>>>>>
>>>>>> Yes, we have an existing machine that is an X5-2 half rack. (It's
>>>>>> basically a full rack logically split into two half racks, and we have
>>>>>> only this database hosted on this half rack.) (And we are currently at
>>>>>> ~150TB and keep growing, which is why we're considering NORMAL
>>>>>> redundancy.)
>>>>>>
>>>>>> On the current X5 I am seeing ~80TB of hard disk + ~6TB of flash per
>>>>>> storage cell. When you said "The Exadata X8 and X8M storage cells have
>>>>>> 14TB disks. With 12 per cell, that's 168TB *per cell*", does that mean
>>>>>> the sum of flash + hard disk is ~168TB per cell? What is the
>>>>>> percentage/share of flash disk vs. hard disk in that?
>>>>>>
>>>>>> Apart from the current storage saturation, regarding the IOPS issue
>>>>>> on our current X5 system: in OEM the flash IOPS is reaching ~2000K for
>>>>>> large reads, and the max limit shows somewhere near ~1.3 million.
>>>>>> Overall I/O utilization for the flash disks shows ~75%. The hard disk
>>>>>> I/O limit shows as ~20K, and most of the time both small reads and
>>>>>> large reads stay below this limit. Overall I/O utilization stays below
>>>>>> ~30% for the hard disks.
>>>>>>
>>>>>> I just got to know from the infra team that the X8 we are planning to
>>>>>> move to is not Extreme Flash but High Capacity disk only (similar to
>>>>>> what we have on the current X5). But considering the larger flash and
>>>>>> hard disk storage in each of the 7 storage cells, we expect the new X8
>>>>>> to relieve the current capacity crunch, both for space and for IOPS.
>>>>>>
>>>>>> And as you just mentioned, adding more storage cells also helps bump
>>>>>> up capacity. So should we consider adding a few more storage cells on
>>>>>> top of the half rack, making it 8 or 9 storage cells in total? Is that
>>>>>> standard practice in the Exadata world?
>>>>>>
>>>>>> Regards
>>>>>> Lok
>>>>>>
>>>>>> On Sat, Feb 13, 2021 at 9:00 PM Andy Wattenhofer <watt0012_at_umn.edu>
>>>>>> wrote:
>>>>>>
>>>>>>> The Exadata X8 and X8M storage cells have 14TB disks. With 12 per
>>>>>>> cell, that's 168TB *per cell*. You haven't mentioned which rack size your
>>>>>>> X5 machine is, but from the numbers you're showing it looks like maybe a
>>>>>>> half rack. A half rack of X8M will come with 1PB of total disk, giving you
>>>>>>> over 300TB of usable space to divide between your RECO and DATA disk groups
>>>>>>> if you are using HIGH redundancy. That seems plenty for your 150TB
>>>>>>> database. But if you need more, add another storage cell.
>>>>>>>
>>>>>>> As for performance degradation from using HIGH redundancy, you need
>>>>>>> to consider that the additional work of that extra write is being taken on
>>>>>>> by the storage cells. By definition the redundant block copies must go to
>>>>>>> separate cells. NORMAL redundancy writes to two cells and HIGH goes to
>>>>>>> three. In aggregate, each write will be as fast as your slowest cell. So
>>>>>>> any difference in write performance is more a function of the total number
>>>>>>> of cells you have to share the workload. That difference would be
>>>>>>> diminished as you increase the number of cells in the cluster.
>>>>>>>
>>>>>>> And of course that difference would be mitigated by the write back
>>>>>>> cache too because writes to the flash cache are faster than writes to disk.
>>>>>>>
>>>>>>> Honestly, I can't imagine that Oracle would sell you an Exadata
>>>>>>> machine where any of this would be a problem for you. It would be so
>>>>>>> undersized from the beginning that your problems with it would be much
>>>>>>> greater than any marginal difference in write performance from using high
>>>>>>> redundancy.
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Feb 12, 2021 at 10:31 AM Lok P <loknath.73_at_gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks Much.
>>>>>>>>
>>>>>>>> I saw this in some doc but lost track of it. The URL below also
>>>>>>>> points to high redundancy as a requirement, but maybe it's not
>>>>>>>> compulsory, as you stated.
>>>>>>>>
>>>>>>>> Given the size of our database (~150TB), we were thinking of saving
>>>>>>>> some space by using double mirroring rather than triple mirroring.
>>>>>>>> But I was not aware that the disks themselves are a lot bigger on
>>>>>>>> the X8; as you stated, with bigger disks the re-mirroring takes a
>>>>>>>> lot of time (in case of crash/failure), and thus HIGH redundancy is
>>>>>>>> recommended. I think we have to relook at this. Note: what I see on
>>>>>>>> the current X5 machine is ~6TB of flash and ~80TB of hard disk per
>>>>>>>> storage server. Not sure what that is on the X8, though.
>>>>>>>>
>>>>>>>> And another doubt I had: is it also true that IOPS will degrade by
>>>>>>>> some percentage with triple mirroring compared to double mirroring,
>>>>>>>> because one additional copy of each data block has to be written to
>>>>>>>> flash/disk?
>>>>>>>>
>>>>>>>>
>>>>>>>> https://stefanpanek.wordpress.com/2017/10/20/exadata-flash-cache-enabled-for-write-back/
>>>>>>>>
>>>>>>>> On Fri, Feb 12, 2021 at 8:52 PM Ghassan Salem <
>>>>>>>> salem.ghassan_at_gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Please, can you point to where you saw that write-back is only
>>>>>>>>> possible with high redundancy?
>>>>>>>>> High redundancy is very much recommended on the X8 due to the size
>>>>>>>>> of the disks and the time it takes to re-mirror in case of disk
>>>>>>>>> loss: with normal redundancy, if you lose a disk, then while the
>>>>>>>>> re-mirroring is being done you don't have a second copy of that
>>>>>>>>> data, so if you lose yet another disk you're in big trouble. With
>>>>>>>>> lower-capacity disks the re-mirroring takes much less time, so the
>>>>>>>>> risk is lower.
>>>>>>>>>
>>>>>>>>> regards
>>>>>>>>>
>>>>>>>>> On Fri, Feb 12, 2021 at 3:57 PM Lok P <loknath.73_at_gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Basically, I am seeing many docs stating that triple mirroring is
>>>>>>>>>> recommended with "write back flash cache", and some others stating
>>>>>>>>>> that write back flash cache is not possible without HIGH
>>>>>>>>>> redundancy/triple mirroring. There is a real difference between
>>>>>>>>>> those two statements: we might decide to go for NORMAL redundancy
>>>>>>>>>> to save some space and gain some I/O benefit (by not writing one
>>>>>>>>>> additional data block copy), while still wanting to use the "write
>>>>>>>>>> back flash cache" option for its write IOPS benefits. If a hard
>>>>>>>>>> HIGH redundancy restriction is in place, we won't be able to do
>>>>>>>>>> that. Please correct me if my understanding is wrong.
>>>>>>>>>>
>>>>>>>>>> On Fri, Feb 12, 2021 at 12:53 PM Lok P <loknath.73_at_gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> The doc below states that a HIGH redundancy ASM disk group
>>>>>>>>>>> (i.e., triple mirroring) is recommended when using write back
>>>>>>>>>>> flash cache, because data is first written to (and stays in) the
>>>>>>>>>>> flash cache and is flushed to disk at a later stage, so in case
>>>>>>>>>>> of failure it has to be recovered from a mirror copy. But I am
>>>>>>>>>>> wondering: is this not possible with double mirroring - will it
>>>>>>>>>>> not survive data loss in case of failure? I want to understand
>>>>>>>>>>> the suggested setup that gives optimal space usage without
>>>>>>>>>>> compromising IOPS or risking data loss.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://docs.oracle.com/en/engineered-systems/exadata-database-machine/sagug/exadata-storage-server-software-introduction.html#GUID-E10F7A58-2B07-472D-BF31-28D6D0201D53
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>> Lok
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Feb 12, 2021 at 10:42 AM Lok P <loknath.73_at_gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello Listers, we are moving from Exadata X5 to X8, and there
>>>>>>>>>>>> are multiple reasons behind it. A few of them: 1) we are about
>>>>>>>>>>>> to saturate the existing storage capacity (DB size reaching
>>>>>>>>>>>> ~150TB) on the current X5; 2) the current IOPS on the X5 is also
>>>>>>>>>>>> reaching its max while the system works through its peak load.
>>>>>>>>>>>>
>>>>>>>>>>>> We currently have HIGH redundancy (triple mirroring) on our
>>>>>>>>>>>> existing X5 machines for the DATA and RECO disk groups, and DBFS
>>>>>>>>>>>> is kept at NORMAL redundancy (double mirroring). Now a few folks
>>>>>>>>>>>> have raised questions about the impact on IOPS and storage space
>>>>>>>>>>>> consumption if we use double mirroring (NORMAL redundancy) vs.
>>>>>>>>>>>> triple mirroring (HIGH redundancy) on the new X8 machine. I can
>>>>>>>>>>>> see the benefit of double mirroring in storage space saved
>>>>>>>>>>>> (around one third, in terms of DATA and RECO copies), but then
>>>>>>>>>>>> what is the risk of data loss - is it acceptable in a production
>>>>>>>>>>>> system? (Note: we do use ZDLRA for DB backups, and for disaster
>>>>>>>>>>>> recovery we have an Active Data Guard physical standby that runs
>>>>>>>>>>>> in read-only mode.)
>>>>>>>>>>>>
>>>>>>>>>>>> With regards to IOPS, we are going with the default of write
>>>>>>>>>>>> back flash cache enabled. Is it correct that with double
>>>>>>>>>>>> mirroring each write goes to two places vs. three places with
>>>>>>>>>>>> triple mirroring, so there will be some IOPS degradation with
>>>>>>>>>>>> triple mirroring/HIGH redundancy compared to double mirroring?
>>>>>>>>>>>> If so, roughly what percentage of IOPS degradation should we
>>>>>>>>>>>> expect? And would it then be okay to go for double mirroring,
>>>>>>>>>>>> since that would benefit us on IOPS and also save a good amount
>>>>>>>>>>>> of storage space?
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>>>>>>>
>>>>>>>>>>>> Lok
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>
>
> --
> Gabriel Hanauer
>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Feb 17 2021 - 18:31:35 CET
