Re: Question on Exadata X8 - IO

From: Rajesh Aialavajjala <r.aialavajjala_at_gmail.com>
Date: Wed, 17 Feb 2021 12:47:33 -0500
Message-ID: <CAGvtKv4G+ETXhDf=3V7RAp81xDzqhzeQK574PRm5rXJ-3uT6tg_at_mail.gmail.com>



Lok,
 You should be able to find the X5-2 data sheets that will have the IOPS information you are seeking.

https://www.oracle.com/technetwork/database/exadata/exadata-storage-expansion-x5-2-ds-2406252.pdf

That is for the X5-2 storage expansion rack, but it should have the information you want. What size drives are in your X5-2? As I mentioned earlier, the drive sizes changed mid-generation.

Thanks,

—Rajesh

On Wed, Feb 17, 2021 at 12:31 Lok P <loknath.73_at_gmail.com> wrote:

> Thank you, Gabriel. I was trying to compare the current X5-2 vs. the new X8
> figures side by side. I got most of these for the X8M-2 from your suggested
> doc, but I am not able to get a few IOPS figures for the X5-2, as shown
> below. Could you suggest a document that has this information?
>
> IOPS for half rack:
>
>                        Flash Read IOPS   Flash Write IOPS   Disk Read IOPS   Disk Write IOPS
> X5-2 High Capacity     ?                 ?                  ?                ?
> X8M-2 High Capacity    6 million         3.2 million
>
> Storage capacity for half rack:
>
>                        Raw Disk   Usable Disk (High Red.)   Usable Disk (Normal Red.)   Flash    Cores
> X5-2 High Capacity     672 TB     210 TB                    280 TB                      42 TB
> X8M-2 High Capacity    1176 TB    349 TB                    477 TB                      179 TB   224
>
>
> Regards
> Lok
>
> On Wed, Feb 17, 2021 at 5:38 PM Gabriel Hanauer <gabriel.hanauer_at_gmail.com>
> wrote:
>
>> Hello Lok,
>>
>> You could look at the X8M-2 Datasheet.
>>
>>
>> https://www.oracle.com/a/ocom/docs/engineered-systems/exadata/exadata-x8m-2-ds.pdf
>>
>> There is a good deal of useful information in there.
>>
>> Regards,
>>
>>
>>
>> On Wed, Feb 17, 2021 at 7:17 AM Lok P <loknath.73_at_gmail.com> wrote:
>>
>>> Thank you so much Rajesh. It really helps.
>>>
>>> My thought was that ~168TB raw storage per cell means 168/2 = ~84TB
>>> usable per cell with NORMAL redundancy and 168/3 = ~56TB usable per cell
>>> with HIGH redundancy. But it seems it's not that simple, and some
>>> additional space is set aside beyond the mirroring overhead.
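>>>
>>> Redoing that arithmetic at the rack level (7 cells), and assuming the gap
>>> is the space ASM sets aside to re-mirror after a disk or cell failure:
>>>
>>>   7 x 168 TB = 1176 TB raw
>>>   1176 / 2   = 588 TB double-mirrored, vs. the published 477 TB usable
>>>   1176 / 3   = 392 TB triple-mirrored, vs. the published 349 TB usable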
>>>
>>> On the current X5-2 High Capacity half rack, it looks like we are capped
>>> at around ~1 to 1.5 million IOPS for flash and ~18K IOPS for hard disk.
>>> What are the flash and hard disk IOPS limits for the X8 High Capacity half
>>> rack?
>>>
>>> And I confirmed with the infra team: we are planning to use the
>>> 19.0.0.0.0 grid (ASM) on the new X8 while keeping the Oracle database
>>> version at 11.2.0.4, so it means we are eligible for flex disk groups.
>>>
>>> Regards
>>> Lok
>>>
>>> On Wed, Feb 17, 2021 at 1:52 AM Rajesh Aialavajjala <
>>> r.aialavajjala_at_gmail.com> wrote:
>>>
>>>> Lok,
>>>> Here are the numbers as relates to "usable" storage when it comes to
>>>> X8-2 / X8M-2 Exadata
>>>>
>>>> A 1/2 rack X8-2 <this is 4 compute nodes + 7 storage cells> will offer
>>>> approximately:
>>>>
>>>> Data Capacity (Usable) – Normal Redundancy = 477 TB
>>>> Data Capacity (Usable) – High Redundancy = 349 TB
>>>>
>>>> What Grid Infrastructure version are you planning to use? You are
>>>> correct that FLEX disk groups/redundancy was introduced with GI/ASM version
>>>> 12.2.0.1 and higher.
>>>>
>>>> Thanks,
>>>>
>>>> --Rajesh
>>>>
>>>>
>>>> On Mon, Feb 15, 2021 at 10:45 AM Lok P <loknath.73_at_gmail.com> wrote:
>>>>
>>>>> Thanks a lot, Rajesh and Shane. That is really valuable info about the
>>>>> X8 configuration.
>>>>>
>>>>> If I sum up the data files (v$datafile) + temp files (v$tempfile) + log
>>>>> files (v$log), it comes to ~150TB, of which ~45TB shows as free in
>>>>> dba_free_space. The breakdown of the ~150TB is ~144TB data files, ~1.2TB
>>>>> temp files, and ~0.5TB log files. So it seems this ~150TB figure is the
>>>>> size of a single copy, not the sum of the three copies maintained for
>>>>> HIGH redundancy. With HIGH redundancy we must therefore be occupying
>>>>> ~150*3 = ~450TB of space in ASM across the 7 storage cells. Please
>>>>> correct me if I am wrong.
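>>>>>
>>>>> For reference, here is a sketch of how I am summing this up (run in the
>>>>> database; redo size approximated as bytes * members per log group):
>>>>>
>>>>>   SELECT ROUND(SUM(bytes)/POWER(1024,4), 1) AS single_copy_tb
>>>>>     FROM (SELECT bytes FROM v$datafile
>>>>>           UNION ALL
>>>>>           SELECT bytes FROM v$tempfile
>>>>>           UNION ALL
>>>>>           SELECT bytes * members FROM v$log);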
>>>>>
>>>>> And as per the X8 configuration you mentioned, it looks like on the new
>>>>> X8 half rack we will get ~25.6*7 = ~180TB of flash (against ~41TB on the
>>>>> current X5) and ~168*7 = ~1PB of hard disk (as opposed to ~614TB of hard
>>>>> disk on the current X5). For large reads we are touching a max flash
>>>>> IOPS of ~2 million during peak load (against the ~1.3 million IOPS limit
>>>>> on the current X5), but the new X8 will have roughly 4.5 times more
>>>>> flash overall, so hopefully both flash IOPS and storage will be a lot
>>>>> better and well under the limit/capacity on the new X8.
>>>>>
>>>>> We will try to push for making the ASM redundancy HIGH. But I was
>>>>> wondering what the maximum amount of data is that we can hold on this X8
>>>>> half rack with HIGH redundancy. The new X8 will have ~1PB of raw
>>>>> storage, almost twice the current ~614TB on the X5, so what will be the
>>>>> max usable storage available for our database if we go for HIGH
>>>>> redundancy vs. NORMAL redundancy on the new X8 machine?
>>>>>
>>>>> I was also reading about flex disk groups, and a few docs say the RDBMS
>>>>> needs to be on 12.2 at minimum, but we are currently on 11.2.0.4, so
>>>>> perhaps that is not an option at this moment, at least.
>>>>>
>>>>> Regards
>>>>> Lok
>>>>>
>>>>> On Mon, Feb 15, 2021 at 6:44 PM Rajesh Aialavajjala <
>>>>> r.aialavajjala_at_gmail.com> wrote:
>>>>>
>>>>>> Lok,
>>>>>> To try and answer your question about the percentage/share of flash
>>>>>> vs. hard disk in the Exadata X8-2/X8M-2: HC storage cells are packaged
>>>>>> with
>>>>>>
>>>>>>   12x 14 TB 7,200 RPM disks            = 168 TB raw (hard disk)
>>>>>>   4x 6.4 TB NVMe PCIe 3.0 flash cards  = 25.6 TB raw (flash)
>>>>>>
>>>>>> So the ~168TB that Andy references is purely spinning drives/hard
>>>>>> disks - you have another ~25TB of flash on top of that.
>>>>>>
>>>>>> If your organization is acquiring the X8M-2 hardware, you will add
>>>>>> 1.5 TB of PMEM (Persistent Memory) to this.
>>>>>>
>>>>>> You may certainly add more storage cells to your environment - elastic
>>>>>> configurations are a common thing in Exadata, both expanding later and
>>>>>> starting out elastic. You might want to consider expanding storage
>>>>>> after you evaluate your new X8 configuration.
>>>>>>
>>>>>> Your X5-2 hardware has/had <the drive sizes did change midway through
>>>>>> the X5-2 generation - they started out with 12x 4 TB drives, which was
>>>>>> doubled to 8 TB drives>
>>>>>>
>>>>>>   4x PCI flash cards, each 1.6 TB (raw) Exadata Smart Flash Cache
>>>>>>   12x 8 TB 7,200 RPM High Capacity disks
>>>>>>
>>>>>> So you are looking at quite an increase in raw storage capacity and
>>>>>> Flash.
>>>>>>
>>>>>> +1 to Shane's point about not using NORMAL redundancy in a production
>>>>>> configuration. FLEX disk groups are permitted on Exadata, but to the
>>>>>> best of my understanding they are not widely used - Oracle best
>>>>>> practices recommend HIGH redundancy. In fact, I do not think you can
>>>>>> configure FLEX redundancy within OEDA, as it falls outside best
>>>>>> practices - so you would have to tear down and recreate the disk groups
>>>>>> manually, or customize the initial install prior to moving your
>>>>>> database(s).
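>>>>>>
>>>>>> For what it's worth, recreating a disk group as FLEX by hand is a
>>>>>> single CREATE statement - a sketch only, with a hypothetical disk group
>>>>>> name and disk search string; note FLEX requires the disk group's
>>>>>> 'compatible.rdbms' to be 12.2.0.1 or higher, which is what shuts out
>>>>>> older databases:
>>>>>>
>>>>>>   CREATE DISKGROUP datac1 FLEX REDUNDANCY
>>>>>>     DISK 'o/*/DATAC1_*'
>>>>>>     ATTRIBUTE 'compatible.asm'   = '19.0.0.0.0',
>>>>>>               'compatible.rdbms' = '12.2.0.1';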
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> --Rajesh
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 15, 2021 at 8:09 AM Shane Borden <
>>>>>> dmarc-noreply_at_freelists.org> wrote:
>>>>>>
>>>>>>> I think it's a mistake to go with NORMAL redundancy in a production
>>>>>>> system. I could probably understand the argument for a test system,
>>>>>>> but not production. How do you think a regular storage array is
>>>>>>> configured? Likely not with a normal-redundancy-equivalent scheme.
>>>>>>> Aside from all of the other things mentioned already, you are now also
>>>>>>> bound to offline patching; if you try to patch rolling, you run the
>>>>>>> risk of being able to tolerate only one disk failure or one storage
>>>>>>> server failure. HIGH redundancy protects not only against technical
>>>>>>> failures but also against the human factor during patching.
>>>>>>>
>>>>>>> If you must consider normal redundancy, I would go with FLEX disk
>>>>>>> groups vs. configuring the entire rack as normal redundancy. That way,
>>>>>>> if you must, you can specify the redundancy at the file group level
>>>>>>> (effectively per database) rather than at the disk group level. Should
>>>>>>> you change your mind later, it's a simple ALTER command to change the
>>>>>>> redundancy, rather than tearing down the entire rack and rebuilding it.
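>>>>>>>
>>>>>>> As a sketch of that alter (disk group and file group names here are
>>>>>>> hypothetical - in a FLEX disk group each database gets its own file
>>>>>>> group):
>>>>>>>
>>>>>>>   ALTER DISKGROUP datac1 MODIFY FILEGROUP mydb
>>>>>>>     SET 'datafile.redundancy' = 'HIGH';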
>>>>>>>
>>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>>
>>>>>>> Shane Borden
>>>>>>> sborden76_at_yahoo.com
>>>>>>>
>>>>>>> On Feb 15, 2021, at 1:55 AM, Lok P <loknath.73_at_gmail.com> wrote:
>>>>>>>
>>>>>>> Thanks much Andy.
>>>>>>>
>>>>>>> Yes, we have an existing machine that is an X5-2 half rack. (It's
>>>>>>> basically a full rack machine logically split into two half racks, and
>>>>>>> we have only this database hosted on this half rack. We are currently
>>>>>>> at ~150TB and keep growing, which is why we were considering NORMAL
>>>>>>> redundancy.)
>>>>>>>
>>>>>>> On the current X5 I am seeing ~80TB hard disk + ~6TB flash per storage
>>>>>>> cell. When you said "The Exadata X8 and X8M storage cells have 14TB
>>>>>>> disks. With 12 per cell, that's 168TB per cell," does that mean the
>>>>>>> sum of flash + hard disk is ~168TB per cell? What is the
>>>>>>> percentage/share of flash disk and hard disk in that?
>>>>>>>
>>>>>>> Apart from the current storage saturation, regarding the IOPS issue on
>>>>>>> our current X5 system: in OEM the flash IOPS is reaching ~2000K for
>>>>>>> large reads, and the max limit is showing somewhere near ~1.3 million.
>>>>>>> Overall IO utilization for the flash disks is showing ~75%. The hard
>>>>>>> disk IO limit shows as ~20K, and most of the time both small reads and
>>>>>>> large reads stay below this limit. Overall IO utilization stays below
>>>>>>> ~30% for the hard disks.
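>>>>>>>
>>>>>>> OEM is where we read these; I believe the same figures can also be
>>>>>>> pulled on a storage cell directly with CellCLI - a sketch, using the
>>>>>>> cell disk read IO request metrics:
>>>>>>>
>>>>>>>   cellcli -e "list metriccurrent where name like 'CD_IO_RQ_R_.*'"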
>>>>>>>
>>>>>>> I just got to know from the infra team that the X8 we are planning to
>>>>>>> move to is not Extreme Flash but High Capacity disk only (similar to
>>>>>>> what we have on the current X5). But considering the larger flash and
>>>>>>> hard disk storage in each of the 7 storage cells, we are expecting the
>>>>>>> new X8 to relieve the current capacity crunch, both with respect to
>>>>>>> space and IOPS.
>>>>>>>
>>>>>>> And as you just mentioned, adding more storage cells will also help
>>>>>>> bump up the capacity. So should we consider adding a few more storage
>>>>>>> cells on top of the half rack, making it 8 or 9 storage cells in
>>>>>>> total? Is that standard practice in the Exadata world?
>>>>>>>
>>>>>>> Regards
>>>>>>> Lok
>>>>>>>
>>>>>>> On Sat, Feb 13, 2021 at 9:00 PM Andy Wattenhofer <watt0012_at_umn.edu>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> The Exadata X8 and X8M storage cells have 14TB disks. With 12 per
>>>>>>>> cell, that's 168TB *per cell*. You haven't mentioned which rack size your
>>>>>>>> X5 machine is, but from the numbers you're showing it looks like maybe a
>>>>>>>> half rack. A half rack of X8M will come with 1PB of total disk, giving you
>>>>>>>> over 300TB of usable space to divide between your RECO and DATA disk groups
>>>>>>>> if you are using HIGH redundancy. That seems plenty for your 150TB
>>>>>>>> database. But if you need more, add another storage cell.
>>>>>>>>
>>>>>>>> As for performance degradation from using HIGH redundancy, you need
>>>>>>>> to consider that the additional work of that extra write is being taken on
>>>>>>>> by the storage cells. By definition the redundant block copies must go to
>>>>>>>> separate cells. NORMAL redundancy writes to two cells and HIGH goes to
>>>>>>>> three. In aggregate, each write will be as fast as your slowest cell. So
>>>>>>>> any difference in write performance is more a function of the total number
>>>>>>>> of cells you have to share the workload. That difference would be
>>>>>>>> diminished as you increase the number of cells in the cluster.
>>>>>>>>
>>>>>>>> And of course that difference would be mitigated by the write back
>>>>>>>> cache too because writes to the flash cache are faster than writes to disk.
>>>>>>>>
>>>>>>>> Honestly, I can't imagine that Oracle would sell you an Exadata
>>>>>>>> machine where any of this would be a problem for you. It would be so
>>>>>>>> undersized from the beginning that your problems with it would be much
>>>>>>>> greater than any marginal difference in write performance from using high
>>>>>>>> redundancy.
>>>>>>>>
>>>>>>>> Andy
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Feb 12, 2021 at 10:31 AM Lok P <loknath.73_at_gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Much.
>>>>>>>>>
>>>>>>>>> I had found a doc earlier but lost track of it. The URL below also
>>>>>>>>> points to high redundancy as a requirement, but maybe it is not
>>>>>>>>> compulsory, as you stated.
>>>>>>>>>
>>>>>>>>> Given the size of our database (~150TB), we were thinking of getting
>>>>>>>>> some space reduction by using double mirroring rather than triple
>>>>>>>>> mirroring. But I was not aware that the disks themselves are a lot
>>>>>>>>> bigger in the X8; as you stated, with bigger disks the re-mirroring
>>>>>>>>> will take a lot of time (in case of a crash/failure), and thus HIGH
>>>>>>>>> redundancy is recommended. I think we have to take another look at
>>>>>>>>> this. Note: what I see on the current X5 machine is ~6TB flash and
>>>>>>>>> ~80TB hard disk per storage server. Not sure what those figures are
>>>>>>>>> for the X8, though.
>>>>>>>>>
>>>>>>>>> And another doubt I had: is it also true that IOPS will be degraded
>>>>>>>>> by some percentage with triple mirroring compared to double
>>>>>>>>> mirroring, because one additional copy of each data block has to be
>>>>>>>>> written to flash/disk?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://stefanpanek.wordpress.com/2017/10/20/exadata-flash-cache-enabled-for-write-back/
>>>>>>>>>
>>>>>>>>> On Fri, Feb 12, 2021 at 8:52 PM Ghassan Salem <
>>>>>>>>> salem.ghassan_at_gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Please, can you point to where you saw that write-back is only
>>>>>>>>>> possible with high redundancy?
>>>>>>>>>> High redundancy is very much recommended with the X8 due to the
>>>>>>>>>> size of the disks and the time it takes to re-mirror in case of
>>>>>>>>>> disk loss: if you're on normal redundancy and you lose a disk, then
>>>>>>>>>> while re-mirroring is being done you don't have any second copy of
>>>>>>>>>> that data, so if you lose yet another disk you're in big trouble.
>>>>>>>>>> With lower-capacity disks the re-mirroring takes much less time,
>>>>>>>>>> and so the risk is lower.
>>>>>>>>>>
>>>>>>>>>> regards
>>>>>>>>>>
>>>>>>>>>> On Fri, Feb 12, 2021 at 3:57 PM Lok P <loknath.73_at_gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Basically, I am seeing many docs stating that triple mirroring is
>>>>>>>>>>> recommended with "write back flash cache" and some others stating
>>>>>>>>>>> that write-back flash cache is not possible without HIGH
>>>>>>>>>>> redundancy/triple mirroring. There is a real difference between
>>>>>>>>>>> these two statements: we may decide to go for NORMAL redundancy to
>>>>>>>>>>> save some space and gain some IO benefit (by not writing one
>>>>>>>>>>> additional copy of each data block), but we also want to use the
>>>>>>>>>>> "write back flash cache" option to get the benefit on write IOPS.
>>>>>>>>>>> If a HIGH redundancy restriction is in place, we won't be able to
>>>>>>>>>>> do that. Please correct me if my understanding is wrong.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Feb 12, 2021 at 12:53 PM Lok P <loknath.73_at_gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> The doc below states that a high redundancy ASM disk group (i.e.,
>>>>>>>>>>>> triple mirroring) is recommended when using write-back flash
>>>>>>>>>>>> cache, because data is first written to the flash cache and only
>>>>>>>>>>>> flushed to disk at a later stage, so in case of a failure it has
>>>>>>>>>>>> to be recovered from a mirror copy. But I am wondering: is this
>>>>>>>>>>>> not possible with double mirroring? Will it not survive data loss
>>>>>>>>>>>> in case of a failure? I want to understand the suggested setup
>>>>>>>>>>>> that gives optimal space usage without compromising IOPS or
>>>>>>>>>>>> risking data loss.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> https://docs.oracle.com/en/engineered-systems/exadata-database-machine/sagug/exadata-storage-server-software-introduction.html#GUID-E10F7A58-2B07-472D-BF31-28D6D0201D53
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>>>>>>> Lok
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Feb 12, 2021 at 10:42 AM Lok P <loknath.73_at_gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello Listers, we are moving from Exadata X5 to X8, and there
>>>>>>>>>>>>> are multiple reasons behind it. A few of them: 1) we are about
>>>>>>>>>>>>> to saturate the existing storage capacity (DB size reaching
>>>>>>>>>>>>> ~150TB) on the current X5; 2) the current IOPS on the X5 is also
>>>>>>>>>>>>> reaching its max while the system works through its peak load.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We currently have HIGH redundancy (triple mirroring) on our
>>>>>>>>>>>>> existing X5 machines for the DATA and RECO disk groups, and DBFS
>>>>>>>>>>>>> is kept at NORMAL redundancy (double mirroring). Now a few folks
>>>>>>>>>>>>> have raised questions about the impact on IOPS and storage space
>>>>>>>>>>>>> consumption if we use double mirroring (NORMAL redundancy) vs.
>>>>>>>>>>>>> triple mirroring (HIGH redundancy) on the new X8 machine. I can
>>>>>>>>>>>>> see the benefit of double mirroring (NORMAL redundancy) in the
>>>>>>>>>>>>> storage space saved (around one third, in terms of DATA and RECO
>>>>>>>>>>>>> copies), but then what is the risk of data loss - is it okay in
>>>>>>>>>>>>> a production system? (Note: we do use ZDLRA for taking the DB
>>>>>>>>>>>>> backup, and for disaster recovery we have an Active Data Guard
>>>>>>>>>>>>> physical standby in place which runs in read-only mode.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> With regard to IOPS, we are going with the default write-back
>>>>>>>>>>>>> flash cache enabled here. Is it correct that with double
>>>>>>>>>>>>> mirroring each block is written to two places vs. three places
>>>>>>>>>>>>> with triple mirroring, so there will be some degradation in
>>>>>>>>>>>>> write IOPS with triple mirroring/HIGH redundancy compared to
>>>>>>>>>>>>> double mirroring? If that's true, by what percentage will the
>>>>>>>>>>>>> IOPS degrade? And is it then okay to go for double mirroring,
>>>>>>>>>>>>> given that it would benefit us with respect to IOPS and also
>>>>>>>>>>>>> save a good amount of storage space?
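>>>>>>>>>>>>>
>>>>>>>>>>>>> For completeness, a sketch of how the current cache mode can be
>>>>>>>>>>>>> checked on each storage cell:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   cellcli -e "list cell attributes name, flashCacheMode"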
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>
>>>>>>>>>>>>> Lok
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>
>>
>> --
>> Gabriel Hanauer
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Feb 17 2021 - 18:47:33 CET
