
RE: Choosing data file size for a multi TB database?

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Tue, 30 Aug 2005 09:11:47 -0400
Message-ID: <KNEIIDHFLNJDHOOCFCDKIEMOGNAA.mwf@rsiz.com>


First, you try to find a reasonable way to determine the i/o throughput requirements. If you will have a significantly peaked load, you need your requirements both in "burst mode" (achievable in some layer of cache) and in sustained throughput (remembering to include getting the archived redo logs to multiple stable destinations, preferably including one that is offsite, and, depending on your availability requirements, possibly WAN bandwidth to a remote standby location). Remember to look at your entire i/o pathway, from spinning rust or SSD up through the various cache levels. If you know in advance that certain objects will be extremely "hot" in i/o compared with the rest of the objects, consider provisioning hardware i/o capability at two or more levels of performance, and be certain to keep these pools separate when allocating your resources. (See the note on SAME, below.)
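To put rough numbers on that, here is a minimal sketch of the arithmetic in Python; every input is a made-up placeholder, and the point is only that archived redo multiplies your sustained requirement:

    # Rough sustained-throughput budget for the whole i/o pathway.
    # All inputs are invented placeholders -- substitute your measurements.
    def sustained_mb_per_sec(read_mb_s, write_mb_s, redo_mb_s,
                             archive_dests, standby_over_wan):
        """Total sustained MB/s the storage (and WAN) must carry."""
        total = read_mb_s + write_mb_s + redo_mb_s   # datafiles + online redo
        total += redo_mb_s * archive_dests           # copy redo to each stable destination
        wan = redo_mb_s if standby_over_wan else 0.0 # shipping redo to a standby
        return total, wan

    disk, wan = sustained_mb_per_sec(read_mb_s=400.0, write_mb_s=150.0,
                                     redo_mb_s=30.0, archive_dests=2,
                                     standby_over_wan=True)
    print(f"storage: ~{disk:.0f} MB/s sustained, WAN: ~{wan:.0f} MB/s")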

Second, you choose a hardware architecture that is capable of maintaining spread load as you grow. Once you have determined the hardware architecture, you will know the size of a unit of growth. With a very large database you should probably plan to add large chunks of hardware, such as a pair of i/o controllers and two new disk-filled "trays." Not only is this cost effective in purchasing and labor, and low-risk in terms of installation disturbance, it blends well with the leading layout methodologies (SAME, BORING, and dare I say ASM yet?).

Third, be committed to having nothing but Oracle on this part of your disk farm. At sizes large compared with the number of disks that will saturate your controllers, you will have a hard time justifying polluting the Oracle i/o signature with anything else.

Okay, for each "speed" rating of hardware component, you should now be able to compute the size of an expansion (like adding a pair of trays of disks).
Most likely for Oracle you will be best off duplexing drives in the hardware and striping across whole trays. (A note on SAME, Stripe And Mirror Everything: as per Juan Loaiza, it indicates one stripe across everything. Find his SAME paper for how to add storage to SAME. It does not dovetail exactly with multiple pools of storage at different speeds.) For BORING (Balanced Organization of Resources In Natural Groups) and for ASM, you add storage to a given speed group and assign needs to speeds.
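A minimal sketch of the expansion arithmetic, assuming a purely hypothetical tray geometry (14 x 146 GB drives, hardware mirrored, added two trays at a time):

    # Capacity and spindles added by one "expansion unit" (a pair of trays).
    # The geometry below is an assumption; check your own enclosures.
    DISKS_PER_TRAY = 14
    DISK_GB = 146
    TRAYS_PER_EXPANSION = 2

    raw_gb = DISKS_PER_TRAY * DISK_GB * TRAYS_PER_EXPANSION
    usable_gb = raw_gb / 2                    # duplexed drives halve capacity
    spindle_pairs = DISKS_PER_TRAY * TRAYS_PER_EXPANSION // 2

    print(f"each expansion adds ~{usable_gb:.0f} GB usable "
          f"over {spindle_pairs} mirrored spindle pairs")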

Anyway, once you've picked your hardware and your layout architecture, you know the numbers to use to figure out the size of files.
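For instance, a minimal sketch assuming one mirrored tray (~1 TB usable, per the hypothetical geometry above) cut into a fixed number of equal files:

    # Once the building block is fixed, file size and count follow from it.
    TRAY_USABLE_GB = 1022     # assumption: 14 x 146 GB drives, mirrored
    FILES_PER_TRAY = 16       # design choice: equal files per tray stripe

    file_gb = TRAY_USABLE_GB / FILES_PER_TRAY
    files_at_10tb = round(10_000 / file_gb)
    print(f"~{file_gb:.0f} GB per file, ~{files_at_10tb} files at 10 TB")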

For SAME, read the papers.

For ASM, create a LUN across the whole tray and give it to ASM. If OS limits force you to make LUNs smaller than the entire tray, then you have to compromise and lie a little bit to ASM. You'd like to give ASM LUNs within a disk group that are equal in i/o capability. Ideally that includes the controller paths, so if you call a whole tray a LUN you're all set. If that is too big, you have to pretend that you might not saturate controllers, but you can minimize the lie by defining a consistent horizontal stripe across each whole tray, so that you have the same number of LUNs on every tray within a disk pool.
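A minimal sketch of that carving, with invented names and sizes; the invariant worth keeping is the same number of equal LUNs per tray within a disk group:

    # Carve each tray into matching horizontal stripes so every LUN handed
    # to an ASM disk group has the same i/o capability. Illustrative only.
    def carve_luns(tray_usable_gb, luns_per_tray, tray_ids):
        lun_gb = tray_usable_gb / luns_per_tray
        return [(f"tray{t}_lun{i}", lun_gb)
                for t in tray_ids for i in range(luns_per_tray)]

    # If OS limits force LUNs smaller than a tray, at least keep the
    # per-tray count constant so the "lie" to ASM is a consistent one.
    for name, gb in carve_luns(tray_usable_gb=1022, luns_per_tray=4,
                               tray_ids=(1, 2)):
        print(f"{name}: {gb:.0f} GB -> same disk group")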

For BORING you do pretty much the same thing. (In fact, ASM completely supports the BORING layout methodology.) Except, of course, in BORING you don't necessarily entrust your storage to ASM yet. Instead of LUNs, the whole trays are files; or, alternatively, several matching-size stripes across a whole tray become files, and all those files constitute a "stripeset." If you have multiple trays per controller instead of multiple controllers per tray, then you should sensibly arrange your "stripesets" into "stripeset groups" so that controller capacity can reasonably be taken into account.

If you have known-in-advance peaked load that varies by hour or calendar period, then by having multiple files per tray you can allocate tablespaces to files and tables to tablespaces such that you give the maximum available throughput to the particular application during its peak. Likewise, if you have applications likely to have common or unpredictable peak loads, isolate them from each other on different stripesets. There is no reason to let a bad i/o signature from one application group destroy the performance of all the application groups. Clearly, there is a planning exercise in the tradeoff between using all the capability you've got in parallel for one process versus guaranteeing an unintruded, reserved level of service. In BORING you plan ahead for those tradeoffs and have reasonably consistent building blocks to rearrange yourself if you get it wrong or it changes. As ASM matures, it may become possible, if the hardware is set up so that you are not forced to lie to ASM, for ASM to dynamically rebalance the i/o effectively.
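As a minimal sketch of that planning exercise, here is a toy greedy assignment; the application names and peak windows are invented placeholders:

    # Assign application groups to stripesets so that no two groups with
    # overlapping peak windows share a stripeset. All inputs are invented.
    peaks = {                    # app -> set of peak hours (0-23)
        "orders":  {9, 10, 11},
        "billing": {22, 23},
        "reports": {9, 10},      # collides with orders
        "batch":   {1, 2, 3},
    }

    stripesets = []              # each entry: [apps, union of peak hours]
    for app, hours in peaks.items():
        for entry in stripesets:
            if not (entry[1] & hours):   # no shared peak hours: can coexist
                entry[0].append(app)
                entry[1] |= hours
                break
        else:
            stripesets.append([[app], set(hours)])

    for i, (apps, _) in enumerate(stripesets, 1):
        print(f"stripeset {i}: {', '.join(apps)}")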

Finally, though, you've got a size (or sizes, with multiple speed groups) to use as the building block if you're building a large disk farm.

One final note: for very large disk farms, folks often toss the idea of non-volatile SSD out the window too quickly. If you know in advance that parts of your database will be "hot" relative to the average on a continuous basis, then a modest proportion of SSD, apart from your large storage array, can "deheat" the cache for the disk farm so that windows when your sustained i/o forces you all the way to spinning rust are much less frequent. Often the online redo logs, rollback segments (aka UNDO), and temporary tablespaces are quite hot, and many applications and application suites have particular objects that are frequently or continuously "hot." If you go this route, be certain to understand the MTBF and power-outage survival characteristics of the SSD you use. There exist devices that simply lose memory when you pull the plug, some last for many days, and some use onboard batteries to sweep memory to a pair of onboard drives.
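A back-of-the-envelope sketch, with invented skew figures, of why a modest SSD slice can pay off:

    # If a small, stable hot set carries most of the i/o, a small SSD pool
    # can absorb it. The skew figures below are assumptions, not measurements.
    hot_io_share = 0.60      # assumption: redo/undo/temp + hot tables
    hot_size_share = 0.03    # assumption: hot set is ~3% of total capacity

    farm_tb = 10.0
    ssd_tb = farm_tb * hot_size_share
    print(f"~{ssd_tb:.1f} TB of SSD could absorb ~{hot_io_share:.0%} of the "
          f"i/o, leaving the spinning rust ~{1 - hot_io_share:.0%}")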

Regards,

mwf

-----Original Message-----
From: oracle-l-bounce_at_freelists.org
[mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Branimir Petrovic
Sent: Tuesday, August 30, 2005 7:33 AM
To: oracle-l_at_freelists.org
Subject: Choosing data file size for a multi TB database?

How would you approach the task of sizing data files for a project that will start with a 1TB database but may relatively quickly grow to stabilize at around the 10TB mark?

Obvious options are:

 a) lots of relatively small data files, or
 b) a few huge data files.

Neither of the above options looks very desirable (to me at least). The first option might be a bad choice with checkpointing in mind, but the second option is not a winner if data files ever need to be moved around. Anyway, some initial choice must be made, and all I'd like at this moment is not to give perilous initial advice... (admission: once the "ball" starts rollin', this bastard ain't gonna be mine:))

So, from a practical perspective - what would be the least troublesome choice?

Branimir

FYI I - The OS platform is the darkest secret at this point, as are the hardware specs (no-one can tell; the early signs of a "well communicated, well managed" project are all there)

FYI II - I've never had to deal with DBs much bigger than 100GB, thus the need for a "reality check"...



--
http://www.freelists.org/webpage/oracle-l