RE: block size

From: Mark W. Farnham <>
Date: Mon, 26 Oct 2009 18:31:25 -0400
Message-ID: <>

  1. The bigger the block size, the greater the percentage of user data that is stored per block in tables. (Regardless of compression, this is simply a question of block overhead for Oracle's bookkeeping of what is in the block. Compression might amplify this effect.)
  2. The bigger the block size, the more keys will fit per non-leaf index block, so the blevel MAY be less (and it will never be more than with a smaller block size).
  3. The bigger the block size, the more key value/rowid references are stored per unit volume in index leaf blocks. (Block bookkeeping overhead and possibly better compression. So index scans will be faster.)
  4. The bigger the block size, the fewer the rows that must span multiple blocks (but maybe Greg's point about good design also cures that).
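Point 1 is easy to sanity-check with back-of-envelope arithmetic. The sketch below assumes a fixed per-block overhead of roughly 190 bytes for the block header and transaction slots; that figure is illustrative only (the real overhead varies with INITRANS and other settings), but the trend it shows does not depend on the exact number.

```python
# Fraction of each block left for user data, assuming a fixed per-block
# overhead. The overhead figure is an illustrative assumption, not an
# exact Oracle number.
BLOCK_OVERHEAD = 190  # bytes; assumed, varies with INITRANS etc.

def user_data_fraction(block_size: int, overhead: int = BLOCK_OVERHEAD) -> float:
    """Fraction of the block available for user data."""
    return (block_size - overhead) / block_size

for bs in (2048, 8192, 16384):
    print(f"{bs:6d} bytes: {user_data_fraction(bs):.1%} user data")
```

With these assumed numbers the jump from 2K to 8K is much larger than the jump from 8K to 16K, which matches the intuition that the win at the margin shrinks as blocks grow.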

What you mentioned about an 8K block fitting within the MTU size of 9000 is an excellent point, and if your DW is RAC I think that might well outweigh any other advantage.

Now, points 1 through 3 are just simple facts about how Oracle stores stuff. Point number 2, while true, might not often be observed to matter for an actual dataset, because spillage from one blevel to the next covers a huge and exponentially growing range. When it does matter for an individual dataset, however, it saves what is usually a full single-block access. That can be a big deal, and your mileage will vary.
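The "exponentially growing range" claim can be sketched numerically. The entry and overhead sizes below are assumptions for illustration, not measured Oracle values; the point is only that fanout per branch block is in the hundreds, so each blevel covers a range hundreds of times wider than the last, and doubling the block size only occasionally drops a level.

```python
import math

# How index height (blevel) depends on block size, under assumed sizes.
ENTRY_SIZE = 20  # assumed bytes per branch entry (key + child pointer)
OVERHEAD = 150   # assumed per-block overhead in bytes

def blevel(num_leaf_blocks: int, block_size: int) -> int:
    """Levels of branch blocks needed above the leaves (0 if one leaf suffices)."""
    fanout = (block_size - OVERHEAD) // ENTRY_SIZE
    levels = 0
    blocks = num_leaf_blocks
    while blocks > 1:
        blocks = math.ceil(blocks / fanout)
        levels += 1
    return levels

for leaves in (1, 500, 500_000):
    print(leaves, blevel(leaves, 8192), blevel(leaves, 16384))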

There may indeed be a difference between "the largest blocksize you can have" and the largest blocksize that has been proven reliable on a given platform. And it is possible to set up the disk farm so it double clutches routinely on larger block sizes. (Don't.) But with the prevalence of 1 MB stripe (and i/o) sizes, that is getting pretty rare these days. Greg's point on the i/o size of a scan is right on target. You're not going to see much, if any, difference in physical i/o to get 128 8K blocks versus 64 16K blocks. If you do, then I think someone made a mistake building the disk farm. I suppose 16K might even be slightly slower, since it has a greater possibility of having a block on two different stripes. I think all that, at the sizes we're talking about, is splitting hairs. I think I am violently agreeing (not disagreeing) with Greg on that point.
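The 128-versus-64 arithmetic above is worth spelling out: with a 1 MB multiblock read size, the same bytes move per physical read regardless of block size, so the scan I/O count is determined by the I/O size, not the block size.

```python
# A 1 MB multiblock read is the same physical I/O whether it is counted
# as 128 x 8K blocks or 64 x 16K blocks.
IO_SIZE = 1024 * 1024  # 1 MB read size

def blocks_per_read(block_size: int, io_size: int = IO_SIZE) -> int:
    return io_size // block_size

assert blocks_per_read(8192) == 128
assert blocks_per_read(16384) == 64
# Same bytes moved per read either way:
assert blocks_per_read(8192) * 8192 == blocks_per_read(16384) * 16384 == IO_SIZE
```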

Now, for a test suggestion on some hefty data: Have Oracle snap off their own ebiz database just before a payroll run. Reload that at a larger block size. Compare the time to completion for the real payroll run and the test run at the larger block size. That should be illustrative. Even though that is an OLTP job, it is batch enough to make the point (or fail to make the point).

Now, is it worthwhile to stray from the "standard" 8K block size (funny that 2K was the standard in Oracle 6 and I had to argue hard to get to use 8K; at 2K versus 8K the blevel argument was very solid)?

Maybe not. I'd GUESS the effects of fitting more user data per unit volume of disk will get you no more than 10-20%. A good design and good queries in place of a bad design and bad queries produce an improvement bounded only by the total time of the bad way. Maybe yes: if you're going to do a good design and good queries either way, then you should indeed win at the margin with a larger block size. Measure it yourself. Build an 8K database, load a big table. FTS it. Repeat at 16K. If you're RAC, well then the MTU argument might dominate. Otherwise I'm going to be very surprised if 16K is not faster, and I'll wonder how you rigged the test.



-----Original Message-----
From: [] On Behalf Of Greg Rahn
Sent: Monday, October 26, 2009 4:32 PM
Subject: Re: block size

On Mon, Oct 26, 2009 at 11:32 AM, Niall Litchfield <> wrote:
> so, discussion with DW architect today
> "but this isn't OLTP, this is DW, in a data warehouse you want the biggest block size you can have".
> my question to you (and him)
> "why? What's the theory and evidence for this?"
> In other words *why* do people think this?

Let's start with the last question first. People think this probably because:
1) they think bigger is better, for an unknown, unquantified reason
2) someone somewhere on the Internet wrote it, so it must be true

The reality (facts) of it is this:
- a larger block *may* give a slightly better compression ratio as more values are packed within a block and Oracle uses block level compression
- in a DW where physical table/partition scans are prevalent, the IO size matters most, not the block size (Oracle can do 1MB reads no matter what the block size)
- 8k blocks are the default and most tested value; block size can influence the execution plan, and most people know that a suboptimal execution plan will easily offset any *potential* gains from a non-8k block
- with RAC an 8k block fits within a jumbo frame (MTU 9000) and that helps reduce system time for interconnect traffic (bigger blocks will get split)
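The jumbo-frame point in the last bullet reduces to simple arithmetic: does one block fit in a single MTU-sized frame after protocol headers? The header allowance below is an assumed round number for IP/UDP plus some slack, purely for illustration.

```python
# Does one Oracle block fit in a single jumbo frame (MTU 9000)?
# The header allowance is an assumption, not an exact protocol figure.
MTU = 9000
HEADERS = 100  # assumed bytes for protocol headers; illustrative only

def frames_needed(block_size: int, mtu: int = MTU, headers: int = HEADERS) -> int:
    payload = mtu - headers
    return -(-block_size // payload)  # ceiling division

assert frames_needed(8192) == 1   # 8K block fits in one jumbo frame
assert frames_needed(16384) == 2  # 16K block is split across two frames
```

The split matters because reassembling a block from multiple frames costs extra system time on the interconnect, which is Greg's point about RAC.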

When people make claims about something, I generally put my engineering hat on and ask, "What data did you look at to draw those conclusions (and can I see it too)?". That question alone is enough to end many (silly) discussions.

Greg Rahn

Received on Mon Oct 26 2009 - 17:31:25 CDT
