
Re: Enable 32K Block in 8K Block DB

From: Howard J. Rogers <hjr_at_dizwell.com>
Date: Tue, 30 Mar 2004 12:40:34 +1000
Message-ID: <4068de25$0$20347$afc38c87@news.optusnet.com.au>

"Mike Ault" <mikerault_at_earthlink.net> wrote in message news:37fab3ab.0403291707.6f43b737_at_posting.google.com...
>
> Let's examine these allocations objectively shall we?

We can try for objectivity, but you'll have to leave your "We Love Mr B." button at the door.

> 1 - Indexes like large tablespaces
> This is quite true.

I don't doubt it for a minute, provided one assumes by "large tablespaces" you mean "large blocksize tablespaces". But that rather misses the point, doesn't it? We're not discussing whether indexes like big blocks or not, but whether you can artificially retro-fit big blocks into a database that is running on a file system without direct I/O and expect to get performance benefits from doing so. And you can't.

So whilst the indexes may be pining for bigger blocks, they can't have 'em if you are stuck with such a file system.
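
For the record, the mechanics being argued about look roughly like this. The names and sizes are made up purely for illustration, and the standard block size is assumed to be 8K:

-- a cache for the non-standard block size must exist first (size is illustrative)
alter system set db_16k_cache_size = 64M;

-- demo_16k and the datafile path are hypothetical
create tablespace demo_16k
  datafile '/u01/oradata/demo/demo_16k_01.dbf' size 500M
  blocksize 16K;

The syntax works happily enough. The point at issue is that on a buffered file system with no direct I/O, the operating system still services those 16K requests in its own (typically 4K) units, so the retro-fitted big blocks buy you nothing at the physical layer.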

On the other hand, it is fair to say, I think, that an OLTP system is likely to make greater use of indexes than a warehouse or DSS, and thus the bald recommendation to go for massive block sizes for indexes (assuming you have the direct i/o freedom to go for massive blocks in the first place) nevertheless needs to be treated with extreme caution, because of the contention issues that are likely to arise. Mr. B's article makes no such caveat.

> Robin Schumacher proved that the index tree builds
> cleaner and that range scans happen with less consistent gets:
>
> http://www.embarcadero.com/resources/tech_papers/ResolvingOracleSpace_6_26.pdf
>
> Note the test that proves it.
>
> 2 - TEMP likes large blocksizes
>
> This is very true.

No-one said it wasn't. But again, the issue is whether or not you can ever use a non-standard block size for TEMP (you can't) or whether it would be sensible to pick a large block size at the point of creating your database in an attempt to get TEMP to use big block sizes (on a file system without direct I/O, it wouldn't).
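
To be concrete about the first point: in an 8K database something like the following is simply rejected, because a temporary tablespace must use the database standard block size (the names here are hypothetical):

-- rejected in an 8K database: TEMP cannot take a non-standard block size
create temporary tablespace temp_32k
  tempfile '/u01/oradata/demo/temp_32k_01.dbf' size 1G
  blocksize 32K;

The only way to get 32K temp segments is to have created the whole database with db_block_size = 32768 in the first place.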

> All temp segment access is sequential and 32k
> blocksizes greatly reduce logical I/O and disk sort times. I'm working
> on a benchmark right now that shows a one-third speed improvement of
> disk sorts in a 32k vs. an 8k blocksize.
>
> While the database will have to be created with the blocksize this
> large to use temporary tablespaces this size

Yes, a minor detail which Mr. B missed out.

>, most databases that
> require large sorts will be data warehouse or DSS and thus will also
> benefit from large blocksizes.

Provided they are using direct I/O, I have no qualms about that statement.

> 3 - Multiple blocksizes
>
> Multiple blocksizes are used in ALL the Oracle10g TPC-C benchmarks
> because they allow far faster transaction execution with less I/O.
> These vendors spent hundreds of thousands of dollars to get the
> fastest performance and their choice of multiple blocksizes speaks for
> itself.
>
> UNISYS benchmark: 250,000 transactions per minute.
>
> http://www.tpc.org/results/FDR/TPCC/unisys_es7000-420_291K_040209_fdr.pdf
>
> db_16k_cache_size = 15010M
> db_8k_cache_size = 1024M
> db_cache_size = 8096M
> db_keep_cache_size = 78000M

Unisys claim they used Windows 2003 as their operating system for their tests and NTFS as their file system. NTFS doesn't have a file system buffer, but uses direct I/O natively. Therefore, use of multiple block sizes for such a setup is just fine - and doesn't conflict with my (and Steve Adams') advice on the matter.
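
As an aside: on ports and releases that support it, you can at least ask for direct I/O over an ordinary buffered file system with the filesystemio_options parameter; whether the file system and its mount options actually honour the request is another matter. A sketch:

-- static parameter, so an spfile change and an instance restart are needed;
-- SETALL requests direct and asynchronous I/O, DIRECTIO direct I/O only
alter system set filesystemio_options = SETALL scope=spfile;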

> HP benchmark: 1,000,000 transactions per minute.
>
> http://www.tpc.org/results/FDR/TPCC/HP%20Integrity%20rx5670%20Cluster%2064P_FDR.pdf
>
> db_cache_size = 4000M
> db_recycle_cache_size = 500M
> db_8k_cache_size = 200M
> db_16k_cache_size = 4056M
> db_2k_cache_size = 35430M

I can't see anywhere in that report what file system they were using, other than a reference to a truly enormous SAN, with some form of cache acceleration enabled. I have no idea whether that is in conflict with my advice about buffered file systems or not.

> Now, post some actual tests that prove or disprove these and we can
> discuss this some more. Until then, let's move on.

No-one is arguing with the proposition that big block sizes in a data warehouse are a good idea. Merely that they're a good idea you *can't actually implement* if you're stuck on a file system that has a file system buffer and no way of by-passing it. Not without introducing scalability or different performance problems.

Test results of my own? In the pipeline, just like yours are. But this is hot off the press:

Select time in 4K block tablespace:  5:11.51 with 115972 physical reads
Select time in 16K block tablespace: 5:09.03 with 28155 physical reads

Same table (about 5 million rows). And this on Linux, ext3 (with 4K O/S blocks by default). And those are average physical reads and run-times over 5 test runs. Now you will notice that despite the massive reduction in reported physical reads (courtesy of autotrace), the run times are practically identical. Which kind of suggests that at the physical layer, the same work is being performed in each case. So regardless of what physical reads Oracle *thinks* it is doing, the file system is still imposing its 4K-O/S-block discipline. Solaris and Windows results to come in due course, together with DML results for all three.
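
For anyone who wants to run the same sort of comparison, the shape of the test is roughly this (the table names are made up; this is not the actual script):

set autotrace traceonly statistics
set timing on

-- the same ~5 million row table, stored once in a 4K and once in a 16K tablespace
select /*+ full(t) */ count(*) from big_table_4k t;
select /*+ full(t) */ count(*) from big_table_16k t;

Compare the elapsed time and the "physical reads" statistic across several runs of each.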

Your points are sound *in a specific context* (namely, no buffered I/O). My problem with Mr. B's article, which you seem so keen to defend, is that nowhere does he qualify his comments in that manner. His recommendations are bald generalisations, offered without reservation, that will set a lot of people on completely the wrong path. That's all.

Regards
HJR
