Re: sequential disk read speed

From: Brian Selzer <brian_at_selzer-software.com>
Date: Sun, 24 Aug 2008 00:39:47 -0400
Message-ID: <vE5sk.7687$np7.3558_at_flpi149.ffdc.sbc.com>


"Darren" <anon5874_at_yahoo.com> wrote in message news:20902141-59e3-46d1-9806-0c9e1c02f40b_at_79g2000hsk.googlegroups.com...
> On Aug 21, 12:08 am, David BL <davi..._at_iinet.net.au> wrote:
> > On Aug 21, 8:36 am, Darren <anon5..._at_yahoo.com> wrote:
> >
> > > I am learning about database systems, and I am reading a book called
> > > "Physical Database Design".
> >
> > > It gets to a bit about a large sequential access (e.g. for a full
> > > table scan), and does the following:
> >
> > > It says "Since most disk systems use prefetch buffers to speed up
> > > table scans, we
> > > assume a 64 KB prefetch block"
> >
> > > So to calculate the time for a full table scan, it multiplies the
> > > number of 64KB blocks by the time it takes to seek and read (2.02ms).
> > > In other words, it is seeking each 64KB block.
> >
> > > Why can a disk only read 64KB at a time? Is this a valid assumption?
> > > Is this a disk limitation or a file system limitation?
> >
> > A high end modern HD with 4ms average seek will on average take about
> > 7ms to access and an additional 0.5ms to read a randomly located 64k
> > buffer. This mismatch shows that 64k blocks are too small for
> > optimal read performance. 512k or 1Mb blocks would be more suitable.
>
> But what dictates the block size? Is this defined by the physical
> disk, the file system, or the database code?
>

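Before getting to that, it may help to put rough numbers on the calculation being quoted. The following back-of-the-envelope sketch (Python, purely for the arithmetic) uses David BL's illustrative figures above--roughly 7ms to reach a randomly located block and about 0.5ms to transfer 64KB, i.e. something like 128MB/s--which are assumptions, not measurements:

# Rough scan-time arithmetic.  The figures are the illustrative ones
# quoted above (assumed, not measured): ~7ms access overhead per block
# and ~0.5ms to transfer 64KB (about 128MB/s).

ACCESS_MS = 7.0
TRANSFER_MS_PER_64KB = 0.5

def scan_seconds_per_gb(block_kb):
    blocks_per_gb = (1024 * 1024) // block_kb      # 1 GB = 1024*1024 KB
    ms_per_block = ACCESS_MS + TRANSFER_MS_PER_64KB * (block_kb / 64.0)
    return blocks_per_gb * ms_per_block / 1000.0

for kb in (64, 512, 1024):
    print("%4d KB blocks: ~%.0f s per GB scanned" % (kb, scan_seconds_per_gb(kb)))

# -> roughly 123 s/GB at 64KB, 23 s/GB at 512KB, 15 s/GB at 1MB.

At 64KB the access overhead swamps the transfer time, which is David BL's point about larger blocks. Those are random-access numbers, though; a sequential scan with track-level prefetching does far better, which brings me to the question of what dictates the block size.
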
There can be a marginal reduction in CPU cycles, and in some cases in I/O, from matching the block size of the disk subsystem to the block size specified in the database engine. But in any system where performance is critical, the disk subsystem will involve one or more caching disk arrays, and the block size becomes moot because the caching array controller reads an entire track at a time. Memory-to-memory transfers are negligible compared to disk-to-memory transfers--let alone seeks.

Moreover, caching controllers employ a number of techniques, such as elevator seeking, to minimize the impact of seek time, and by reading an entire track at a time, rotational latency--the time it takes for the requested data to arrive under the head--is effectively eliminated as a factor. (There is still a residual latency of roughly half the time it takes for the head to pass over one sector, but since there are hundreds of sectors per track, it is not worth considering.)

Not to mention that a savvy DBA will spread data access across as many heads as the budget allows, to maximize throughput and minimize seek time. If you have a 100GB database and you put it on a single 100GB disk drive, your best average seek time is the average seek time of that drive; put the database on four 100GB disk drives instead, and the best average seek time will be only a fraction of that. Suppose the full-stroke seek time on the 100GB drive is 7ms and the track-to-track seek time is 1ms. With four drives, the database occupies only a quarter of each drive, so instead of an average 4ms seek, the individual seek time of each drive drops to roughly 2.5ms; and since the four drives seek independently, the average seek time for the disk subsystem as a whole drops to roughly a quarter of that, or about 0.625ms. Add a mirror (RAID 0+1 using 8 drives) and you introduce fault tolerance while roughly halving the seek time again for reads.

Note that, except in rare cases, the ratio of reads to writes in a database is better than 10:1, so any strategy that improves read time without significantly impeding write time--as is the case with RAID 0+1--should be vigorously pursued. Incidentally, RAID 0+1 is also good for transaction logs and temporary tables, which involve mostly writes or a roughly equal mix of reads and writes. It usually makes sense to segregate the logs from the database--for recovery if nothing else--and it often makes sense to segregate the location where temporary tables are housed from both the database and the logs.
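The seek arithmetic above can be put in the same back-of-the-envelope form. The 2.5ms per-drive figure is the rough one given above for a drive only a quarter full; dividing by the number of drives and halving for the mirror follow the reasoning in the paragraph, not a measurement, and ignore queuing effects:

# Sketch of the multi-drive seek arithmetic above.  All figures come
# from the paragraph (assumptions, not benchmarks).

SINGLE_DRIVE_AVG_SEEK_MS = 4.0   # average seek quoted for the single, full drive
PER_DRIVE_SEEK_MS = 2.5          # rough figure for a drive only a quarter full
N_DRIVES = 4

striped = PER_DRIVE_SEEK_MS / N_DRIVES   # four spindles seeking independently
mirrored_reads = striped / 2.0           # RAID 0+1: either side can serve a read

print("single 100GB drive : %.3f ms" % SINGLE_DRIVE_AVG_SEEK_MS)
print("4-drive stripe     : %.3f ms" % striped)         # ~0.625 ms
print("stripe + mirror    : %.3f ms" % mirrored_reads)  # ~0.313 ms

The exact numbers will vary with the workload, but the direction of the effect is the point: more spindles, shorter effective seeks.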

Received on Sun Aug 24 2008 - 06:39:47 CEST