Re: sequential disk read speed
Date: Thu, 28 Aug 2008 20:43:34 -0700 (PDT)
Message-ID: <f6ef3678-c7e9-4cd2-acaf-13cac28819d6_at_a1g2000hsb.googlegroups.com>
On Aug 28, 9:34 pm, "Brian Selzer" <br..._at_selzer-software.com> wrote:
> "David BL" <davi..._at_iinet.net.au> wrote in message
>
> news:40d67c8b-d516-4721-a52d-20579c2ca9ac_at_r35g2000prm.googlegroups.com...
>
> > On Aug 28, 10:47 am, "Brian Selzer" <br..._at_selzer-software.com> wrote:
> >> "David BL" <davi..._at_iinet.net.au> wrote in message
>
> >>news:b3a7632f-de18-46e8-8ce3-3c5aaf83d4b9_at_a3g2000prm.googlegroups.com...
>
>
> >> >> , and since there are four disks, the average seek
> >> >> time for the disk subsystem is reduced to a quarter of that or roughly
> >> >> .625ms.
>
> >> > In order for the effective seek time to be reduced to a quarter the
> >> > seeking must be independent. To achieve that I think the striping
> >> > would need to be very coarse (eg 512kb or 1Mb).
>
> >> Drives that support disconnection or some other command queueing
> >> mechanism are all that is needed for seeking to be independent.
>
> > If stripes are somewhat smaller than the DBMS block size, then every
> > drive (in the RAID 0) will be involved in the reading of each and
> > every DBMS block. No matter how you order those reads, each drive
> > needs to read a large amount of scattered data and the head will seek
> > around a lot. If that is the case then the only advantage arises
> > from your previously mentioned reduction in the overall range of
> > tracks over which the data resides on a given platter.
>
> > Alternatively if the stripe size is larger then each drive will read a
> > somewhat independent set of the DBMS blocks, and the effective seek
> > time can be reduced assuming the DBMS is able to issue overlapping
> > read requests for the DBMS blocks.
>
> Your argument rests on the assumption that data is randomly distributed in
> the stripes on the disk and doesn't take into account the fact that a
> high-end caching controller eliminates latency by reading an entire track at
> once. Isn't it true that there is a physical affinity between related data?
> Isn't it more likely that an index will occupy contiguous stripes than some
> random set--regardless of stripe size? Can you show that the number of
> tracks accessed by say, 128 coarse stripe reads is any less than the number
> of tracks accessed by 1024 fine stripe reads?
Yes, sometimes the DBMS manages to cluster all the necessary data so there is very little seeking required, and in that case it won’t matter what stripe size is used.
However, that is not always possible. For example, consider a B+Tree over 1 billion records where, in a short period of time, the DBMS needs to read 100 records whose index values are effectively random with respect to the ordering on that data type. To keep it simple, ignore the reading of the internal nodes of the B+Tree. Typically those 100 records will appear in roughly 100 different leaf nodes of the B+Tree, and due to the sheer size of the overall data those leaf nodes will tend to reside on different tracks. The unfortunate reality is that it isn’t possible to read these records without a lot of head seeking, even if the reads are ordered according to track position (i.e. elevator seeking).

Now if RAID 0 is used and the stripes are smaller than the B+Tree leaf nodes, then every drive will need to contribute to the reading of every leaf node. Each drive can read its stripes in whatever order it likes, but that won’t avoid the fact that each drive performs ~100 seeks. If instead each B+Tree leaf node resides in a single stripe (and therefore on a single drive), then with four drives in the RAID 0 each drive will only need to perform ~25 seeks.
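To put rough numbers on that, here is a small Python sketch (my own illustration, not part of the original argument). It counts how many of the 100 leaf-node reads touch each drive for a fine stripe versus a coarse stripe; the drive count, leaf-node size, stripe sizes and per-seek time are all assumed values chosen purely for the example:

import random

DRIVES = 4
LEAF_READS = 100
LEAF_SIZE = 8 * 1024        # assumed B+Tree leaf node (DBMS block) size
AVG_SEEK_MS = 2.5           # assumed average seek time for one drive

def seeks_per_drive(stripe_size):
    """Count how many leaf-node reads touch each drive in the RAID 0."""
    seeks = [0] * DRIVES
    for _ in range(LEAF_READS):
        # Random leaf-node position on the logical volume, aligned to the
        # leaf size (so an 8Kb leaf never straddles a 64Kb stripe boundary).
        start = random.randrange(0, 10**12, LEAF_SIZE)
        first = start // stripe_size
        last = (start + LEAF_SIZE - 1) // stripe_size
        # Stripe n of a RAID 0 lives on drive n % DRIVES.
        touched = {s % DRIVES for s in range(first, last + 1)}
        for d in touched:
            seeks[d] += 1
    return seeks

for stripe_size in (2 * 1024, 64 * 1024):
    busiest = max(seeks_per_drive(stripe_size))
    print("stripe %3dKb: ~%d seeks on the busiest drive (~%.0fms seeking)"
          % (stripe_size // 1024, busiest, busiest * AVG_SEEK_MS))

With a 2Kb stripe every leaf node spans all four drives, so each drive ends up doing ~100 seeks; with a 64Kb stripe each leaf node lands on a single drive, so the busiest drive does only ~25-35 seeks.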
> >> I think using a coarse stripe is counterproductive. There would be a
> >> bigger chance that a seek in the middle of the read would be required.
> >> Consider: if 3.5 stripes fit on a track in one zone of the disk, then
> >> on average every fourth read would require an additional seek to get
> >> the remaining half stripe. If on the other hand, 28 stripes fit on a
> >> track, then no additional seeks would be necessary. Even if it were
> >> 28.5 stripes instead of 28, one additional seek for every 29 reads is
> >> a whole lot better than one for every 4.
>
> > Firstly, hard-disks are quite good at stepping onto the next track in
> > the manner normally used for very large "contiguous" reads or writes.
>
> The best track-to-track seek time I've seen is 0.2ms for reads, 0.4ms for
> writes. That's phenomenal but can still add up.
It’s insignificant when reading or writing 1Mb at a time.
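For a rough sense of scale (my own back-of-the-envelope, assuming a sustained transfer rate of around 60Mb/s, which is in the right ballpark for 2008-era drives), in Python:

TRANSFER_MB_PER_S = 60.0    # assumed sustained transfer rate
TRACK_TO_TRACK_MS = 0.2     # read figure quoted above

read_ms = 1.0 / TRANSFER_MB_PER_S * 1000.0   # time to transfer 1Mb
print("1Mb read ~%.1f ms; a track-to-track seek adds ~%.1f%%"
      % (read_ms, 100.0 * TRACK_TO_TRACK_MS / read_ms))

That works out to roughly 17ms of transfer per 1Mb against 0.2ms of track-to-track seeking, i.e. an overhead of about 1% even if every 1Mb read crosses a track boundary.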
