indexes vs sorted files in range and equality search

From: <>
Date: Thu, 6 Dec 2007 07:47:37 -0800 (PST)
Message-ID: <>

hi all, i got a question about the use of indexes and sorted files in range and equality searches. like you'll sure know
B:  number of data pages
R:  number of records per page
D:  (average) time to read or write disk page

and let's suppose C (data matching time) and H (hashing time) as unimportant.

theory shows us those simple values (let's take F>2):

sorted files: equality search: Dlog2(B) / range search: Dlog2B + matched pages

unclustered tree index: equality search: D(1+logF(1.5B)) / range search: DlogF(1.5B) + matched *records*

clustered tree index: equality search: DogF(1.5B) / range search: DlogF(1.5B) + matched pages

hash index: equality search: 2D / range search: BD

so, i'd say, about a range search: clustered tree indexes are the best, then we have sorted files, then unclustered tree indexes (although they got a faster search in the tree because of the logF performance over the sorted log2 one, they require many I/O operations on different pages, whose time cost overwhelms the such logF benefits). the worst are hash indexes, totally useless for ranged searches.

about an equality search, hash indexes are the best. then, we have clustered tree indexes (*STRICTLY* better than sorted files *only* because the logF trend over log2 one?), sorted files and unclustered tree indexes, being the worst because if an equality search has many relevant records, then its analysis becomes similar to the range search one, which shows that many I/O operations (one for record) for such index are required, since records could never be on the same page.

is my analisys basically cannot-deny-that-so-oh-my-god-accomplished- correct? :) Received on Thu Dec 06 2007 - 16:47:37 CET

Original text of this message