Re: Implementation of Grid File
Date: Fri, 3 Jan 2003 16:57:13 +1000
Message-ID: <XmaR9.15537$jM5.43404_at_newsfeeds.bigpond.com>
> Hi, I have a grid that is 4 columns wide and n row's deep. I need it to be
> very efficient, as they're could be millions of rows, so secondary storage
> efficiency is needed.
>
> We may have 1 or 2 value's to query for a row, for example we could have D
&
> C and would need to find the row that matches, and more than 1 row could
> match the criteria.To my knowledge, a grid file data structure provides
this
> function.
It does not really sound like a grid is what you want. You have four
columns, so your dataset is basically 4-dimensional point data. A grid file
is designed for 2-dimensional data.
10,000,000 rows, 4 bytes per cell, 4 columns comes to only 152MB of storage.
You could probably fit it all in memory if you have a dedicated machine.
If you want to use secondary storage, could you could create a B-tree based
index on each of the possible critera: A, B, C, D, AB, AC, AD, BC, BD, CD?
It may be slightly inefficient inserting into all 10 indexes, but I expect
that you'd get near-optimal query times.
Also, based on your example it looks like you've got a restricted domain --
if the possible range is 0-256 (or something like that), then you could
probably get away with storing the answer to every single-column query in
advance, and computing the intersection (in-memory) of the two single-column
queries when two columns are involved. This wouldn't use up much disk space,
and would probably be quite fast.
Anyway, just some ideas...
-Benjamin Johnston
Received on Fri Jan 03 2003 - 07:57:13 CET