Re: Implementation of Grid File

From: Benjamin Johnston <superhero_at_benjaminjohnston.com.au>
Date: Fri, 3 Jan 2003 16:57:13 +1000
Message-ID: <XmaR9.15537$jM5.43404_at_newsfeeds.bigpond.com>


> Hi, I have a grid that is 4 columns wide and n row's deep. I need it to be
> very efficient, as they're could be millions of rows, so secondary storage
> efficiency is needed.
>
> We may have 1 or 2 value's to query for a row, for example we could have D
&
> C and would need to find the row that matches, and more than 1 row could
> match the criteria.To my knowledge, a grid file data structure provides
this
> function.

It does not really sound like a grid is what you want. You have four columns, so your dataset is basically 4-dimensional point data. A grid file is designed for 2-dimensional data.

10,000,000 rows, 4 bytes per cell, 4 columns comes to only 152MB of storage. You could probably fit it all in memory if you have a dedicated machine.

If you want to use secondary storage, could you could create a B-tree based index on each of the possible critera: A, B, C, D, AB, AC, AD, BC, BD, CD?

It may be slightly inefficient inserting into all 10 indexes, but I expect that you'd get near-optimal query times.

Also, based on your example it looks like you've got a restricted domain -- if the possible range is 0-256 (or something like that), then you could probably get away with storing the answer to every single-column query in advance, and computing the intersection (in-memory) of the two single-column queries when two columns are involved. This wouldn't use up much disk space, and would probably be quite fast.

Anyway, just some ideas...

-Benjamin Johnston Received on Fri Jan 03 2003 - 07:57:13 CET

Original text of this message