Re: Implementation of Grid File

From: John <"John">
Date: Fri, 3 Jan 2003 12:52:14 -0000
Message-ID: <av4144$hlk$1_at_newsg3.svr.pol.co.uk>


Benjamin.

unfortunately the values within the grid are not within a set domain, so, they could have any integer value. The integer values are lookup values for another data structure. The application really needs to secondary storage efficient, but caching will be using at a higher level of abstraction. I was considering B-Tree's however, that would require 24 in total (4*3*2*1), as any mix of columns could be used for the seed of the query (C&D, B&C).

The reason I was considering the implementation of Grid file is that it has two disc access, which would be efficient for the application.

So what about a happy median? a multi-dimensional data structure with efficient disk access? Could R-Tree be used in such a way

Regards, John

"Benjamin Johnston" <superhero_at_benjaminjohnston.com.au> wrote in message news:XmaR9.15537$jM5.43404_at_newsfeeds.bigpond.com...
>
> > Hi, I have a grid that is 4 columns wide and n row's deep. I need it to
be
> > very efficient, as they're could be millions of rows, so secondary
storage
> > efficiency is needed.
> >
> > We may have 1 or 2 value's to query for a row, for example we could have
D
> &
> > C and would need to find the row that matches, and more than 1 row could
> > match the criteria.To my knowledge, a grid file data structure provides
> this
> > function.
>
> It does not really sound like a grid is what you want. You have four
> columns, so your dataset is basically 4-dimensional point data. A grid
file
> is designed for 2-dimensional data.
>
> 10,000,000 rows, 4 bytes per cell, 4 columns comes to only 152MB of
storage.
> You could probably fit it all in memory if you have a dedicated machine.
>
> If you want to use secondary storage, could you could create a B-tree
based
> index on each of the possible critera: A, B, C, D, AB, AC, AD, BC, BD, CD?
>
>
> It may be slightly inefficient inserting into all 10 indexes, but I expect
> that you'd get near-optimal query times.
>
> Also, based on your example it looks like you've got a restricted
domain --
> if the possible range is 0-256 (or something like that), then you could
> probably get away with storing the answer to every single-column query in
> advance, and computing the intersection (in-memory) of the two
single-column
> queries when two columns are involved. This wouldn't use up much disk
space,
> and would probably be quite fast.
>
> Anyway, just some ideas...
>
> -Benjamin Johnston
>
>
>
Received on Fri Jan 03 2003 - 13:52:14 CET

Original text of this message