Re: Implementing a large partitioned IOT?

From: Mark J. Bobak <mark_at_bobak.net>
Date: Fri, 08 Feb 2002 06:21:41 GMT
Message-ID: <pan.2002.02.08.01.21.30.700497.13851@bobak.net>

D'oh. I should have mentioned, this is 8.1.7 EE on Solaris.
On Fri, 08 Feb 2002 00:34:40 -0500, Mark J. Bobak wrote:

> Hi,
>
> We have a new project that we are working on which will be a data
> warehouse of historical census data.
>
> Queries will be ad-hoc, user generated SQL. Things like: How many
> people w/ surname 'Smith' lived in Michigan in 1870 and were between 25
> and 35 years old? There are 8 different columns the user may query on.
> In the long run, this raw data will be around 225 million rows.
>
> Now, to me, this screams bitmap indexes. Controlled dataloads, no DML
> at all once data is loaded, lots of user generated unpredictable
> combination of queries, etc. In my view, the Oracle documentation could
> say something like "application of bitmap indexes: see Mark's database".
>
> Unfortunately, I just got wind of this project, and was not involved in
> the beginning. So, the decision was made that the user must have a
> response time less than 10 seconds, no matter what the query. (Yes,
> it's a ridiculous business requirement cooked up by someone witout a
> clue, but, here we are.)
>
> So, others involved in the project said, gee, we can use the CUBE()
> function to generate all the possible combinations and store them in an
> IOT. Then we can do single row lookups no matter what the query is.
> Sigh... So, now there is a movement afoot to rush out and create a
> range partitioned, hash sub-partitioned, IOT that will likely end up in
> the neighborhood of 34 billion rows and at least 2TB of data.
>
> So, I'm trying to get in front of this and convince people that w/
> bitmap indexes, the 220 million rows of raw data, and about 100 GB of
> disk, I can get them at least 90% of the way to their requirements. I've
> created a small benchark, on a subset of the data, and I got excellent
> results. This is my first real experience w/ bitmap indexes, and I was
> amazed at how efficient they are. But, surprise, surprise, they can't
> beat a single row access by rowid.
>
> What I'm looking for here is:
> - Are we going to be in trouble w/ the IOT?
> From a managabability, maintainability and recoverability point of view,
> does anyone see any gotchas or show stoppers?
>
> - How about performance scalability? Yes, B*Trees
> are efficient, but what happens when you have this volume of data? The
> index will be local prefixed, right? So, I'm hoping that index root
> blocks and first level branch blocks will not get excessively hot, since
> there will be one root per partition.
>
> Since everyone is sold on the IOT concept, the attitude is "you can't
> guarantee that I'll get the performance I need from bitmaps, so I'm
> going with an IOT. It's safer."
>
> So, is it REALLY safer? Is it insane to try this on this scale?
> PArticularly when you talk about partitioned IOT, with range paritions
> AND hash sub- partitions, I wonder if we are going to get into some of
> the less well tested cases.
>
> Anything anyone can offer to help me make the case for bitmap indexes
> will be greatly appreciated.
>
> Really, anything, pro or con, bitmap vs. IOT, would be appreciated.
>
> Thanks for any input you can offer,
>
> -Mark
Received on Fri Feb 08 2002 - 00:21:41 CST