How to optimize for this?

From: <chrilleboy76_at_hotmail.com>
Date: 12 May 2006 00:37:35 -0700
Message-ID: <1147419454.710317.12130@u72g2000cwu.googlegroups.com>

I'm in a project with some fairly large amount of data.

The scenario is this:

We're inserting around 250millions rows per day, into some 15 different tables depending on recordtype. This means somewhere between 5-25 millions rows/table/day. This data is kept between 7-60 days and then deleted.

To read this amount we use direct path load which means the indexes are unusable after a load and have to be recreated.

Earlier we had the tables partitioned by day and created the index the night after but the problem is that the next day or maybe even after that some "late" records arrive with that date which meant the index goes unusable and need to rebuild and it takes a looong time.. too long time even so we needed a change.

So, instead we now partition by loadid and optimize the partitionrange for each table to fit around ½ days records into each partition. The good part is that we now can close a partition for good and never have to open it again (the data is never modified, only searched in). The bad part is that we lost the logical field (date) which is almost always used when searching the data. At loadtime we fill a helptable with info about what dates can be found within each jobid. This is then used when searching such as:
select
  field1,
  field2
from
  mytable
where
jobid in(1,2,3,4,5,..) and
date = to_date('2006-05-10','YYYY-MM-DD')

Another bad thing is that in one job there can be some 500k rows with one date and a single row with another and when a user searches for this "odd" date oracle needs to fulltablescan a large partition just to find this single record. Creating an index on date doesnt feel worthy since 99,99% in each partition, or at least jobid, is the same.

The queries against these tables are dynamically created and we have no idea beforehand which fields will be selected and which where criteria is being used. This also means that it's impossible to create a magical index which solves everything since we have no idea what the users are searching on...

Does anyone else have some smart idea on this? Maybe some other partitioning or subpartitioning?

Partioning on jobid and subpartitioning on date? Multicolumn partitioning with jobid, date?

In short: We need to close partitions fairly quick after reading and only rebuild index once and at the same time be able to search the data regarding some date.

Any ideas are welcome! Received on Fri May 12 2006 - 02:37:35 CDT