Re: How to optimize for this?

From: Алексей Резанов <gdetozdes_at_gdetotam.com>
Date: Sun, 14 May 2006 00:24:04 GMT
Message-ID: <EMu9g.11894$Qq.955@clgrps12>

Hello, chrilleboy76_at_hotmail.com!
You wrote on 12 May 2006 00:37:35 -0700:

Not sure I'm fully groking the problem, but here is a few ideas anyway :)

Are you using local indexes? From your post it sounds like you're using the global ones. No wonder they get invalidated after each load.
Are you re-estmating statistics after loads - and if you do: is "size" parameter is set to a sufficient value? I'm referring to dbms_stats.gather_table_stats procedure. Setting "size" parameter will create histograms which shall help to overcome the "odd values" issue. If it doesn't work, a function-based index might help: index just the odd values.

Regards, Alexei

c> The scenario is this:

 c> We're inserting around 250millions rows per day, into some 15 different
 c> tables depending on recordtype. This means somewhere between 5-25
 c> millions rows/table/day. This data is kept between 7-60 days and then
 c> deleted.

c> To read this amount we use direct path load which means the indexes are c> unusable after a load and have to be recreated.

 c> Earlier we had the tables partitioned by day and created the index the
 c> night after but the problem is that the next day or maybe even after
 c> that some "late" records arrive with that date which meant the index
 c> goes unusable and need to rebuild and it takes a looong time.. too long
 c> time even so we needed a change.

 c> So, instead we now partition by loadid and optimize the partitionrange
 c> for each table to fit around  days records into each partition. The
 c> good part is that we now can close a partition for good and never have
 c> to open it again (the data is never modified, only searched in). The
 c> bad part is that we lost the logical field (date) which is almost
 c> always used when searching the data. At loadtime we fill a helptable
 c> with info about what dates can be found within each jobid. This is then
 c> used when searching such as:
 c> select
 c>   field1,
 c>   field2
 c> from
 c>   mytable
 c> where

c> jobid in(1,2,3,4,5,..) and
c> date = to_date('2006-05-10','YYYY-MM-DD')

 c> Another bad thing is that in one job there can be some 500k rows with
 c> one date and a single row with another and when a user searches for
 c> this "odd" date oracle needs to fulltablescan a large partition just to
 c> find this single record. Creating an index on date doesnt feel worthy
 c> since 99,99% in each partition, or at least jobid, is the same.

 c> The queries against these tables are dynamically created and we have no
 c> idea beforehand which fields will be selected and which where criteria
 c> is being used. This also means that it's impossible to create a magical

c> index which solves everything since we have no idea what the users are c> searching on...

c> Does anyone else have some smart idea on this? Maybe some other c> partitioning or subpartitioning?

c> Partioning on jobid and subpartitioning on date? c> Multicolumn partitioning with jobid, date?

 c> In short: We need to close partitions fairly quick after reading and
 c> only rebuild index once and at the same time be able to search the data
 c> regarding some date.

Received on Sat May 13 2006 - 19:24:04 CDT