Re: VARRAYS for VLDB

From: Karsten Schmidt <groups_at_karsten-schmidt.com>
Date: 28 Apr 2003 01:01:49 -0700
Message-ID: <c6711ac4.0304280001.298803f6@posting.google.com>

Thanks Jonathan ,
I am most likely going with Oracle 9.2.0.3.0 on HPUX 11.11, 64 bit.

sounds a little scary. I suppose, I really have to benchmark the two options.
should not be an issue for me, as you already figured, the inserts are always going to the right-hand side of the primay key index. Apart from that, there is still the option to re-build the partition once no new data is coming in. (create ... as select, exchange partition...) I am doing that successfully with other databases with partitioned IOT's.

Is there anything in particular that I should watch for when testing the VARRAY option ? - i.e. is it bogus exexution plans for queries, excessive cpu usage, memory issues etc. ? Most likely, the entire thing will be i/o bound, so at this point, I am not too concerned to spend a few cpu cycles on object un-nesting.

best regards
Karsten

"Jonathan Lewis" <jonathan_at_jlcomp.demon.co.uk> wrote in message news:<b8arnm$in6$1$8302bc10_at_news.demon.co.uk>...
> I think secondary indexes on IOTs
> need some careful testing. It's probably
> not an area that many people have used
> in a high-stress environment.
>
> There are two main issues:
> a) The primary key is used in the
> secondary index instead of a rowid
> so for large primary keys, the index
> would be bigger than the equivalent
> index on a simple table.
>
> b) The secondary index holds a
> 'guess' block address for the row it
> points to so that a query can go to
> the right block more cheaply. But if
> the row has moved (e.g. leaf block split)
> then the guess is wrong and is a cost,
> not a benefit. But this won't be a problem
> if your application is always adding data
> at the 'right-hand' edge of the index.
>
> Depending of version, there are various
> features and limitations on what you can
> do with secondary indexes that you will
> have to trade, balance and test, if you go
> down that path.
>
>
> (And yes, the key compression could well
> have a very similar benefit to the varray idea,
> whilst avoiding the overhead of 'object unpickling')
>
>
>
> --
> Regards
>
> Jonathan Lewis
> http://www.jlcomp.demon.co.uk
>
> The educated person is not the person
> who can answer the questions, but the
> person who can question the answers -- T. Schick Jr
>
>
> One-day tutorials:
> http://www.jlcomp.demon.co.uk/tutorial.html
>
> ____UK_______April 22nd
> ____USA_(FL)_May 2nd
> ____Denmark__May 21-23rd
> ____Sweden___June
> ____Finland__September
> ____Norway___September
>
> Three-day seminar:
> see http://www.jlcomp.demon.co.uk/seminar.html
> ____UK_(Manchester)_May x 2
> ____Estonia___June (provisional)
> ____Australia_June (provisional)
> ____USA_(CA, TX)_August
>
> The Co-operative Oracle Users' FAQ
> http://www.jlcomp.demon.co.uk/faq/ind_faq.html
>
>
> "Karsten Schmidt" <groups_at_karsten-schmidt.com> wrote in message
> news:c6711ac4.0304240012.11edb6c7_at_posting.google.com...
> > Thanks Jonathan,
> > There are at least two independend access paths to that table,
> > i.e. I need at least one index apart from the primary key.
> >
> > Doing some research on secondary indexes in IOT's i came to the
> > conclusion
> > that it might not be the smartest idea to have a secondary index on
> a
> > IOT of significant size. Any insight on that ?
> >
> > I suppose, storage wise, there should not be much difference
> between
> > a key-compressed IOT and my VARRAY idea - all the dupes get factored
> > out.
> >
> > Regards Karsten
> >
> >
> > "Jonathan Lewis" <jonathan_at_jlcomp.demon.co.uk> wrote in message
> news:<b86d73$mr7$2$8302bc10_at_news.demon.co.uk>...
> > > Mixing VARRAYs with heavy duty processing
> > > doesn't sound like a good idea.
> > >
> > > It depends very much on exactly how you want
> > > to use the data, but your best bet could be
> > > a range-partitioned IOT, partitioned by time,
> > > with compressed primary key - and a materialized view,
> > > with the same partitioning and on a prebuilt table,
> > > for the summary figures.
> > >
> > > The order of the columns in the IOT PK will
> > > be dictated by the nature of the most important
> > > queries.
> > >
> > >
> > > --
> > > Regards
> > >
> > > Jonathan Lewis
> > > http://www.jlcomp.demon.co.uk
> > >
> > > The educated person is not the person
> > > who can answer the questions, but the
> > > person who can question the answers -- T. Schick Jr
> > >
> > >
> > > One-day tutorials:
> > > http://www.jlcomp.demon.co.uk/tutorial.html
> > >
> > > ____UK_______April 22nd
> > > ____USA_(FL)_May 2nd
> > > ____Denmark__May 21-23rd
> > > ____Sweden___June
> > > ____Finland__September
> > > ____Norway___September
> > >
> > > Three-day seminar:
> > > see http://www.jlcomp.demon.co.uk/seminar.html
> > > ____UK_(Manchester)_May x 2
> > > ____Estonia___June (provisional)
> > > ____Australia_June (provisional)
> > > ____USA_(CA, TX)_August
> > >
> > > The Co-operative Oracle Users' FAQ
> > > http://www.jlcomp.demon.co.uk/faq/ind_faq.html
> > >
> > >
> > > "Karsten Schmidt" <groups_at_karsten-schmidt.com> wrote in message
> > > news:c6711ac4.0304230403.4d3f19c1_at_posting.google.com...
> > > > Hi,
> > > > I am about to decide on the physical datamodel for a rather
> large
> > > > database,
> > > > I would appreciate some feedback before it is all messed up.
> > > >
> > > > There is a number of measures, that are sampled with a certain
> > > > frequency.
> > > > i.e. temperature in centigrades @15 samples per second. I get
> several
> > > > hundred/ thousand of these streams, and the data is to be kept
> for
> at
> > > > least three months.
> > > >
> > > > The interesting part is the table that stores the samples:
> > > > Partitioned by timestamp, subpartitioned by sample type, or
> something
> > > > like that.
> > > > This might even become a index-organized table. Not sure about
> that
> > > > yet, I will see how the prototype works out.
> > > > This is to hold several billion rows.
> > > >
> > > > To cut down on the overhead for primary key values etc., I was
> > > > thinking about using a varray to hold the actual sample values -
> so
> i
> > > > can get maybe 50 samples per row in that table.
> > > >
> > > > Then in the table, i can keep several statistics such as number
> of
> > > > samples, average, etc.
> > > >
> > > > The whole intend is to get the storage requirements into a
> reasonable
> > > > range (i.e. less than a terabyte) for that table.
> > > >
> > > > To get the logical view back, I would create a view that rolls
> out
> > > > that varray,
> > > > so the application sees indeed one 'pseudo-' row per sample.
> > > >
> > > > Is this reasonable, or is there any showstopper with varrays
> that
> you
> > > > know about ?
> > > >
> > > > Thanks for your help
> > > > Karsten
Received on Mon Apr 28 2003 - 03:01:49 CDT