Re: VARRAYS for VLDB

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Fri, 25 Apr 2003 09:32:07 +0100
Message-ID: <b8arnm$in6$1$8302bc10@news.demon.co.uk>

I think secondary indexes on IOTs
need some careful testing. It's probably not an area that many people have used
in a high-stress environment.

There are two main issues:

The primary key is used in the secondary index instead of a rowid so for large primary keys, the index would be bigger than the equivalent index on a simple table.
The secondary index holds a 'guess' block address for the row it points to so that a query can go to the right block more cheaply. But if the row has moved (e.g. leaf block split) then the guess is wrong and is a cost, not a benefit. But this won't be a problem if your application is always adding data at the 'right-hand' edge of the index.

Depending of version, there are various
features and limitations on what you can do with secondary indexes that you will
have to trade, balance and test, if you go down that path.

(And yes, the key compression could well have a very similar benefit to the varray idea, whilst avoiding the overhead of 'object unpickling')

--
Regards

Jonathan Lewis
http://www.jlcomp.demon.co.uk

  The educated person is not the person
  who can answer the questions, but the
  person who can question the answers -- T. Schick Jr


One-day tutorials:
http://www.jlcomp.demon.co.uk/tutorial.html

____UK_______April 22nd
____USA_(FL)_May 2nd
____Denmark__May 21-23rd
____Sweden___June
____Finland__September
____Norway___September

Three-day seminar:
see http://www.jlcomp.demon.co.uk/seminar.html
____UK_(Manchester)_May x 2
____Estonia___June (provisional)
____Australia_June (provisional)
____USA_(CA, TX)_August

The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html


"Karsten Schmidt" <groups_at_karsten-schmidt.com> wrote in message
news:c6711ac4.0304240012.11edb6c7_at_posting.google.com...


> Thanks Jonathan,

>  There are at least two independend access paths to that table,

>  i.e. I need at least one index apart from the primary key.

>

>  Doing some research on secondary indexes in IOT's i came to the

> conclusion

>  that it might not be the smartest idea to have a secondary index on

a


> IOT of significant size. Any insight on that ?

>

>  I suppose, storage wise, there should not be much difference

between


> a key-compressed IOT and my VARRAY idea - all the dupes get factored

> out.

>

> Regards Karsten

>

>

> "Jonathan Lewis" <jonathan_at_jlcomp.demon.co.uk> wrote in message

news:<b86d73$mr7$2$8302bc10_at_news.demon.co.uk>...


> > Mixing VARRAYs with heavy duty processing

> > doesn't sound like a good idea.

> >

> > It depends very much on exactly how you want

> > to use the data, but your best bet could be

> > a range-partitioned IOT, partitioned by time,

> >  with compressed primary key - and a materialized view,

> > with the same partitioning and on a prebuilt table,

> > for the summary figures.

> >

> > The order of the columns in the IOT PK will

> > be dictated by the nature of the most important

> > queries.

> >

> >

> > --

> > Regards

> >

> > Jonathan Lewis

> > http://www.jlcomp.demon.co.uk

> >

> >   The educated person is not the person

> >   who can answer the questions, but the

> >   person who can question the answers -- T. Schick Jr

> >

> >

> > One-day tutorials:

> > http://www.jlcomp.demon.co.uk/tutorial.html

> >

> > ____UK_______April 22nd

> > ____USA_(FL)_May 2nd

> > ____Denmark__May 21-23rd

> > ____Sweden___June

> > ____Finland__September

> > ____Norway___September

> >

> > Three-day seminar:

> > see http://www.jlcomp.demon.co.uk/seminar.html

> > ____UK_(Manchester)_May x 2

> > ____Estonia___June (provisional)

> > ____Australia_June (provisional)

> > ____USA_(CA, TX)_August

> >

> > The Co-operative Oracle Users' FAQ

> > http://www.jlcomp.demon.co.uk/faq/ind_faq.html

> >

> >

> > "Karsten Schmidt" <groups_at_karsten-schmidt.com> wrote in message

> > news:c6711ac4.0304230403.4d3f19c1_at_posting.google.com...

> > > Hi,

> > >  I am about to decide on the physical datamodel for a rather

large


> > > database,

> > >  I would appreciate some feedback before it is all messed up.

> > >

> > >  There is a number of measures, that are sampled with a certain

> > > frequency.

> > >  i.e. temperature in centigrades @15 samples per second. I get

> >  several

> > > hundred/ thousand of these streams, and the data is to be kept

for


> >  at

> > > least three months.

> > >

> > >  The interesting part is the table that stores the samples:

> > >  Partitioned by timestamp, subpartitioned by sample type, or

> >  something

> > > like that.

> > >  This might even become a index-organized table. Not sure about

that


> > > yet, I will see how the prototype works out.

> > >  This is to hold several billion rows.

> > >

> > >  To cut down on the overhead for primary key values etc., I was

> > > thinking about using a varray to hold the actual sample values -

so


> >  i

> > > can get maybe 50 samples per row in that table.

> > >

> > >  Then in the table, i can keep several statistics such as number

of


> > > samples, average, etc.

> > >

> > >  The whole intend is to get the storage requirements into a

> >  reasonable

> > > range (i.e. less than a terabyte) for that table.

> > >

> > >  To get the logical view back, I would create a view that rolls

out


> > > that varray,

> > >  so the application sees indeed one 'pseudo-' row per sample.

> > >

> > >  Is this reasonable, or is there any showstopper with varrays

that


> >  you

> > > know about ?

> > >

> > > Thanks for your help

> > >  Karsten

Received on Fri Apr 25 2003 - 03:32:07 CDT