Re: Reorganize Big Table with BLOB column

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Wed, 10 Mar 2021 13:45:59 +0000
Message-ID: <CAGtsp8=ckRyFQzPPyhkCt-+aNTkpAeqyUq-pfkEp1FKnZY4xNA_at_mail.gmail.com>



I wouldn't be certain that partition elimination won't work - you may be able to get the optimzer to do something a bit clever if you can define your year column suitably and add an appropriate constraint. The effect you're after would be something like:
check (timestamp >= year and timestamp < year + 1) Several years ago I found that I could get partition elimination on a virtual column that didn't appear in the query by doing something like this:
https://jonathanlewis.wordpress.com/2013/10/17/virtual-date-partitions/ However I've also written something about how a similar attempt failed a few years later:
https://jonathanlewis.wordpress.com/2016/05/23/virtual-partitions/

Given that you're working with the newest versions it's possible that something like this will do what you want.

Can you give me the exact syntax of your create index statement - previous experience is that context indexes are basically "chunked" anyway, so increasing the chunking by a factor of 11 may be pretty insignificant if it means you can re-organize individual partitions from time to time as the inherent chunking makes the index increasingly inefficient. It's possible, though, that the chunking no longer happens, so I'd want to experiment using the appropriate type of index before commenting further.

Have you checked v$process_memory for the process while the index is being built ?

If the context mechanisms have not evolved significantly then the strategy is:
read as many documents as possible into memory create a list of locations for each token (word) insert rows into tables to record tokens and location lists

It's occurred to me that maybe part of the problem of building the index is Oracle do internal restarts - which could mean wasting a lot of time on rolling back. Have you been taking snapshots of the session stats every few minutes to see what work it's doing and how that workload might change over time.

Question 1: that might work - it's worth testing the idea (on a small data set)

Question 2: I think there would be a non-linear increase in time in the early stages - but that's the same effect as that you would see if you created a b-tree index on an empty table then inserted rows in batches of (say) 1 million. Initially it would be fast because the index was maintained in memory, then there would be a non-linear slowdown as the buffer cache was exhausted and increasing number of index updates required disc reads, then the time would stabilise at the point where every index entry inserted required a disc read. (The analogy isn't perfect, but it's roughly the right idea). That's why it's important to investigate the memory parameter for the index build - maybe it's too big, maybe it's too small - and why LOCAL indexing would probably be better for the build than incrementally extending a global index.

Regards
Jonathan Lewis

On Wed, 10 Mar 2021 at 12:50, Lothar Flatz <l.flatz_at_bluewin.ch> wrote:

> Hi Jonathan,
>
>
> I am currently still in the process of finding the best way of moving the
> data. I believe I have now found the best storage characteristics for the
> BLOB. (DISABLE STORAGE IN ROW CHUNK 32K RETENTION NONE
> NOCACHE NOLOGGING ). Average length of the BLOB is 130886 byte.
> My main concern is the text index though. From previous experience I can
> say that it was almost impossible to context index a table of this size.
> That index create was simply never finishing.
>
> Questions:
>
> 1. I wonder if it would be possible to build a global context index by
> creating it per partition using partial global indexing. You would switch
> on indexing as per partition one at a time and rebuild the index. (It
> could rebuild the whole context index though..)
> 2. Do you think that the time building a context index can increase
> more that linear when the number of rows increases? You were mentioning
> memory. Does this suggest that memory is needed to build the context index
> in relation to it's size? That when memory is exhausted the text index
> creation stalls?
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Mar 10 2021 - 14:45:59 CET

Original text of this message