Re: "Re-sequencing" a pseudo-key (another solution)

From: Kim Berg Hansen <kibeha_at_gmail.com>
Date: Fri, 17 Oct 2014 10:35:18 +0200
Message-ID: <CA+S=qd3uFTQtLbbNk4-Q2xj07fF-DPVC23eXxv47uG0m95UA_A_at_mail.gmail.com>



Hi, Iggy

Thanks for yet another idea - I think it could work. A somewhat similar idea (inspired by you and Sergey) for processing "in chunks" has been brewing in my head for a couple of days...

A few points I might not have made clear in my original question:

Point 1) RECID is never changed by the application. It's populated at insert and then never updated.

Point 2) RECID is only *partly* used to indicate "order of insertion". It is never used directly to mean anything chronological, but some tables have a date and time, and where there can be multiple rows with the same date/time, queries will always ORDER BY DATETIME, RECID. We know sequences can never guarantee "order of insertion" (particularly since our kernel does this "application caches 10 sequence numbers" thing), but it is "good enough" for those few cases where there are multiple rows within the same second. So keeping the RECID order during resequencing is very much preferred, but not an absolute necessity.

Point 3) Even though the application uses only *one* master sequence for populating RECIDs across all the tables, there is actually no absolute need for the RECID to be unique across the entire application; it is OK if it is just unique within each table. Any referential integrity is handled by the application in one of two ways - either the RefRecId ("foreign key") is a reference to one specific parent table, or the child table has a two-column "multi-parent foreign key" of (RefFileId, RefRecId), where RefFileId is a number signifying which table is the parent.
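
Just to illustrate point 3, the two referencing styles look roughly like this (table names are made up for the example, and there are deliberately no FOREIGN KEY constraints, since the application does the referencing itself):

create table tab1 (
   recid      number(10) not null,   -- populated at insert from the master sequence, never updated
   -- ... application columns ...
   constraint tab1_pk primary key (recid)
);

create table child_single (
   recid      number(10) not null,
   refrecid   number(10) not null,   -- always a reference to tab1.recid
   constraint child_single_pk primary key (recid)
);

create table child_multi (
   recid      number(10) not null,
   reffileid  number(10) not null,   -- which parent table refrecid points to
   refrecid   number(10) not null,   -- RECID within that parent table
   constraint child_multi_pk primary key (recid)
);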

Point 3 means that at the moment I might have a situation like this (using your example with the RECID range being -100 to +100 rather than -2 billion to +2 billion):

Tab1 has 9 rows with RECIDs (chronologically listed): 1, 4, 5, 8, 10, 95, -91, -90, -85
Tab2 has 11 rows with RECIDs (chronologically listed): 2, 6, 7, 12, 96, 98, 100, -100, -98, -97, -86
RECIDs unique in the application - master sequence NEXTVAL will be -84.

After "re-sequencing" it is OK if the situation looks like this:

Tab1 has 9 rows with RECIDs: -100, -99, -98, -97, -96, -95, -94, -93, -92
Tab2 has 11 rows with RECIDs: -100, -99, -98, -97, -96, -95, -94, -93, -92, -91, -90
RECIDs only unique within the tables - master sequence NEXTVAL will be -89 (future rows will then have RECIDs that are unique across the application; that can't be helped.)

But I got to think, actually I can also live with re-sequencing giving me this result instead:

Tab1 has 9 rows with RECIDs: 1, 2, 3, 4, 5, 6, 7, 8, 9
Tab2 has 11 rows with RECIDs: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
Master sequence NEXTVAL will be +12.

If I re-sequence starting everything at -2 billion, then in some years when we get near zero I will have to make the sequence skip zero and go to +1 before going onwards again.
If I re-sequence starting everything at +1, then in some years when we get near +2 billion I will have to make the sequence restart at -2 billion (just like we already have done once.)
(It is mostly a matter of whether I think of the sequence sort of as a signed or unsigned 32-bit integer.)

No matter which I choose, I will have to take some action when we're halfway through the 32 bits.
Going from +2 billion to -2 billion has only one bad side effect - it violates point 2) above concerning ordering. But as that would only be a problem within a very short timeframe, in reality it is not a concern. We have done it before with no ill effects.

If I start my resequencing at 1, I can operate in chunks to do something like:

Original:
Tab1 has 9 rows with RECIDs: 1, 4, 5, 8, 10, 95, -91, -90, -85
Tab2 has 11 rows with RECIDs: 2, 6, 7, 12, 96, 98, 100, -100, -98, -97, -86

Update orig RECID chunk 1 -> 10 - result:
Tab1 has 9 rows with RECIDs: *1, 2, 3, 4, 5*, 95, -91, -90, -85
Tab2 has 11 rows with RECIDs: *1, 2, 3*, 12, 96, 98, 100, -100, -98, -97, -86

*Someone adds new data*
Update orig RECID chunk 11 -> 20 - result:
Tab1 has 10 rows with RECIDs: 1, 2, 3, 4, 5, 95, -91, -90, -85, -84
Tab2 has 12 rows with RECIDs: 1, 2, 3, *4*, 96, 98, 100, -100, -98, -97, -86, -83

Update orig RECID chunk 21 -> 30...
...
...
Update orig RECID chunk 91 -> 100 - result:
Tab1 has 10 rows with RECIDs: 1, 2, 3, 4, 5, *6*, -91, -90, -85, -84
Tab2 has 12 rows with RECIDs: 1, 2, 3, 4, *5, 6, 7*, -100, -98, -97, -86, -83

*Someone adds new data*
Update orig RECID chunk -100 -> -91 - result:
Tab1 has 11 rows with RECIDs: 1, 2, 3, 4, 5, 6, *7, 8*, -85, -84, -80
Tab2 has 13 rows with RECIDs: 1, 2, 3, 4, 5, 6, 7, *8, 9, 10*, -86, -83, -79

Update orig RECID chunk -90 -> -81 - result:
Tab1 has 11 rows with RECIDs: 1, 2, 3, 4, 5, 6, 7, 8, *9, 10*, -80
Tab2 has 13 rows with RECIDs: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, *11, 12*, -79

*Downtime* - suspend all activity on the database
Update orig RECID "chunk" -80 -> -1 (the remainder) - result:
Tab1 has 11 rows with RECIDs: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, *11*
Tab2 has 13 rows with RECIDs: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, *13*

Reset master sequence to start at *14*
*Let the users in*
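
Resetting the master sequence in that downtime window would just be the usual drop/recreate (sequence name and options are made up for the example, and any grants/synonyms on it would of course have to be restored too):

drop sequence recid_master_seq;

create sequence recid_master_seq
   start with   14            -- one past the highest RECID handed out by the re-sequencing
   increment by 1
   minvalue     -2147483648   -- the signed 32-bit range the application expects
   maxvalue     2147483647
   cache        10            -- matching the kernel's "cache 10 numbers" behaviour
   nocycle;                   -- wraparound handled manually, as we have done before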

Each "chunk" update could work something like your idea, Iggy.

  1. Populate a translation table with old and new RECID for each table.
  2. Update an inline view joining the table to its translation table, selecting WHERE RECID BETWEEN chunkstart AND chunkend (a rough sketch follows below).
  3. Do the same for any referring child tables.
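
Just to make it concrete, a chunk could look something like this (table, column and bind variable names are made up for the example, and only one parent/child pair out of the 300 tables is shown):

-- one translation table per application table
create table tab1_xlate (
   old_recid  number(10) not null,
   new_recid  number(10) not null,
   constraint tab1_xlate_pk primary key (old_recid)  -- makes the join views below key-preserved
);

-- step 1, per chunk: map the chunk's old RECIDs onto the next free new values;
-- the chunks themselves are worked in the order shown above (positive ranges
-- first, then negative ranges), so ORDER BY RECID within a chunk keeps the
-- overall "order of insertion"
insert into tab1_xlate (old_recid, new_recid)
select recid
     , nvl((select max(new_recid) from tab1_xlate), 0)
       + row_number() over (order by recid)
  from tab1
 where recid between :chunkstart and :chunkend;

-- step 2, per chunk: update the table itself through an inline join view
update (select t.recid, x.new_recid
          from tab1 t
          join tab1_xlate x
            on x.old_recid = t.recid
         where t.recid between :chunkstart and :chunkend)
   set recid = new_recid;

-- step 3, per chunk: apply the same translation to a referring child table;
-- for a "multi-parent" child the RefFileId filter picks the rows pointing at Tab1
update (select c.refrecid, x.new_recid
          from child_multi c
          join tab1_xlate x
            on x.old_recid = c.refrecid
         where c.reffileid = :tab1_fileid
           and c.refrecid between :chunkstart and :chunkend)
   set refrecid = new_recid;

commit;

The primary key on old_recid in the translation table is what makes the real table the key-preserved table in each inline view, so the updates through the views are allowed.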

Any thoughts on that? It is very similar to your idea; I just work the updates chronologically from ancient history up to the present to avoid collisions, rather than going backwards in time as you suggest. But I can only do that because I accept starting everything at +1 again and doing the wraparound once more in the future when we reach +2 billion.

How will the indexes react to this?
While updating all the *positive* RECIDs, the rows actually pretty much should stay in the same place in the index, right? But when I start on the *negative* RECIDs, all updates will move rows from the left side of the index to the right side, right? It won't contend with inserts from the application during the chunk updates, though, as the application inserts in the middle of the index at present.

Doing chunkwise updating like this naturally will mean that the rows updated in each chunk will make it to the materialized view logs and be replicated to the target database.
But the idea is (like both your ideas too, Iggy) that this is acceptable, as it can be done in smaller chunks over a long period of time. And it is also relatively "simple", as it is "just plain updates" without any complex "trickery" (I am a fan of KISS when possible ;-)

Regards

Kim Berg Hansen

http://dspsd.blogspot.com
kibeha_at_gmail.com
_at_kibeha

On Thu, Oct 16, 2014 at 9:45 PM, Iggy Fernandez <iggy_fernandez_at_hotmail.com> wrote:

> re: update an inline view formed by restricting the table to RECIDs -97
> and -96 and joining with the translation table
>
> I meant to say update an inline view formed by restricting the table to
> RECIDs -97 and -96 and joining the table to its corresponding translation
> table (only one join required).
>
> ------------------------------
> From: iggy_fernandez_at_hotmail.com
> To: kibeha_at_gmail.com
> CC: oracle-l_at_freelists.org
> Subject: RE: "Re-sequencing" a pseudo-key (another solution)
> Date: Thu, 16 Oct 2014 12:27:28 -0700
>
>
> Hi, Kim,
>
> I have another solution for you. I'll explain it with an example
> consisting of 10 rows. Suppose that the list of RECIDs (union of all 300
> tables) is as follows: +1, +2, +3, +4, +100, -100, -99, -98, -97, -96. In
> this example, the MAXVALUE was 100. Wraparound occurred at 100 and the
> values are creeping back to 1. We have to take action before the values
> reach 1.
>
> The only assumption that is needed for the following solution to work is
> that the application will not change RECIDs while the resequencing is in
> progress:
>
> *Step 1*
>
> - Identify the last RECID value that is currently in use (the answer
> is -96). We will update RECIDs to this point, fully aware that new RECIDs
> are being added all the time.
> - Count the number of RECIDs up to -96 (the answer is 10)
> - Therefore we will map the RECIDs to the following list: +96, +97,
> +98, +99, +100, -100, -99, -98, -97, -96.
> - Create a master translation table with the following tuples:
>
> (1, -96, -96)
> (2, -97, -97)
> (3, -98, -98)
> (4, -99, -99)
> (5, -100, -100)
> (6, +100, +100)
> (7, +4, +99)
> (8, +3, +98)
> (9, +2, +97)
> (10, +1, +96)
> The first column is a permanent row number for convenience. Notice that
> the latest RECID has the smallest row number; that is, the row numbers are
> assigned in the opposite order of when they were created in the database.
> The reason will become clear in Step 3.
>
> *Step 2*
>
> - Prepare separate translation tables for each of the 300 tables in
> order to avoid the run-time expense of master translation table lookups.
> - The columns of each translation table are as follows: (PK_RECID,
> PK_TRANSLATION, FK1_TRANSLATION, FK2_TRANSLATION, ...)
> - If you like, the translation tables could all be external tables to
> avoid creating redo. Ditto for the master translation table in Step 1.
>
> *Step 3*
>
> - Divide the work into manageable subsets. For this explanation,
> let's do only two RECIDs at a time.
> - *The trick is to work backwards in time to avoid RECID collisions;
> that is, to avoid the possibility that a translated RECID conflicts with an
> existing RECID*. That's why we assigned row numbers in the opposite
> order in which RECIDs were created. Note that we are working backwards in
> *time* not by mathematical value. Therefore we process the tuple (1,
> -96, -96) first and the tuple (2, -97, -97) second.
> - Write a transaction as follows:
>
> begin
>
> defer constraints on all 300 tables;
>
> for each table
>
> update an inline view formed by restricting the table to RECIDs -97
> and -96 and joining with the translation table;
>
> end;
>
> commit;
>
> end;
>
>
> - Working backwards in time, process additional chunks of RECIDs,
> until you are done.
>
>
> Will this work?
>
> Iggy
>
>
>

--
http://www.freelists.org/webpage/oracle-l