"Re-sequencing" a pseudo-key

From: Kim Berg Hansen <kibeha_at_gmail.com>
Date: Wed, 8 Oct 2014 12:08:24 +0200
Message-ID: <CA+S=qd3x3=rYrd2__zNS+GW6RXYNNqhjmk3XU-_FbxGJw=-Wew_at_mail.gmail.com>

Hi, list

The short description of my question is:

· I’ve got 600 million rows in 650 tables and need to update all rows to “re-sequence” a 32-bit integer pseudo key in all tables. How to approach this update task with reasonable efficiency?

I can’t think of a way to describe the task very briefly without getting a lot of answers where I’ll have to say “well, there’s just this little special issue that makes your good idea unable to work here”, so this’ll be a longish mail. If you haven’t got the time or inclination to read through this bunch of details, please feel free to stop reading now :-)

For those reading on, I’ll try to describe the setup and the challenge in a bunch of points. That way you can refer to “point 5.c.2 is wrong, young man” or something like that ;-)

Total about 650 tables totaling about 600 million rows
1 table about 300 million rows
7 tables each in 10-40 million rows range totaling about 100 million rows
37 tables each in 1-9 million rows range totaling about 100 million rows
The rest 600+ tables each less than 1 million rows totaling about 100 million rows
Each table has numeric pseudokey called RECID
Legacy application code mandates this to be 32-bit integer
-2,147,483,648 to 2,147,483,647
Populated by one common sequence created with INCREMENT BY 10
Legacy kernel (unchangeable) code works like:
When needing to insert a row in any table, it selects NEXTVAL FROM DUAL

                                                             ii.      It

inserts row using the selected sequence value

                                                            iii.      The

next 9 inserts from this session to any table will use the next 9 values

                                                           iv.      The 10th

insert will again select NEXTVAL FROM DUAL

d. Sequence originally started at 1 going up towards 2,147,483,647

e. Few years ago max was reached and sequence reset to start at
-2,147,483,648

f. We are now at about -340,000,000 and will reach zero late 2015 or early 2016

g. “Re-sequencing” pseudokeys will “use” 600 million of the 4200 million

h. Then we would have 3600 million available or about 10-12 years before next problem

i. Time to research solution is now

                                                               i.      If

this cannot be done in one night, Easter 2015 is the only big window

3 Key constraints are only partially defined in Oracle

Legacy application treats Oracle as black box and handles keys in application
So no primary or foreign keys in Oracle – only defined in application metadata
Except some tables are replicated with materialized views
These tables have materialized view logs

                                                             ii.      Therefore

they have primary key in Oracle defined as RECID column

d. Many places in application have “multi-parent foreign key”

                                                               i.      One

column designates which parent table

                                                             ii.      Other

column contains parent RECID

e. Some “generic” tables exist in the application

                                                               i.      For

example if COL2=17, then COL5 contains parent RECID of one table

                                                             ii.      If

COL2=18, then COL7 contains parent RECID of another table

4. Legacy application is index happy

Average almost 4 indexes per table
Up to 22 indexes on a few tables
The one big table of 300 million rows has 16 indexes
Every table has a unique index on RECID
Indexes are all created as UNIQUE

i.

Application
makes non-unique indexes UNIQUE by adding RECID as last element

                                                             ii.      If it

is truly unique in the application, RECID is not in the index

5. IDEA 1 – DML

Massive DML for each table
CREATE TABLE tab1$newrecid AS SELECT recid as oldrecid, rownum as newrecid FROM tab1 ORDER BY recid

                                                             ii.      UPDATE

tab1 SET recid = newrecid

                                                            iii.      UPDATE

tables with FKs to tab1 SET parentrecid = newrecid

b. Pros

                                                               i.      Simple

and easy

c. Cons

                                                               i.      Slow

DML of all rows

                                                             ii.      Index

maintenance

                                                            iii.

Replication

All rows go in the mview logs
FAST REFRESH will be very slow based on the bloated mview logs
IDEA 2 – DDL
A lot of CTAS
For all tables create translate table CREATE TABLE tab1$newrecid AS SELECT recid as oldrecid, rownum as newrecid FROM tab1 ORDER BY recid

                                                             ii.      For

each table do CTAS joining PK and any FKs to translate tables

CREATE TABLE tab1$copy AS SELECT t2.newrecid as recid, t1.othercolumns, t15.newrecid as t15parentrecid, t17… FROM tab1 t1 JOIN tab1$newrecid t2 ON t2.oldrecid = t1.recid JOIN tab15$newrecid t15 ON t15.oldrecid = t1.t15parentrecid JOIN tab17$newrecid …
DROP tab1
RENAME tab1$copy TO tab1
Rebuild indexes
Create mview log if table is replicated
Recreate / refresh complete replicated mview on target database
Pros
Probably a good deal faster than DML
Cons
Extra storage needed

                                                             ii.      Extra

work rebuilding indexes

                                                            iii.      Extra

work rebuilding replication

                                                           iv.      More

complex code to write

                                                             v.      Downtime

on replication target database

7. Some of my thoughts

Normally I can have a night service window maybe 3 hours
For this I can arrange up to 10-12 hours if need be
If that won’t be sufficient, Easter can provide 60 hour window
My priority is “safety” – as little chance of something going wrong as possible
I can arrange the storage needed for the copy tables of the DDL method
I think the DDL method is most likely “best”, but can it be “finetuned”?
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
Running on Oracle Linux Server x86_64 release 6.5 with 4 cores and 512 GB memory
I can test this full scale on a dataguard copy that I can put in snapshot standby mode
My questions
Have I missed something obvious?
Ideas for a different method?
Ideas for improving some of the steps in my ideas?
Would it help if the “translate” tables were IOTs?

                                                             ii.      Or

give the “translate” tables a tablespace with bigger blocks?

d. Things to consider that can give considerable speed improvements?

                                                               i.      (I

mean, adding complexity to gain a few minutes isn’t interesting ;-)

e. Any database parameters that should be increased while doing this?

f. Parallelism? DIY with 4 sessions each handling a different set of tables?

So, to any that got this far – thank you for your patience in reading this :-)

I hope I can either get confirmation that my ideas seem reasonably OK or some ideas for improvement I can try out.

Thanks in advance.

Regards

Kim Berg Hansen

http://dspsd.blogspot.com
kibeha_at_gmail.com
_at_kibeha

--

http://www.freelists.org/webpage/oracle-l Received on Wed Oct 08 2014 - 12:08:24 CEST

This message: [ Message body ]
Next message: Dave Morgan: "Handling DBMS_CRYPTO_KEYS"
Previous message: Hilton Meyer: "RE: Swapping Issues and ora_dbw"
Next in thread: Iggy Fernandez: "RE: "Re-sequencing" a pseudo-key"
Maybe reply: Iggy Fernandez: "RE: "Re-sequencing" a pseudo-key"
Maybe reply: Iggy Fernandez: "RE: "Re-sequencing" a pseudo-key"
Maybe reply: Hans Forbrich: "Re: "Re-sequencing" a pseudo-key"
Maybe reply: De DBA: "Re: "Re-sequencing" a pseudo-key"
Maybe reply: Kim Berg Hansen: "Re: "Re-sequencing" a pseudo-key"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message