Re: [EXT] Re: Goldengate: How to load multiple files for initial load?

From: Chirag Shah <csshah85_at_gmail.com>
Date: Sat, 11 May 2019 00:53:23 +0200
Message-ID: <CAG4NG0EDOmk8oROfyFdEO6iJ-KOpjvm7PDbJ_0nh1Pb8sPuaLA_at_mail.gmail.com>



Hello Daniel,

It's good to hear that the problem has now been sorted out. I'd just like to share the approach we use for the initial load of very large tables (500+ million rows) from Oracle (12c) to MS SQL Server (2016), in case you would like to explore it. The high-level steps are below.

  1. Create a MATERIALIZED VIEW of the table (on the source Oracle DB). We do this because, in our approach, we don't want live transactions hitting a very big table while its initial load is running.
  2. Create multiple initial-load Extracts (anywhere from 9 to 60, depending on table size), split by ROWID range. We have automated this through a PL/SQL procedure that generates the GG parameter files for the Extracts and Replicats, and the accompanying scripts (a sketch follows the sample query below).
  3. Use multiple initial-load Extract files (with different extfile names a0, a1, a2, a3 .. aa, ab, etc.) and, accordingly, multiple Replicats. Each Extract gets a table clause restricted to its ROWID range, for example:

table <SCHEMA>.<MV> SQLPREDICATE "where rowid between 'ABoUbxAAAAAKAEAAAA' and 'ABoUbxAAAAAMD//CcQ'";
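For illustration, a minimal sketch of how one such clause might sit in an initial-load Extract parameter file (the file name, host, port and alias below are placeholders, not our actual setup):

-- dirprm/ilext00.prm - one of N initial-load Extracts (sketch only)
SOURCEISTABLE
USERIDALIAS ggadmin
RMTHOST <target-host>, MGRPORT 7809
RMTFILE ./dirdat/a0, MEGABYTES 2000, PURGE
table <SCHEMA>.<MV> SQLPREDICATE "where rowid between 'ABoUbxAAAAAKAEAAAA' and 'ABoUbxAAAAAMD//CcQ'";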

  • Sample query to divide the MV into 25 initial-load Extract table clauses:

    select grp,
           'table <SCHEMA>.<MV>, SQLPREDICATE "where rowid between ''' ||
           dbms_rowid.rowid_create(1, data_object_id, lo_fno, lo_block, 0) ||
           ''' and ''' ||
           dbms_rowid.rowid_create(1, data_object_id, hi_fno, hi_block, 10000) ||
           '''";'
      from (select distinct
                   grp,
                   first_value(relative_fno) over (partition by grp
                       order by relative_fno, block_id
                       rows between unbounded preceding and unbounded following) lo_fno,
                   first_value(block_id) over (partition by grp
                       order by relative_fno, block_id
                       rows between unbounded preceding and unbounded following) lo_block,
                   last_value(relative_fno) over (partition by grp
                       order by relative_fno, block_id
                       rows between unbounded preceding and unbounded following) hi_fno,
                   last_value(block_id + blocks - 1) over (partition by grp
                       order by relative_fno, block_id
                       rows between unbounded preceding and unbounded following) hi_block,
                   sum(blocks) over (partition by grp) sum_blocks
              from (select relative_fno, block_id, blocks,
                           trunc((sum(blocks) over (order by relative_fno, block_id) - 0.01) /
                                 (sum(blocks) over () / 25)) grp
                      from dba_extents
                     where segment_name = upper('<MV>')
                       and owner = '<SCHEMA>'
                     order by block_id)),
           (select data_object_id
              from dba_objects
             where object_name = upper('<MV>')
               and owner = '<SCHEMA>'
               and object_type = 'TABLE')
     order by grp;
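As mentioned in step 2, we generate the parameter files from PL/SQL. A rough, untested sketch of the idea using UTL_FILE (the GG_PRM_DIR directory object, the gg_rowid_ranges table holding the output of the query above, and all names are assumptions for illustration, not our actual procedure):

declare
  l_file utl_file.file_type;
begin
  -- one parameter file per rowid-range group produced by the query above
  for r in (select grp, table_clause
              from gg_rowid_ranges      -- assumed staging of the query output
             order by grp) loop
    l_file := utl_file.fopen('GG_PRM_DIR',
                             'ilext' || lpad(r.grp, 2, '0') || '.prm', 'w');
    utl_file.put_line(l_file, 'SOURCEISTABLE');
    utl_file.put_line(l_file, 'USERIDALIAS ggadmin');
    utl_file.put_line(l_file, 'RMTHOST <target-host>, MGRPORT 7809');
    utl_file.put_line(l_file, 'RMTFILE ./dirdat/a' || r.grp || ', MEGABYTES 2000, PURGE');
    utl_file.put_line(l_file, r.table_clause);
    utl_file.fclose(l_file);
  end loop;
end;
/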

Br,
Chirag

On Fri, May 10, 2019 at 10:50 PM Livingston, Daniel <DLivingston_at_west.com> wrote:

> Plans are in progress to move to SS 2017 – we are supposed to be done by
> Jul 1 this year - give or take ... mostly "give" (SSIS package
> migrations...) :)
>
>
>
> The problem turned out to be sort of like what was mentioned in earlier
> posts (yesterday): "...Just because you CAN, doesn't mean you SHOULD"
>
>
>
> A big thanks to Suresh for the assist.
>
>
>
> When I was creating the remote trail file I had been looking all over for
> examples, and nearly all of them specified the trail file as something
> like "./dirdat/initld". Even though I knew that "usual" trail files can
> only have a 2-character prefix, I figured that since so many examples
> showed more than 2 characters, it must be OK for initial loads. Doing so,
> GG appended the sequence number to the file name and generated a series of
> files named like:
>
>
>
> ./dirdat/initld000000
>
> ./dirdat/initld000001
>
> ./dirdat/initld000002
>
> ...
>
>
>
> However, when I started replicat, with this in the param file:
>
>
>
> EXTFILE ./dirdat/initld
>
>
>
> it said "cannot find file". Changing it to "./dirdat/initld000000" made it
> read **only** that single file – presumably since the prefix contains more
> than 2 characters.
>
>
>
> After changing the trail file name to "./dirdat/ld", files were generated
> as before with a sequence number appended, but this time replicat started
> at sequence "000" and proceeded through all 57 files one by one.
>
>
>
> I even renamed the previous files from "initld000000", etc. to
> "ld000000000" and that worked as expected as well. So all those examples I
> googled of course worked fine on the tiny little test tables they always
> use, because everything fit into a single trail file.
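>
> For anyone searching later, the working combination boils down to
> something like this (the options other than the file names are
> illustrative, not my exact parameters):
>
> -- Extract side: keep the file prefix to 2 characters
> RMTFILE ./dirdat/ld, MAXFILES 100, MEGABYTES 2000
>
> -- Replicat side: reference just the prefix
> EXTFILE ./dirdat/ld
>
> and replicat then walks ld000000, ld000001, ... on its own.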
>
>
> *From:* Kellyn Pot'Vin-Gorman [mailto:dbakevlar_at_gmail.com]
> *Sent:* Friday, May 10, 2019 2:31 PM
> *To:* Andrew Kerber <andrew.kerber_at_gmail.com>
> *Cc:* Livingston, Daniel <DLivingston_at_regmail.west.com>;
> oracle-l_at_freelists.org
> *Subject:* Re: [EXT] Re: Goldengate: How to load multiple files for
> initial load?
>
>
>
> Just saw this - I'm doing this kind of work consistently on the Azure
> side (not as often to SQL Server, and dude - SQL Server 2008 is about to
> be deprecated, you need to fix that... :). I can't tell you how often BCP
> has still saved my bacon, and how often the old and trusty tools still
> stand the test of time. Create a script, datapump it out to a flat file,
> and BCP it back into SQL Server. Fast and efficient every time.
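>
> Something along these lines (server, database, file name and delimiter are
> made up for the example; -T is a trusted connection, -c character mode,
> -b the batch size):
>
> bcp dbo.masterline_repl in masterline.dat -S sqlhost -d ers -T -c -t "|" -b 50000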
>
>
>
> I also have examples, let me know if you need help,
>
>
>
>
>
> *Kellyn Pot'Vin-Gorman*
>
> DBAKevlar Blog <http://dbakevlar.com>
>
> President Denver SQL Server User Group <http://denversql.org/>
>
> about.me/dbakevlar
>
>
>
>
>
>
>
> On Fri, May 10, 2019 at 12:04 PM Andrew Kerber <andrew.kerber_at_gmail.com>
> wrote:
>
> OK. I might be tempted to try a database link at that point - multiple
> insert into ... select * from statements, if you have a range of values
> you can use.
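>
> Something like this, entirely schematic (assumes an Oracle gateway link to
> SQL Server called mssql_link and a numeric key you can range on):
>
> insert into masterline_repl@mssql_link
> select * from vsop.masterline_repl
> where <key> between 1 and 5000000;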
>
>
>
> On Fri, May 10, 2019 at 11:39 AM Livingston, Daniel <DLivingston_at_west.com>
> wrote:
>
> It will load a single file of about 3 million rows in about a minute or
> so, but it means I need to run it repeatedly (apparently).
>
>
>
> The reason for going the GG route in the first place was that this table
> eventually needs ongoing replication from Oracle -> SQL Server.
>
>
>
> BCP seems to have the same problem: it expects a single source datafile.
>
>
>
>
>
> RMTFILE: 2GB limit per file
>
> EXTFILE: reads only a single file
>
> BCP.exe: reads only a single file
>
>
>
>
>
> Almost like these utilities were last updated in 2005 or something ...
>
>
>
>
>
>
>
> *From:* Andrew Kerber [mailto:andrew.kerber_at_gmail.com]
> *Sent:* Friday, May 10, 2019 10:32 AM
> *To:* Livingston, Daniel <DLivingston_at_regmail.west.com>
> *Cc:* oracle-l_at_freelists.org
> *Subject:* [EXT] Re: Goldengate: How to load multiple files for initial
> load?
>
>
>
> For large tables, you can't really use that method and expect it to finish
> in a reasonable time. You will need to look at BCP, or perhaps even a
> database link, for the initial load.
>
>
>
> On Fri, May 10, 2019 at 10:54 AM Livingston, Daniel <DLivingston_at_west.com>
> wrote:
>
> I have a table of around 65 million rows that I need to replicate from an
> Oracle 18 database to SQL Server 2008.
>
>
>
> Source: Oracle 18.5, Goldengate 18.1
>
> Target: SQL Server 2008R2, Goldengate 12.3 (this system is being migrated
> later this year to SS 2017 – different story)
>
>
>
>
>
> I am trying to do this via the FILE to REPLICAT method:
>
>
>
> -- Extract to trail files on remote server
>
> -- Start SPECIALRUN replicat to read through the trail files and load the
> target table
>
>
>
>
>
> The problem I am running into is that the replicat can only read a
> **single** trail file at a time. I have found no wildcard or any other way
> to specify multiple EXTFILE datafiles.
>
>
>
> -------------- file list (partial) ----------------
>
>
>
> Directory: C:\GGate\dirdat
>
>
>
> Mode LastWriteTime Length Name
>
> ---- ------------- ------ ----
>
> -a--- 5/10/2019 8:54 AM 2041669405 initld000000
>
> -a--- 5/10/2019 8:56 AM 2041669563 initld000001
>
> -a--- 5/10/2019 8:59 AM 2041669310 initld000002
>
> -a--- 5/10/2019 9:01 AM 2041669276 initld000003
>
> -a--- 5/10/2019 9:03 AM 1119899237 initld000004
>
> ...
>
>
>
> ------------------------------------------------------------
>
>
>
>
>
> ----------- replicat param file -------------------------
>
>
>
> specialrun
>
> abortonfatalerror
>
> BATCHSQL
>
>
>
> discardfile ./dirrpt/initld.dsc, purge
>
>
>
> targetdb ers_gg, useridalias ggadmin
>
>
>
> extfile ./dirdat/initld
>
>
>
> sourcecatalog vsopdb
>
> MAP vsop.masterline_repl, TARGET dbo.masterline_repl, colmap(usedefaults);
>
>
>
> end runtime
>
>
>
> ------------------------------------------------------------
>
>
>
>
>
> I get the error:
>
>
>
> 2019-05-10 09:36:14 ERROR OGG-01091 Unable to open file
> "C:\GGate\dirdat/initld" (error 2, The system cannot find the file
> specified.).
>
>
>
>
>
>
>
> Is there any way to do this? Or am I supposed to create a PowerShell loop
> of some sort to process the full 57 data files one by one? I have tried a
> wildcard (error) and multiple EXTFILE parameters (only the last one is
> read) ... ?
>
>
>
>
>
>
>
>
>
> --
>
> Andrew W. Kerber
>
> 'If at first you dont succeed, dont take up skydiving.'
>
>
>
> --
>
> Andrew W. Kerber
>
> 'If at first you dont succeed, dont take up skydiving.'
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Sat May 11 2019 - 00:53:23 CEST
