Oracle-L: RE: Parallel Replication of Single Table

From: Khedr, Waleed <Waleed.Khedr_at_FMR.COM>
Date: Wed, 15 Jan 2003 08:13:41 -0800
Message-ID: <F001.00530528.20030115081341@fatcity.com>

Larry,

Thanks for taking the time to explain this. You are really patient when it comes to writing!
I see now how Oracle does the refresh. I also remember now seeing a post in this list some time ago asking about the possibility of tuning a similar sql. I think it was you :)

The way Oracle is pushing the changes make sense since they care nothing about the transactions themselves and only care about the final image of the data ( I thought they'd care about the transactions).

As I see in the sql you sent, all the sql are using bind variables. This indicates that changes are done row by row in a procedural method instead of a single sql (update, insert) that handles all the changes in one shot.

When we talk about parallelizing it, then there are two choices: The first one is using parallelism in the sql engine itself and firing one sql statement that handles either all the updates or the entire deletes to the target table.

Parallel updates is not available in releases below 9.2. Also using this method does not give a chance to handle any other processing (if there is) that is related to the refresh process/rows refreshed during the refresh.

The second choice is to continue doing it in a procedural way but spawning many Oracle processes, and let the master process read the rows from the sql and dequeue/distribute them to the parallel job processes.
The parallel jobs processes will do their jobs the same way (procedural) the single process does.

So I think the first choice (if it's available) will not be available in any Oracle releases before 9.2.
The second method if it's available will be available for a big packaged Oracle option like replication.

I asked a friend who handles replication and here is the thread:

<<
-----Original Message-----

Sent:	Wednesday, January 15, 2003 10:57 AM
To:	Khedr, Waleed
Subject:	RE: Question for you

True - this meaning of parallel propagation refers to the act of pushing repgroups concurrently to multiple destinations. There is no way that I'm aware of to make the queues be read by multiple background processes, thereby pushed in parallel.

Replication was never intended for large DSS systems and it certainly has it's limitations. ETL-type solutions may be a better road to take for DSS or otherwise extremely large databases.

-----Original Message-----

Sent:	Wednesday, January 15, 2003 10:46 AM
To:	my friend
Subject:	RE: Question for you

If the parallel push is assoc with a rep group then it may be there to handle different tables concurrently and it's not guaranteed that if a rep group has only one table that it will be done in parallel.

Am I right? any thoughts?
>>

Regards,

Waleed

-----Original Message-----
Sent: Tuesday, January 14, 2003 8:25 PM
To: ORACLE-L_at_fatcity.com
Cc: Waleed.Khedr_at_fmr.com

Waleed,

Thanks for chiming in.

Regarding refreshing multiple tables in parallel, well yes, that could be done on a group level. Instead, though, they choose to have the scheduling tool kick off multiple single table refreshes. They start many at the same time, so they do in essence get parallelism of multiple tables at one time. And this way they get paged with a specific should something fail.

Regarding the other comments, the replication logic seems to handle the dependency you are talking about. There really isn't any "order" per se, so the same issues you raise would apply to serial processing as well, right? All those rows are in the MLOG$ table with a 1/1/4000 date -- in what order did they occur?

Obviously an insert has to occur before the update to occur (with xceptions -- e.g. update, delete, then insert same PK row, etc). And we don't care how many updates have occurred since we are only interested in pushing the current image of the row.

Now, what about deletes? Let's say you see deletes and inserts in the MLOG$ table. Well, they will all have the 1/1/4000 date prior to kicking off. How do we know which occurred first -- did we delete an existing row and then re-insert? Or did we insert and then delete? Well, that's handled by joining to the core table itself -- if the row exists, then you are going to push it, the insert would have happened after the delete. If it doesn't exist, then you know the delete happened after the insert. So the deletes are processed first using the existence in the base table test.

With tracing turned on you can see how the replication logic handles this:

Step 1 -- get the != 'I' where it's not in the existing table -- e.g. we know the last action was a delete, not an insert or update since the row no longer exists:

SELECT /*+ remote_mapped(link) */ DISTINCT   LOG$."OBJECT_ID"
FROM
(SELECT MLOG$."OBJECT_ID" FROM
"SCHEMA"."MLOG$_MY_PTEST"@link MLOG$ WHERE
"SNAPTIME$$" > :1 AND ("DMLTYPE$$" != 'I')) LOG$ WHERE (LOG$."OBJECT_ID")
  NOT IN (SELECT MAS_TAB$."OBJECT_ID" FROM
"SCHEMA"."MY_PTEST"@LINK "MAS_TAB$" WHERE
  LOG$."OBJECT_ID" = MAS_TAB$."OBJECT_ID") Step 2 -- process those deletes:

DELETE FROM "SCHEMA"."MV_MY_PTEST" SNAP$ WHERE
"OBJECT_ID" = :1 Step 3 -- Look for the != 'D' where it *exists* in the table. This will be the I's and U's. And because you handled the true deletes (as in no longer exists) in the prior step, then you know these go:

SELECT /*+ remote_mapped(link) */ CURRENT$."OWNER",
  CURRENT$."OBJECT_NAME",CURRENT$."SUBOBJECT_NAME",CURRENT$."OBJECT_ID",
  CURRENT$."DATA_OBJECT_ID",CURRENT$."OBJECT_TYPE",CURRENT$."CREATED",

CURRENT$."LAST_DDL_TIME",CURRENT$."TIMESTAMP",CURRENT$."STATUS", CURRENT$."TEMPORARY",CURRENT$."GENERATED",CURRENT$."SECONDARY" FROM

 (SELECT /*+ */ "MY_PTEST"."OWNER" "OWNER","MY_PTEST"."OBJECT_NAME"

"OBJECT_NAME","MY_PTEST"."SUBOBJECT_NAME" "SUBOBJECT_NAME",
"MY_PTEST"."OBJECT_ID" "OBJECT_ID","MY_PTEST"."DATA_OBJECT_ID"
"DATA_OBJECT_ID","MY_PTEST"."OBJECT_TYPE" "OBJECT_TYPE",
"MY_PTEST"."CREATED" "CREATED","MY_PTEST"."LAST_DDL_TIME" "LAST_DDL_TIME",
"MY_PTEST"."TIMESTAMP" "TIMESTAMP","MY_PTEST"."STATUS" "STATUS",
"MY_PTEST"."TEMPORARY" "TEMPORARY","MY_PTEST"."GENERATED" "GENERATED",
"MY_PTEST"."SECONDARY" "SECONDARY" FROM "SCHEMA"."MY_PTEST"@LINK
"MY_PTEST") CURRENT$,
(SELECT DISTINCT MLOG$."OBJECT_ID" FROM "SCHEMA"."MLOG$_MY_PTEST"@LINK MLOG$ WHERE
"SNAPTIME$$" > :1 AND ("DMLTYPE$$" != 'D')) LOG$ WHERE
CURRENT$."OBJECT_ID" = LOG$."OBJECT_ID" Steps 4 and 5 (intermixed) -- process the inserts/updates, if you update before you insert, no big deal since you want the current image and the insert will handle it and no row to update will not fail. And if you try to insert before you update, no big deal since you will still be pushing the current image. So this goes:

UPDATE "SCHEMA"."MV_MY_PTEST" SET "OWNER" = :1,"OBJECT_NAME" = :2,
"SUBOBJECT_NAME" = :3,"OBJECT_ID" = :4,"DATA_OBJECT_ID"= :5,"OBJECT_TYPE"
=

:6,"CREATED" = :7,"LAST_DDL_TIME" = :8,"TIMESTAMP" = :9,"STATUS" = :10,
"TEMPORARY" = :11,"GENERATED" = :12,"SECONDARY" = :13
WHERE "OBJECT_ID" = :4 INSERT INTO "SCHEMA"."MV_MY_PTEST"
("OWNER","OBJECT_NAME","SUBOBJECT_NAME",
"OBJECT_ID","DATA_OBJECT_ID","OBJECT_TYPE","CREATED","LAST_DDL_TIME",
"TIMESTAMP","STATUS","TEMPORARY","GENERATED","SECONDARY")
VALUES
(:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13)

Now I may have missed a few things there, but it sounds like it handles the dependencies. And what code and action may happen that doesn't manifest itself as SQL -- you see analyzes computing number of distinct, max, min, etc. But you don't see clustering and density factor calculations in terms of SQL. I need to do some more work on this, and the Advanced Replication guide does talk about dependencies and their impact on whether or not some things can parallelized, but I've got to dig a bit more into that. But still focusing on basic replication. Now the fun part is doing the replication in the *same* DB. I could get parallelism on the SELECT's, or at least the trace file said so, but they weren't observed, nor were the delete/updates/inserts observed working in parallel. So I still need to do some more digging and learning there. Anyway, most of the replication I've done has been lower volume (with custom code for high volumes). I'd like to get away from custom code for higher volumes, if possible. You know, things like transportable tablespaces are an opportunity, but really more appropriate for large actions -- low volume replication could be easier. Oh well, no more rambling.

Regards,

Larry G. Elkins
elkinsl_at_flash.net
214.954.1781

> -----Original Message-----
> From: root_at_fatcity.com [mailto:root_at_fatcity.com]On Behalf Of Khedr,
> Waleed
> Sent: Tuesday, January 14, 2003 1:34 PM
> To: Multiple recipients of list ORACLE-L
> Subject: RE: Parallel Replication of Single Table
>
>
> Hi Larry,
>
> I do not know really how you want to parallelize this kind of operations!
>
> The final goal is to push the changes in site 1 to site 2 and get the data
> in both tables in sync.
>
> Changes include inserts, updates, and deletes. These changes (dml
> operations)
>  could be cascaded on the same row which means that the final image of the
> data in the table is completely dependent
> on the order and sequence of these dml operations.
>
> This is why I think that a single table refresh requires it to be a serial
> operation.
>
> But on the other hand if a single refresh was requested for many tables,
> then this could be parallelized on the job level not the table level.
>
> Does it make sense or am I missing something?
>
> Regards,
>
> Waleed

> -----Original Message-----
> From: root_at_fatcity.com [mailto:root_at_fatcity.com]On Behalf Of Khedr,
> Waleed
> Sent: Tuesday, January 14, 2003 1:34 PM
> To: Multiple recipients of list ORACLE-L
> Subject: RE: Parallel Replication of Single Table
>
>
> Hi Larry,
>
> I do not know really how you want to parallelize this kind of operations!
>
> The final goal is to push the changes in site 1 to site 2 and get the data
> in both tables in sync.
>
> Changes include inserts, updates, and deletes. These changes (dml
> operations)
>  could be cascaded on the same row which means that the final image of the
> data in the table is completely dependent
> on the order and sequence of these dml operations.
>
> This is why I think that a single table refresh requires it to be a serial
> operation.
>
> But on the other hand if a single refresh was requested for many tables,
> then this could be parallelized on the job level not the table level.
>
> Does it make sense or am I missing something?
>
> Regards,

>
> Waleed

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Khedr, Waleed
  INET: Waleed.Khedr_at_FMR.COM

Fat City Network Services    -- 858-538-5051 http://www.fatcity.com
San Diego, California        -- Mailing list and web hosting services
---------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).

Received on Wed Jan 15 2003 - 10:13:41 CST