
RE: Parallel Replication of Single Table

From: Larry Elkins <elkinsl_at_flash.net>
Date: Tue, 14 Jan 2003 17:24:22 -0800
Message-ID: <F001.0052FE9A.20030114172422@fatcity.com>


Waleed,

Thanks for chiming in.

Regarding refreshing multiple tables in parallel, yes, that could be done at the group level. Instead, though, they chose to have the scheduling tool kick off multiple single-table refreshes. They start many at the same time, so in essence they do get parallelism across multiple tables. And this way they get paged about the specific refresh should something fail.

Regarding the other comments, the replication logic seems to handle the dependency you are talking about. There really isn't any "order" per se, so the same issues you raise would apply to serial processing as well, right? All those rows are in the MLOG$ table with a 1/1/4000 date -- in what order did they occur?

Obviously an insert has to occur before an update can occur (with exceptions -- e.g. update, delete, then insert of the same PK row, etc.). And we don't care how many updates have occurred, since we are only interested in pushing the current image of the row.

Now, what about deletes? Say you see deletes and inserts in the MLOG$ table; they will all have the 1/1/4000 date prior to kicking off. How do we know which occurred first -- did we delete an existing row and then re-insert, or did we insert and then delete? That's handled by joining to the base table itself: if the row exists, you are going to push it, because the insert must have happened after the delete. If it doesn't exist, you know the delete happened after the insert. So the deletes are processed first, using the existence-in-the-base-table test.
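That existence test can be sketched outside of SQL. A minimal Python simulation of the decision (my own illustration, not Oracle's code -- the PK sets are made up):

```python
# Decide whether the *last* action on a logged PK was a delete or an
# insert/update, using only existence in the base table. This mirrors
# the MLOG$ reasoning above: all log rows share the 1/1/4000 snaptime,
# so the log itself carries no usable ordering.

def classify_changes(logged_pks, base_table_pks):
    """Split logged PKs into deletes-to-apply and rows-to-push."""
    deletes = {pk for pk in logged_pks if pk not in base_table_pks}
    pushes = {pk for pk in logged_pks if pk in base_table_pks}
    return deletes, pushes

# PKs 1, 2, 3 appear in MLOG$; only 1 and 3 still exist in the base
# table, so the last action on PK 2 must have been a delete.
deletes, pushes = classify_changes({1, 2, 3}, {1, 3, 4})
print(deletes)  # {2}
print(pushes)   # {1, 3}
```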

With tracing turned on you can see how the replication logic handles this:

Step 1 -- get the != 'I' rows that are not in the base table -- i.e. we know the last action was a delete, not an insert or update, since the row no longer exists:

SELECT /*+ REMOTE_MAPPED(LINK) */ DISTINCT LOG$."OBJECT_ID"
FROM
  (SELECT MLOG$."OBJECT_ID" FROM
   "SCHEMA"."MLOG$_MY_PTEST"@LINK MLOG$ WHERE
   "SNAPTIME$$" > :1 AND ("DMLTYPE$$" != 'I')) LOG$
WHERE (LOG$."OBJECT_ID")
  NOT IN (SELECT MAS_TAB$."OBJECT_ID" FROM
  "SCHEMA"."MY_PTEST"@LINK "MAS_TAB$" WHERE
  LOG$."OBJECT_ID" = MAS_TAB$."OBJECT_ID")

Step 2 -- process those deletes:

DELETE FROM "SCHEMA"."MV_MY_PTEST" SNAP$ WHERE
  "OBJECT_ID" = :1

Step 3 -- look for the != 'D' rows where the row *exists* in the base table. These will be the I's and U's. And because the true deletes (as in, the row no longer exists) were handled in the prior step, you know these go:

SELECT /*+ REMOTE_MAPPED(LINK) */ CURRENT$."OWNER",
  CURRENT$."OBJECT_NAME", CURRENT$."SUBOBJECT_NAME", CURRENT$."OBJECT_ID",
  CURRENT$."DATA_OBJECT_ID", CURRENT$."OBJECT_TYPE", CURRENT$."CREATED",
  CURRENT$."LAST_DDL_TIME", CURRENT$."TIMESTAMP", CURRENT$."STATUS",
  CURRENT$."TEMPORARY", CURRENT$."GENERATED", CURRENT$."SECONDARY"
FROM
  (SELECT /*+ */ "MY_PTEST"."OWNER" "OWNER",
   "MY_PTEST"."OBJECT_NAME" "OBJECT_NAME",
   "MY_PTEST"."SUBOBJECT_NAME" "SUBOBJECT_NAME",
   "MY_PTEST"."OBJECT_ID" "OBJECT_ID",
   "MY_PTEST"."DATA_OBJECT_ID" "DATA_OBJECT_ID",
   "MY_PTEST"."OBJECT_TYPE" "OBJECT_TYPE",
   "MY_PTEST"."CREATED" "CREATED",
   "MY_PTEST"."LAST_DDL_TIME" "LAST_DDL_TIME",
   "MY_PTEST"."TIMESTAMP" "TIMESTAMP",
   "MY_PTEST"."STATUS" "STATUS",
   "MY_PTEST"."TEMPORARY" "TEMPORARY",
   "MY_PTEST"."GENERATED" "GENERATED",
   "MY_PTEST"."SECONDARY" "SECONDARY"
   FROM "SCHEMA"."MY_PTEST"@LINK "MY_PTEST") CURRENT$,
  (SELECT DISTINCT MLOG$."OBJECT_ID"
   FROM "SCHEMA"."MLOG$_MY_PTEST"@LINK MLOG$ WHERE
   "SNAPTIME$$" > :1 AND ("DMLTYPE$$" != 'D')) LOG$
WHERE CURRENT$."OBJECT_ID" = LOG$."OBJECT_ID"

Steps 4 and 5 (intermixed) -- process the inserts/updates. If you update before you insert, no big deal: you want the current image, the insert will handle it, and "no row to update" will not fail anything. And if you insert before you update, no big deal either, since you will still be pushing the current image. So this goes:

UPDATE "SCHEMA"."MV_MY_PTEST" SET "OWNER" = :1, "OBJECT_NAME" = :2,
  "SUBOBJECT_NAME" = :3, "OBJECT_ID" = :4, "DATA_OBJECT_ID" = :5,
  "OBJECT_TYPE" = :6, "CREATED" = :7, "LAST_DDL_TIME" = :8,
  "TIMESTAMP" = :9, "STATUS" = :10, "TEMPORARY" = :11,
  "GENERATED" = :12, "SECONDARY" = :13
WHERE "OBJECT_ID" = :4

INSERT INTO "SCHEMA"."MV_MY_PTEST"
  ("OWNER","OBJECT_NAME","SUBOBJECT_NAME",
   "OBJECT_ID","DATA_OBJECT_ID","OBJECT_TYPE","CREATED","LAST_DDL_TIME",
   "TIMESTAMP","STATUS","TEMPORARY","GENERATED","SECONDARY")
VALUES
  (:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13)
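The order-independence of steps 4 and 5 can be shown with a toy upsert. A sketch in Python (my own illustration, not Oracle's implementation; the dict stands in for the materialized view, keyed by OBJECT_ID):

```python
# Toy model of the MV refresh's update-then-insert behavior. Because we
# always push the *current* image of the source row, it doesn't matter
# how many intermediate updates occurred -- the MV converges on the
# same final state either way.

def refresh_row(mv, pk, current_image):
    """Apply the current image of one row to the MV (upsert)."""
    if pk in mv:
        mv[pk] = current_image   # UPDATE path: row already in the MV
    else:
        mv[pk] = current_image   # INSERT path: no row to update

mv = {10: "old image"}
refresh_row(mv, 10, "current image")   # existing row gets updated
refresh_row(mv, 11, "current image")   # new row gets inserted
print(mv)  # {10: 'current image', 11: 'current image'}
```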

Now, I may have missed a few things there, but it sounds like the logic handles the dependencies. And some code and actions may happen that don't manifest themselves as SQL -- you see analyzes computing number of distinct values, max, min, etc., but you don't see the clustering-factor and density calculations in terms of SQL.

I need to do some more work on this. The Advanced Replication guide does talk about dependencies and their impact on whether or not some things can be parallelized, but I've got to dig a bit more into that; for now I'm still focusing on basic replication. The fun part is doing the replication in the *same* DB. I could get parallelism on the SELECTs, or at least the trace file said so, but it wasn't observed, nor were the deletes/updates/inserts observed working in parallel. So I still need to do some more digging and learning there.

Anyway, most of the replication I've done has been lower volume (with custom code for high volumes). I'd like to get away from custom code for higher volumes, if possible. Things like transportable tablespaces are an option, but really more appropriate for large actions -- low-volume replication could be easier. Oh well, no more rambling.

Regards,

Larry G. Elkins
elkinsl_at_flash.net
214.954.1781

> -----Original Message-----
> From: root_at_fatcity.com [mailto:root_at_fatcity.com]On Behalf Of Khedr,
> Waleed
> Sent: Tuesday, January 14, 2003 1:34 PM
> To: Multiple recipients of list ORACLE-L
> Subject: RE: Parallel Replication of Single Table
>
>
> Hi Larry,
>
> I do not know really how you want to parallelize this kind of operations!
>
> The final goal is to push the changes in site 1 to site 2 and get the data
> in both tables in sync.
>
> Changes include inserts, updates, and deletes. These changes (dml
> operations)
>  could be cascaded on the same row which means that the final image of the
> data in the table is completely dependent
> on the order and sequence of these dml operations.
>
> This is why I think that a single table refresh requires it to be a serial
> operation.
>
> But on the other hand if a single refresh was requested for many tables,
> then this could be parallelized on the job level not the table level.
>
> Does it make sense or am I missing something?
>
> Regards,
>
> Waleed
Received on Tue Jan 14 2003 - 19:24:22 CST
