Re: Data movement logic in data guard

From: <niall.litchfield_at_gmail.com>
Date: Wed, 30 Dec 2020 10:39:22 +0000
Message-ID: <CABe10sZ39KuM+EpHLAZmVxAoYwVi9nFoOjJ+UHvhsnzpNV+BZw_at_mail.gmail.com>



I must confess to not having followed all of this but it does sound a little bit like you are trying to reinvent some wheels here.

The standard (in my view anyway) way to see how far "behind" a data guard standby is to query v$dataguard_stats like the below

SQL> select name,value,time_computed
  2 from v$dataguard_stats
  3 where name in ('apply lag','transport lag')   4 order by name;

NAME                             VALUE

 TIME_COMPUTED
-------------------------------- ------------------------------
------------------------------
apply lag                        +00 00:00:00                   12/30/2020
10:26:20
transport lag                    +00 00:00:00                   12/30/2020
10:26:20

Similarly, for your "how do I ensure that a low activity database transports redo regularly?" question earlier, there's a parameter ARCHIVE_LAG_TARGET for that.

As for the data movement from your standby to a (remote?) new table or tables in a new database, if this is a regular thing this sounds like a tailor-made use case for MERGE logic, especially if there is a possibility that source data will be updated/deleted after initial load and you also need to replicate this.

On Tue, Dec 29, 2020 at 6:01 PM Lok P <loknath.73_at_gmail.com> wrote:

> While we were planning on the strategy we saw one issue here. To
> check/confirm if the data has been moved to standby/DR till certain
> point of time (say current time i.e. sysdate), we were initially thinking
> of relying on column last_replication_dt of gv$standby_log, if it's >
> sysdate, then only we will assume that all data has been replicated to DR
> till that point so it's good to pick/move data from DR/Standby till that
> point/sysdate. If last_replication_dt <sysdate it will keep on sleep for a
> few seconds and will not trigger the data move.
>
> But the issue in above strategy is that, in case there is no frequent
> changes happen on the primary , say a Dev instance , in which we hardly
> have any change happen on primary, in that case the last_replication_dt
> will never reach closer/equal to sysdate, and in that case it will keep on
> looping in that SLEEP status. Is there a better strategy or logic to handle
> this?
>
> LOOP
>
> SELECT SYSDATE, ( SELECT max(last_time) FROM gv$standby_log)
>
> INTO current_dt,
>
> last_replication_dt
>
> FROM DUAL;
>
> WHILE (current_dt > last_replication_dt)
>
> LOOP
>
> DBMS_LOCK.SLEEP(30);
>
> SELECT max(last_time)
>
> INTO last_replication_dt
>
> FROM gv$standby_log;
>
> END LOOP;
>
> /
>
> On Sun, Dec 20, 2020 at 1:09 AM Lok P <loknath.73_at_gmail.com> wrote:
>
>> Thank You very much for the detailed explanation.
>>
>> My apology, but I am not having much idea about the backup/recovery area.
>> Actually we had fallen into this wrong ordering trap while Golden gate
>> replication was used in one of the systems with multiple parallel
>> replication threads used for replication and we ended up missing data in
>> the target system, so wanted to check if that can happen in data
>> guard setup as well, as we are planning to use the data guard physical
>> standby / DR database as our data source.
>>
>> But If i get your point correctly , I think you are pointing to the fact
>> that the application of the archived redo logs on the DR database will
>> happen in the same order as the logical transactions happen in the primary
>> side. Which means , my initial concern regarding if the two different
>> inserts can be applied in different order in the DR side as compared to
>> primary is not true. And that way it will ensure data movement will never
>> fall into this trap due to wrong order.
>>
>> My other doubt was ,
>> 1)If it's good to rely on column LAST_TIME of v$standby_log(as I posted
>> the logic below) for checking the lag periodically and thus keep on looping
>> till the lag reduces to ZERO at certain point in time. And then only data
>> fetch/load query will be executed to pick data from the transaction tables?
>> And this block/loop will be called at the start of each data load procedure
>> call.
>>
>> 2) For insert only TABLES it's easy to take MAX(date_created) and store
>> it in some reference tables and then use that as reference to do the
>> incremental data load to target . But how to achieve this data movement
>> logic accurately, if UPDATE/DELETE also happens on the base transaction
>> table?
>>
>> LOOP
>>
>> SELECT SYSDATE, ( SELECT max(last_time) FROM gv$standby_log)
>>
>> INTO current_dt,
>>
>> last_replication_dt
>>
>> FROM DUAL;
>>
>> WHILE (current_dt > last_replication_dt)
>>
>> LOOP
>>
>> DBMS_LOCK.SLEEP(30);
>>
>> SELECT max(last_time)
>>
>> INTO last_replication_dt
>>
>> FROM gv$standby_log;
>>
>> END LOOP;
>>
>> /
>>
>>
>> On Sat, Dec 19, 2020 at 8:56 PM Mark W. Farnham <mwf_at_rsiz.com> wrote:
>>
>>> dataguard is fundamentally a recovery operation.
>>>
>>>
>>>
>>> dataguard with the “secondary open for read” is NOT logical replication,
>>> it is physical replication. (Two or more separate Dataguard destinations
>>> with different less than everything sets of tablespaces are beyond the
>>> scope of this email.) Receipt of each transaction COMMIT or ROLLBACK and
>>> the “secondary” processing completion of same is when transaction results
>>> are viewable in the “open for read” dataguard secondary.
>>>
>>>
>>>
>>> If they are ever out of order, that’s a major league bug in recovery
>>> (not just dataguard.)
>>>
>>>
>>>
>>> IF memory serves there is guidance documentation on using the open for
>>> read secondary dataguard as a source of information. Dataguard for
>>> switchover and failover existed for YEARS before the open active dataguard
>>> for read, so the documentation is probably skewed a bit toward that. I
>>> suggest first completely understanding just the notion for being complete
>>> to a point in time with respect to commit points before moving on to the
>>> exact functionality of active open for read. Then the context documents
>>> about active open for read might make sense.
>>>
>>>
>>>
>>> Myself, I prefer the scheduled suspend recovery on business event, cold
>>> clone rename, open renamed clone frozen (allowing for aggregation for that
>>> freeze point), and resume recovery to actively open dataguard, which
>>> relegates real time queries to the primary but which allows dedicated
>>> configuration of a completely relationally integral database instance
>>> configured for ready only advantages like aggregation and result cache and
>>> minimized query traffic on undo for your selected cold clone rename
>>> versions. Often quarterly, monthly, weekly, and daily for a week are good
>>> choices, depending on your operational needs. For finance and manufacturing
>>> many of my clients found monthly and daily for a week were useful, but that
>>> was before the recovery product was built, literally using continuous
>>> recovery as described for the remote physical backup. Now don’t be confused
>>> by that. That has come to be called “roll your own dataguard” but it
>>> preceded Oracle’s product by a long time. We called it the “standby
>>> recovery database.” If you search the IOUW and OAUG archives I think you
>>> can find my detailed papers.
>>>
>>>
>>>
>>> Several folks nearly simultaneously and independently “invented” this
>>> after Sequent presented a sales information demo of recovering a six month
>>> old vintage backup to current just from redo logs. That was early V6, so
>>> probably 1989 or so. Johnny Chan and Brien Christiansen of CISCO are the
>>> first folks who I worked with that routinely did the cold clone rename open
>>> as a reporting database frozen in time to relieve the overloaded primary
>>> systems of queries where “yesterday” and “end of last fiscal month” were
>>> actually as good as (or better than) current.
>>>
>>>
>>>
>>> Good luck. Whether the redo logs are being shipped close to real time
>>> for the log writer or from the archived redolog after it is switch out,
>>> they are processed in order on the destination. If you are thinking of this
>>> in the context of replication of logical transactions rather than
>>> physically ordered commits and rollbacks, that way lies madness. Deltas are
>>> being applied to datablocks.
>>>
>>>
>>>
>>> (There does exist a temporary logical phase for upgrades, but that is
>>> also beyond the scope of this email.)
>>>
>>>
>>>
>>> mwf
>>>
>>>
>>>
>>> *From:* oracle-l-bounce_at_freelists.org [mailto:
>>> oracle-l-bounce_at_freelists.org] *On Behalf Of *Lok P
>>> *Sent:* Saturday, December 19, 2020 6:42 AM
>>> *To:* Clay Jackson (cjackson)
>>> *Cc:* Oracle L
>>> *Subject:* Re: Data movement logic in data guard
>>>
>>>
>>>
>>> Thanks for the guidance here.
>>>
>>> I will try to see how I can test the scenario to ensure the replication
>>> order will be exactly the same in DR as in primary even during catching up
>>> large lag.
>>>
>>>
>>>
>>> Two questions with related to same
>>>
>>>
>>>
>>> 1) We don't want to keep on hitting the data read queries in those
>>> transaction tables if the lag is very high and should wait for the lag to
>>> catch up. For that reason is it okay to refer to the v$standby_log. And
>>> keep on looping with a sleep something as below?
>>>
>>>
>>>
>>> LOOP
>>>
>>> SELECT SYSDATE, ( SELECT max(last_time) FROM gv$standby_log)
>>>
>>> INTO current_dt,
>>>
>>> last_replication_dt
>>>
>>> FROM DUAL;
>>>
>>> WHILE (current_dt > last_replication_dt)
>>>
>>> LOOP
>>>
>>> DBMS_LOCK.SLEEP(30);
>>>
>>> SELECT max(last_time)
>>>
>>> INTO last_replication_dt
>>>
>>> FROM gv$standby_log;
>>>
>>> END LOOP;
>>>
>>> /
>>>
>>>
>>>
>>> 2) For insert only TABLES it's easy to take MAX(date_created) and store
>>> it in some reference tables and then use that to do the incremental data
>>> load to target . But how to achieve this data movement/replication,
>>> if UPDATE/DELETE also happens on the base transaction table? Is there any
>>> recommended way for that?
>>>
>>>
>>>
>>> On Sat, Dec 19, 2020 at 1:04 AM Clay Jackson (cjackson) <
>>> Clay.Jackson_at_quest.com> wrote:
>>>
>>> While I admit I haven’t specifically tested this case, which I would
>>> recommend you do, I think you’re “overthinking” this.
>>>
>>>
>>>
>>> The whole point behind DataGuard (or, at least one of the major “use
>>> cases”) is to create a “consistent” COPY of your primary database, so that
>>> in the event of a failure (loss of access), users can be moved to the
>>> secondary database without loss of data (or consistency). The mechanism
>>> by which this happens is through physical copying of the redo log(s), or
>>> records within the logs from the primary to the secondary, and then
>>> “applying” those logs (or records) to the secondary, IN THE SAME order as
>>> they were “applied” on the primary. So, while the actual writing of the
>>> transactions to the database on the secondary will ALWAYS happen at a point
>>> in time AFTER transaction was written to the primary, I don’t think the
>>> scenario you outlined (records being written “out of order” is possible.
>>>
>>>
>>>
>>>
>>>
>>> Clay Jackson
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From:* oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> *On
>>> Behalf Of *Lok P
>>> *Sent:* Friday, December 18, 2020 11:06 AM
>>> *To:* Oracle L <oracle-l_at_freelists.org>
>>> *Subject:* Re: Data movement logic in data guard
>>>
>>>
>>>
>>> *CAUTION:* This email originated from outside of the organization. Do
>>> not follow guidance, click links, or open attachments unless you recognize
>>> the sender and know the content is safe.
>>>
>>>
>>>
>>> Moving data from one to other system looks to be very common but I have
>>> never came across building such logic manually in code. And I don't have
>>>
>>> much idea about how the archive log apply happens at standby. But it
>>> seems like, to maintain constraints it has to be in exactly in same order
>>> as the data load happens in primary. Else things will break.
>>>
>>>
>>>
>>> Can somebody guide me here, if the logic which we are going to rely to
>>> decide reference date/time in our case for data movement will never fail?
>>>
>>>
>>>
>>> On Fri, 18 Dec 2020, 4:27 pm Lok P, <loknath.73_at_gmail.com> wrote:
>>>
>>> Actually in golden gate setup with multiple parallel replication threads
>>> , we have encountered scenarios where two transactions generated from
>>> source can reach target in different order thus causing the data
>>> pickup from target to miss some rows.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Dec 18, 2020 at 9:30 AM Lok P <loknath.73_at_gmail.com> wrote:
>>>
>>> Its version 11.2.0.4 of oracle exadata. We have a requirement in which
>>> we need to move the data to another system and for that we want to utilize
>>> the DR database(which is a physical standby with data guard configured)
>>> rather than Primary, as we want to not to affect the key applications which
>>> are running on primary. And we are almost saturating the DB resources on
>>> the primary during peak hours.
>>>
>>> For copying/moving data without miss in each ~15minutes interval
>>> frequency, we are relying on "date_created" column as reference column of
>>> the transaction table , so in case we have some additional lag from primary
>>> to DR, is it possible that a record created on primary and DR as below
>>> sequence, such that the row-3 created on DR before Row-2? In that case we
>>> may miss that row when we take MAX(date_created) from our transaction table
>>> to move the data.
>>>
>>> In such a scenario , how should we make our logic full proof to pick
>>> the max(date_created) on source so that we won't miss any data? Or should
>>> we some way utilize the column last_time of v$standby_log to make our logic
>>> full proof?
>>>
>>> Aso i am wondering if by any way we can handle UPDATE/DELETE of the
>>> records in source?
>>>
>>> On Primary:-
>>>
>>> Row-1 created at 18th DEC 10:00AM with date_created as 18th DEC 10:00AM
>>>
>>> Row-2 created at 18th DEC 10:05 AM with date_created as 18th DEC 10:05 AM
>>>
>>> Row-3 created at 18th DEC 10:06 AM with date_created as 18th DEC 10:06 AM
>>>
>>> On DR when we have lag of say ~5minutes:-
>>>
>>> Row-1 created at 18th DEC 10:05AM with date_created as 18th DEC 10:00AM
>>>
>>> Row-3 created at 18th DEC 10:11:01 AM with date_created as 18th DEC
>>> 10:06AM
>>>
>>> Row-2 created at 18th DEC 10:11:02 AM with date_created as 18th DEC
>>> 10:05AM
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>

-- 
Niall Litchfield
Oracle DBA
http://www.orawin.info

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Dec 30 2020 - 11:39:22 CET

Original text of this message