Re: Data movement logic in data guard

From: <niall.litchfield_at_gmail.com>
Date: Thu, 31 Dec 2020 11:17:06 +0000
Message-ID: <CABe10sZzgNOkAM5aZwTV=FmnKgKsi6YvZTh3rjbCeZfWaYr6ow_at_mail.gmail.com>



I intended to reply to this yesterday, so thanks for the response!

I think the explanation is simply that until six months or so ago instance 1 was the apply instance; since then, instance 2 has been applying.

On Thu, Dec 31, 2020 at 11:09 AM Hemant K Chitale <hemantkchitale_at_gmail.com> wrote:

>
> Don't you find the apply lag value for INST_ID=1 very strange?
> Why is it so many days?
>
> Hemant K Chitale
>
>
>
>
> On Wed, Dec 30, 2020 at 9:04 PM Lok P <loknath.73_at_gmail.com> wrote:
>
>> Thank you.
>>
>> Data from the standby-side v$dataguard_stats looks as below. I'm not sure
>> why that difference is so big.
>>
>> The requirement is to move data from a few transaction tables incrementally
>> to the target (say, a remote DB) at regular intervals (say, once every ~15
>> minutes) as and when the rows reach the standby database, without missing
>> or duplicating any row in the target. For this we want to leverage the
>> standby database, since the primary has some resource constraints.
>>
>> For this we are taking a reference column, date_created, present on all the
>> transaction tables and populated with SYSDATE when a row is
>> created/inserted in the primary database. Now, before initiating a data
>> move from the standby at, say, exactly 10:01:01 AM, we want to ensure all
>> data up to that point in time has been moved to the standby with nothing
>> left in flight. So before each data-move procedure fires, we will trigger a
>> pre-validation to ensure all data up to 10:01:01 AM has reached the standby
>> and nothing is left behind. This is to safely ensure we are consistent on
>> the target/remote database with no chance of a missed record, because the
>> criterion for the fetch is going to be SELECT ... FROM transaction_table
>> WHERE date_created <= 10:01:01 AM, and the next fetch, after the next ~15
>> minutes, will pick up from 10:01:02 AM onwards. If some data with
>> date_created of 10:01:01 AM arrives at the standby a little late, it won't
>> get picked up and will be missed.
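>>
>> For illustration, a minimal sketch of that fetch (:last_cutoff and
>> :new_cutoff are just placeholder bind variables for the previous and the
>> newly validated cutoff; the half-open window matches the "10:01:02 AM
>> onwards" rule above):
>>
>> SELECT *
>>   FROM transaction_table              -- table name from the example above
>>  WHERE date_created >  :last_cutoff   -- upper bound of the previous run
>>    AND date_created <= :new_cutoff;   -- validated cutoff, e.g. 10:01:01 AM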
>>
>> So, to implement a foolproof pre-data-load validation logic, I was looking
>> for a date column I can compare against, to ensure the data movement from
>> the standby to the remote DB is safe up to that point in time.
>>
>> INST_ID  NAME           VALUE          UNIT                          TIME_COMPUTED        DATUM_TIME
>> -------  -------------  -------------  ----------------------------  -------------------  -------------------
>>       1  apply lag      +171 16:34:19  day(2) to second(0) interval  12/30/2020 07:06:09  12/30/2020 07:06:09
>>       2  apply lag      +00 00:00:59   day(2) to second(0) interval  12/30/2020 07:06:09  12/30/2020 07:06:09
>>       3  apply lag                     day(2) to second(0) interval  12/30/2020 07:06:09
>>       4  apply lag                     day(2) to second(0) interval  12/30/2020 07:06:09
>>       1  transport lag  +00 00:00:00   day(2) to second(0) interval  12/30/2020 07:06:09  12/30/2020 07:06:09
>>       2  transport lag  +00 00:00:00   day(2) to second(0) interval  12/30/2020 07:06:09  12/30/2020 07:06:09
>>       3  transport lag  +00 00:00:00   day(2) to second(0) interval  12/30/2020 07:06:09  12/30/2020 07:06:09
>>       4  transport lag  +00 00:00:00   day(2) to second(0) interval  12/30/2020 07:06:09  12/30/2020 07:06:09
>>
>>
>>
>>
>> On Wed, Dec 30, 2020 at 4:09 PM <niall.litchfield_at_gmail.com> wrote:
>>
>>> I must confess to not having followed all of this, but it does sound a
>>> little like you are trying to reinvent some wheels here.
>>>
>>> The standard way (in my view, anyway) to see how far "behind" a Data
>>> Guard standby is, is to query v$dataguard_stats, as below:
>>>
>>> SQL> select name, value, time_computed
>>>   2  from v$dataguard_stats
>>>   3  where name in ('apply lag','transport lag')
>>>   4  order by name;
>>>
>>> NAME           VALUE          TIME_COMPUTED
>>> -------------  -------------  -------------------
>>> apply lag      +00 00:00:00   12/30/2020 10:26:20
>>> transport lag  +00 00:00:00   12/30/2020 10:26:20
>>>
>>> Similarly, for your earlier "how do I ensure that a low-activity database
>>> transports redo regularly?" question, there's a parameter,
>>> ARCHIVE_LAG_TARGET, for that.
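>>>
>>> For example (a sketch only; 900 seconds is just an assumed value to match
>>> your ~15-minute interval, not a specific recommendation):
>>>
>>> -- force a log switch, and hence redo shipping, at least every 900 seconds
>>> ALTER SYSTEM SET ARCHIVE_LAG_TARGET = 900 SCOPE=BOTH;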
>>>
>>> As for the data movement from your standby to a (remote?) new table or
>>> tables in a new database: if this is a regular thing, it sounds like a
>>> tailor-made use case for MERGE logic, especially if there is a possibility
>>> that source data will be updated/deleted after the initial load and you
>>> also need to replicate those changes.
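>>>
>>> A rough sketch of such a MERGE (all table and column names here are
>>> hypothetical, and it assumes a primary key txn_id; note that MERGE alone
>>> won't propagate deletes, and updates that don't change date_created won't
>>> fall inside the window, so a last-modified column or some other mechanism
>>> would be needed for those):
>>>
>>> MERGE INTO target_table t
>>> USING (SELECT txn_id, col1, date_created
>>>          FROM transaction_table
>>>         WHERE date_created >  :last_cutoff
>>>           AND date_created <= :new_cutoff) s
>>>    ON (t.txn_id = s.txn_id)
>>>  WHEN MATCHED THEN
>>>    UPDATE SET t.col1 = s.col1
>>>  WHEN NOT MATCHED THEN
>>>    INSERT (txn_id, col1, date_created)
>>>    VALUES (s.txn_id, s.col1, s.date_created);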
>>>
>>>
>>> On Tue, Dec 29, 2020 at 6:01 PM Lok P <loknath.73_at_gmail.com> wrote:
>>>
>>>> While we were planning the strategy we saw one issue here. To
>>>> check/confirm whether the data has been moved to the standby/DR up to a
>>>> certain point in time (say the current time, i.e. SYSDATE), we were
>>>> initially thinking of relying on the LAST_TIME column of gv$standby_log
>>>> (captured into a last_replication_dt variable below): only once it has
>>>> caught up to that SYSDATE will we assume all data has been replicated to
>>>> the DR up to that point, and only then pick/move data from the
>>>> DR/standby up to that point. If last_replication_dt < SYSDATE, the code
>>>> keeps sleeping for a few seconds and does not trigger the data move.
>>>>
>>>> But the issue with the above strategy is that when there are no frequent
>>>> changes on the primary (say a Dev instance, where hardly any change
>>>> happens), last_replication_dt will never get close or equal to SYSDATE,
>>>> and the code will keep looping in that SLEEP state. Is there a better
>>>> strategy or logic to handle this?
>>>>
>>>> DECLARE
>>>>   current_dt          DATE;
>>>>   last_replication_dt DATE;
>>>> BEGIN
>>>>   SELECT SYSDATE, (SELECT MAX(last_time) FROM gv$standby_log)
>>>>     INTO current_dt, last_replication_dt
>>>>     FROM dual;
>>>>
>>>>   -- wait until the standby has received redo past the captured SYSDATE
>>>>   WHILE (current_dt > last_replication_dt)
>>>>   LOOP
>>>>     DBMS_LOCK.SLEEP(30);
>>>>
>>>>     SELECT MAX(last_time)
>>>>       INTO last_replication_dt
>>>>       FROM gv$standby_log;
>>>>   END LOOP;
>>>> END;
>>>> /
>>>>
>>>> On Sun, Dec 20, 2020 at 1:09 AM Lok P <loknath.73_at_gmail.com> wrote:
>>>>
>>>>> Thank you very much for the detailed explanation.
>>>>>
>>>>> My apologies, but I don't have much knowledge of the backup/recovery
>>>>> area. We had actually fallen into this wrong-ordering trap when
>>>>> GoldenGate replication was used in one of our systems with multiple
>>>>> parallel replication threads, and we ended up missing data in the target
>>>>> system, so I wanted to check whether that can happen in a Data Guard
>>>>> setup as well, as we are planning to use the Data Guard physical
>>>>> standby / DR database as our data source.
>>>>>
>>>>> But if I get your point correctly, you are saying that the archived
>>>>> redo logs are applied on the DR database in the same order as the
>>>>> logical transactions happened on the primary side. Which means my
>>>>> initial concern, that two different inserts could be applied on the DR
>>>>> side in a different order than on the primary, is unfounded, and the
>>>>> data movement will never fall into this wrong-order trap.
>>>>>
>>>>> My other doubts were:
>>>>> 1) Is it good to rely on the LAST_TIME column of v$standby_log (as in
>>>>> the logic I posted below) to check the lag periodically, and thus keep
>>>>> looping until the lag reduces to zero at some point in time, and only
>>>>> then execute the data fetch/load query to pick data from the
>>>>> transaction tables? This block/loop would be called at the start of
>>>>> each data-load procedure call.
>>>>>
>>>>> 2) For insert-only tables it's easy to take MAX(date_created), store it
>>>>> in some reference table, and then use that as the reference for the
>>>>> incremental data load to the target. But how do we achieve this
>>>>> data-movement logic accurately if UPDATEs/DELETEs also happen on the
>>>>> base transaction table?
>>>>>
>>>>> DECLARE
>>>>>   current_dt          DATE;
>>>>>   last_replication_dt DATE;
>>>>> BEGIN
>>>>>   SELECT SYSDATE, (SELECT MAX(last_time) FROM gv$standby_log)
>>>>>     INTO current_dt, last_replication_dt
>>>>>     FROM dual;
>>>>>
>>>>>   -- wait until the standby has received redo past the captured SYSDATE
>>>>>   WHILE (current_dt > last_replication_dt)
>>>>>   LOOP
>>>>>     DBMS_LOCK.SLEEP(30);
>>>>>
>>>>>     SELECT MAX(last_time)
>>>>>       INTO last_replication_dt
>>>>>       FROM gv$standby_log;
>>>>>   END LOOP;
>>>>> END;
>>>>> /
>>>>>
>>>>>
>>>>> On Sat, Dec 19, 2020 at 8:56 PM Mark W. Farnham <mwf_at_rsiz.com> wrote:
>>>>>
>>>>>> dataguard is fundamentally a recovery operation.
>>>>>>
>>>>>>
>>>>>>
>>>>>> dataguard with the "secondary open for read" is NOT logical
>>>>>> replication; it is physical replication. (Two or more separate
>>>>>> Dataguard destinations with different, less-than-everything sets of
>>>>>> tablespaces are beyond the scope of this email.) Receipt of each
>>>>>> transaction COMMIT or ROLLBACK, and the "secondary" completing its
>>>>>> processing of same, is when transaction results become viewable in the
>>>>>> "open for read" dataguard secondary.
>>>>>>
>>>>>>
>>>>>>
>>>>>> If they are ever out of order, that's a major-league bug in recovery
>>>>>> (not just dataguard).
>>>>>>
>>>>>>
>>>>>>
>>>>>> If memory serves, there is guidance documentation on using the open
>>>>>> for read secondary dataguard as a source of information. Dataguard for
>>>>>> switchover and failover existed for YEARS before active dataguard open
>>>>>> for read, so the documentation is probably skewed a bit toward that. I
>>>>>> suggest first completely understanding just the notion of being
>>>>>> complete to a point in time with respect to commit points before
>>>>>> moving on to the exact functionality of active open for read. Then the
>>>>>> context documents about active open for read might make sense.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Myself, I prefer the scheduled suspend-recovery-on-business-event,
>>>>>> cold clone rename, open-renamed-clone-frozen approach (allowing for
>>>>>> aggregation at that freeze point), with recovery then resumed, over an
>>>>>> actively open dataguard. It relegates real-time queries to the
>>>>>> primary, but it allows dedicated configuration of a completely
>>>>>> relationally integral database instance, configured for read-only
>>>>>> advantages like aggregation and result cache and minimized query
>>>>>> traffic on undo, for your selected cold clone rename versions. Often
>>>>>> quarterly, monthly, weekly, and daily for a week are good choices,
>>>>>> depending on your operational needs. For finance and manufacturing,
>>>>>> many of my clients found monthly and daily for a week useful, but that
>>>>>> was before the recovery product was built, literally using continuous
>>>>>> recovery as described for the remote physical backup. Now don't be
>>>>>> confused by that: it has come to be called "roll your own dataguard,"
>>>>>> but it preceded Oracle's product by a long time. We called it the
>>>>>> "standby recovery database." If you search the IOUW and OAUG archives
>>>>>> I think you can find my detailed papers.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Several folks nearly simultaneously and independently "invented" this
>>>>>> after Sequent presented a sales information demo of recovering a
>>>>>> six-month-old vintage backup to current just from redo logs. That was
>>>>>> early V6, so probably 1989 or so. Johnny Chan and Brien Christiansen
>>>>>> of CISCO are the first folks I worked with who routinely did the cold
>>>>>> clone rename, open as a reporting database frozen in time, to relieve
>>>>>> the overloaded primary systems of queries where "yesterday" and "end
>>>>>> of last fiscal month" were actually as good as (or better than)
>>>>>> current.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Good luck. Whether the redo logs are being shipped close to real time
>>>>>> from the log writer, or from the archived redo log after it is
>>>>>> switched out, they are processed in order on the destination. If you
>>>>>> are thinking of this in the context of replication of logical
>>>>>> transactions rather than physically ordered commits and rollbacks,
>>>>>> that way lies madness. Deltas are being applied to data blocks.
>>>>>>
>>>>>>
>>>>>>
>>>>>> (There does exist a temporary logical phase for upgrades, but that is
>>>>>> also beyond the scope of this email.)
>>>>>>
>>>>>>
>>>>>>
>>>>>> mwf
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:* oracle-l-bounce_at_freelists.org [mailto:
>>>>>> oracle-l-bounce_at_freelists.org] *On Behalf Of *Lok P
>>>>>> *Sent:* Saturday, December 19, 2020 6:42 AM
>>>>>> *To:* Clay Jackson (cjackson)
>>>>>> *Cc:* Oracle L
>>>>>> *Subject:* Re: Data movement logic in data guard
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks for the guidance here.
>>>>>>
>>>>>> I will try to see how I can test the scenario, to ensure the
>>>>>> replication order will be exactly the same on the DR as on the
>>>>>> primary, even while catching up a large lag.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Two related questions:
>>>>>>
>>>>>>
>>>>>>
>>>>>> 1) We don't want to keep hitting those transaction tables with
>>>>>> data-read queries if the lag is very high; we should wait for the lag
>>>>>> to catch up. For that reason, is it okay to refer to v$standby_log and
>>>>>> keep looping with a sleep, something like below?
>>>>>>
>>>>>>
>>>>>>
>>>>>> DECLARE
>>>>>>   current_dt          DATE;
>>>>>>   last_replication_dt DATE;
>>>>>> BEGIN
>>>>>>   SELECT SYSDATE, (SELECT MAX(last_time) FROM gv$standby_log)
>>>>>>     INTO current_dt, last_replication_dt
>>>>>>     FROM dual;
>>>>>>
>>>>>>   -- wait until the standby has received redo past the captured SYSDATE
>>>>>>   WHILE (current_dt > last_replication_dt)
>>>>>>   LOOP
>>>>>>     DBMS_LOCK.SLEEP(30);
>>>>>>
>>>>>>     SELECT MAX(last_time)
>>>>>>       INTO last_replication_dt
>>>>>>       FROM gv$standby_log;
>>>>>>   END LOOP;
>>>>>> END;
>>>>>> /
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2) For insert-only tables it's easy to take MAX(date_created), store
>>>>>> it in some reference table, and then use that to do the incremental
>>>>>> data load to the target. But how do we achieve this data
>>>>>> movement/replication if UPDATEs/DELETEs also happen on the base
>>>>>> transaction table? Is there any recommended way to do that?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Dec 19, 2020 at 1:04 AM Clay Jackson (cjackson) <
>>>>>> Clay.Jackson_at_quest.com> wrote:
>>>>>>
>>>>>> While I admit I haven’t specifically tested this case, which I would
>>>>>> recommend you do, I think you’re “overthinking” this.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The whole point behind DataGuard (or at least one of its major "use
>>>>>> cases") is to create a "consistent" COPY of your primary database, so
>>>>>> that in the event of a failure (loss of access), users can be moved to
>>>>>> the secondary database without loss of data (or consistency). The
>>>>>> mechanism by which this happens is the physical copying of the redo
>>>>>> log(s), or records within the logs, from the primary to the secondary,
>>>>>> and then "applying" those logs (or records) to the secondary IN THE
>>>>>> SAME order as they were "applied" on the primary. So, while the actual
>>>>>> writing of the transactions to the database on the secondary will
>>>>>> ALWAYS happen at a point in time AFTER the transaction was written to
>>>>>> the primary, I don't think the scenario you outlined (records being
>>>>>> written "out of order") is possible.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Clay Jackson
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:* oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org>
>>>>>> *On Behalf Of *Lok P
>>>>>> *Sent:* Friday, December 18, 2020 11:06 AM
>>>>>> *To:* Oracle L <oracle-l_at_freelists.org>
>>>>>> *Subject:* Re: Data movement logic in data guard
>>>>>>
>>>>>>
>>>>>> Moving data from one system to another looks to be very common, but I
>>>>>> have never come across building such logic manually in code. And I
>>>>>> don't have much idea of how the archived log apply happens at the
>>>>>> standby. But it seems that, to maintain constraints, it has to happen
>>>>>> in exactly the same order as the data load on the primary; otherwise
>>>>>> things will break.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Can somebody guide me here: can the logic we are going to rely on to
>>>>>> decide the reference date/time for our data movement ever fail?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, 18 Dec 2020, 4:27 pm Lok P, <loknath.73_at_gmail.com> wrote:
>>>>>>
>>>>>> Actually, in a GoldenGate setup with multiple parallel replication
>>>>>> threads, we have encountered scenarios where two transactions
>>>>>> generated on the source can reach the target in a different order,
>>>>>> causing the data pickup from the target to miss some rows.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Dec 18, 2020 at 9:30 AM Lok P <loknath.73_at_gmail.com> wrote:
>>>>>>
>>>>>> It's version 11.2.0.4 of Oracle on Exadata. We have a requirement to
>>>>>> move the data to another system, and for that we want to utilize the
>>>>>> DR database (a physical standby with Data Guard configured) rather
>>>>>> than the primary, as we do not want to affect the key applications
>>>>>> running on the primary; we are almost saturating the DB resources on
>>>>>> the primary during peak hours.
>>>>>>
>>>>>> For copying/moving data without a miss at each ~15-minute interval,
>>>>>> we are relying on the "date_created" column of the transaction table
>>>>>> as the reference column. So, if we have some additional lag from
>>>>>> primary to DR, is it possible for records to be created on the primary
>>>>>> and the DR in the sequence below, such that Row-3 is created on the DR
>>>>>> before Row-2? In that case we may miss that row when we take
>>>>>> MAX(date_created) from our transaction table to move the data.
>>>>>>
>>>>>> In such a scenario, how should we make our logic foolproof when
>>>>>> picking MAX(date_created) on the source, so that we won't miss any
>>>>>> data? Or should we somehow utilize the LAST_TIME column of
>>>>>> v$standby_log to make our logic foolproof?
>>>>>>
>>>>>> Also, I am wondering whether there is any way we can handle
>>>>>> UPDATEs/DELETEs of the records in the source?
>>>>>>
>>>>>> On Primary:
>>>>>>
>>>>>> Row-1 created at 18th DEC 10:00 AM, with date_created as 18th DEC 10:00 AM
>>>>>> Row-2 created at 18th DEC 10:05 AM, with date_created as 18th DEC 10:05 AM
>>>>>> Row-3 created at 18th DEC 10:06 AM, with date_created as 18th DEC 10:06 AM
>>>>>>
>>>>>> On DR, when we have a lag of say ~5 minutes:
>>>>>>
>>>>>> Row-1 created at 18th DEC 10:05 AM,    with date_created as 18th DEC 10:00 AM
>>>>>> Row-3 created at 18th DEC 10:11:01 AM, with date_created as 18th DEC 10:06 AM
>>>>>> Row-2 created at 18th DEC 10:11:02 AM, with date_created as 18th DEC 10:05 AM
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>
>>> --
>>> Niall Litchfield
>>> Oracle DBA
>>> http://www.orawin.info
>>>
>>

-- 
Niall Litchfield
Oracle DBA
http://www.orawin.info

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Dec 31 2020 - 12:17:06 CET
