Re: One primary with two physical standbys exhibiting different behavior with regard to lag

From: John Thomas <jt2354_at_gmail.com>
Date: Fri, 4 Jan 2019 22:16:03 +0000
Message-ID: <CAOHpfbF0HAEAmhG_dd0jruHUH6_=akkXiZEJ8HBTFfb6kTcQ2g_at_mail.gmail.com>



Compare alert logs across the instances too...

Regards

John

On Fri, 4 Jan 2019, 19:43 Chris Stephens <cstephens16_at_gmail.com> wrote:

> "Burn the haystack and the needle will appear."
>
> i'm saving that one!
>
> On Fri, Jan 4, 2019 at 1:30 PM Mark W. Farnham <mwf_at_rsiz.com> wrote:
>
>> Given that you are 12.1.0.2, I **think** the relevant document starting
>> point is:
>>
>>
>>
>> https://docs.oracle.com/database/121/SBYDB/log_transport.htm#SBYDB00400
>>
>>
>>
>>
>>
>> This has definitely evolved over versions, starting with “roll your own”
>> in 6.0.36 and various changes through the route to productization.
>>
>>
>>
>> I’m not confident any release after 8 failed to involve some significant
>> change (other than “roll your own” still being a process of shipping and
>> applying the logs as if you were recovering, while periodically cancelling
>> recovery and making a clone/rename to test whether your continuous recovery
>> could actually be warm started and opened, then mounting and resuming
>> recovery of the standby thread.)
>>
>>
>>
>> I’d focus on how to do it right from the document before you try to
>> figure out what is wrong, since several pairs of eyes have failed to
>> quickly find that needle. Burn the haystack and the needle will appear.
>>
>>
>>
>> mwf
>>
>>
>>
>> *From:* oracle-l-bounce_at_freelists.org [mailto:
>> oracle-l-bounce_at_freelists.org] *On Behalf Of *Mark W. Farnham
>> *Sent:* Friday, January 04, 2019 12:46 PM
>> *To:* jt2354_at_gmail.com; 'Sandra Becker'
>> *Cc:* 'Neil Chandler'; 'Andrew Kerber'; 'oracle-l'
>> *Subject:* RE: One primary with two physical standbys exhibiting
>> different behavior with regard to lag
>>
>>
>>
>> Is one of these applying from lgwr and the other by shipping logs? (Sorry,
>> no time to read the details.) That would explain both the lag and the
>> faster maximum apply rate, because after the log switches and is finally
>> shipped, it applies very quickly locally with zero network ack/nak of
>> successful application.
>>
>>
>>
>> Before it was possible to write to the standby from lgwr, that was the
>> way it worked for all the standbys.
>>
>>
>>
>> *From:* oracle-l-bounce_at_freelists.org [
>> mailto:oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org>] *On
>> Behalf Of *John Thomas
>> *Sent:* Friday, January 04, 2019 12:27 PM
>> *To:* Sandra Becker
>> *Cc:* Neil Chandler; Andrew Kerber; oracle-l
>> *Subject:* Re: One primary with two physical standbys exhibiting
>> different behavior with regard to lag
>>
>>
>>
>> If that's not the source of your problems, you need to eliminate it
>> before looking further.
>>
>>
>>
>> The max and average apply rates are very different for the two standbys.
>> (But the slow one's are faster.)
>>
>>
>>
>> Transport and apply lag on fitsdb1... maybe this is just the mismatch
>> between online redo and standby redo log sizes. They should all be the same
>> size, and you should have one more standby log than the number of online
>> redo logs on each thread. But I guess you know that already.
>>
>>
>>
>> I should've asked for the ping times, but it's probably not relevant.
>>
>>
>>
>> Database - itsdb2
>>
>> Role: PHYSICAL STANDBY
>> Intended State: APPLY-ON
>> Transport Lag: 0 seconds (computed 0 seconds ago)
>> Apply Lag: 0 seconds (computed 0 seconds ago)
>> Average Apply Rate: 71.00 KByte/s
>> Active Apply Rate: 450.00 KByte/s
>> Maximum Apply Rate: 15.18 MByte/s
>>
>>
>>
>> *The standby below is the one that lags.*
>>
>> DGMGRL> show database verbose fitsdb1
>>
>> Database - fitsdb1
>>
>> Role: PHYSICAL STANDBY
>> Intended State: APPLY-ON
>> Transport Lag: 12 minutes 45 seconds (computed 12 seconds ago)
>> Apply Lag: 12 minutes 45 seconds (computed 12 seconds ago)
>> Average Apply Rate: 79.00 KByte/s
>> Active Apply Rate: 60.79 MByte/s
>>
>> Maximum Apply Rate: 61.21 MByte/s
>>
>>
>>
>>
>>
>> Regards
>>
>>
>>
>> John
>>
>>
>>
>> On Fri, 4 Jan 2019, 15:41 Sandra Becker <sbecker6925_at_gmail.com> wrote:
>>
>> I had an error pop up that indicated a size difference between the
>> standby redo logs and the online redo logs. The standby redo logs on the
>> primary were 10x the size of the online redo logs. All the logs on the
>> standby were set up the same as the online redo. I've corrected that, just
>> to be consistent and follow best practices. While correcting the sizes, I
>> noticed that the standby that lags never assigns a standby redo log. Will
>> be researching that next.
>>
>>
>>
>> Sandy
>>
>>
>>
>> On Fri, Jan 4, 2019 at 7:35 AM Sandra Becker <sbecker6925_at_gmail.com>
>> wrote:
>>
>> Yes, archiving the current log allows the standby to catch up.
>>
>>
>>
>> DGMGRL> show configuration verbose
>>
>> Configuration - itsdb_DG_CONFIG
>>
>> Protection Mode: MaxPerformance
>> Members:
>> itsdb1 - Primary database
>> itsdb2 - Physical standby database
>> fitsdb1 - Physical standby database
>>
>> Properties:
>> FastStartFailoverThreshold = '30'
>> OperationTimeout = '30'
>> TraceLevel = 'USER'
>> FastStartFailoverLagLimit = '30'
>> CommunicationTimeout = '180'
>> ObserverReconnect = '0'
>> FastStartFailoverAutoReinstate = 'TRUE'
>> FastStartFailoverPmyShutdown = 'TRUE'
>> BystandersFollowRoleChange = 'ALL'
>> ObserverOverride = 'FALSE'
>> ExternalDestination1 = ''
>> ExternalDestination2 = ''
>> PrimaryLostWriteAction = 'CONTINUE'
>>
>> Fast-Start Failover: DISABLED
>>
>> Configuration Status:
>> SUCCESS
>>
>>
>>
>> DGMGRL> show database verbose itsdb1
>>
>> Database - itsdb1
>>
>> Role: PRIMARY
>> Intended State: TRANSPORT-ON
>> Instance(s):
>> its1
>>
>> Properties:
>> DGConnectIdentifier = 'itsdb1'
>> ObserverConnectIdentifier = ''
>> LogXptMode = 'ASYNC'
>> RedoRoutes = ''
>> DelayMins = '0'
>> Binding = 'optional'
>> MaxFailure = '0'
>> MaxConnections = '1'
>> ReopenSecs = '300'
>> NetTimeout = '30'
>> RedoCompression = 'DISABLE'
>> LogShipping = 'ON'
>> PreferredApplyInstance = ''
>> ApplyInstanceTimeout = '0'
>> ApplyLagThreshold = '0'
>> TransportLagThreshold = '0'
>> TransportDisconnectedThreshold = '30'
>> ApplyParallel = 'AUTO'
>> StandbyFileManagement = 'AUTO'
>> ArchiveLagTarget = '0'
>> LogArchiveMaxProcesses = '30'
>> LogArchiveMinSucceedDest = '1'
>> DbFileNameConvert = '+DATA/ITSDB2/, +DATA/ITSDB1/'
>> LogFileNameConvert = '+DATA/ITSDB2/, +DATA/ITSDB1/'
>> FastStartFailoverTarget = ''
>> InconsistentProperties = '(monitor)'
>> InconsistentLogXptProps = '(monitor)'
>> SendQEntries = '(monitor)'
>> LogXptStatus = '(monitor)'
>> RecvQEntries = '(monitor)'
>> StaticConnectIdentifier =
>> '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=pitsdbs01)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=itsdb1_DGMGRL)(INSTANCE_NAME=its1)(SERVER=DEDICATED)))'
>> StandbyArchiveLocation = '/backup/itsdb/archives'
>> AlternateLocation = ''
>> LogArchiveTrace = '0'
>> LogArchiveFormat = '%t_%s_%r.arc'
>> TopWaitEvents = '(monitor)'
>>
>> Database Status:
>> SUCCESS
>>
>>
>>
>> DGMGRL> show database verbose itsdb2
>>
>> Database - itsdb2
>>
>> Role: PHYSICAL STANDBY
>> Intended State: APPLY-ON
>> Transport Lag: 0 seconds (computed 0 seconds ago)
>> Apply Lag: 0 seconds (computed 0 seconds ago)
>> Average Apply Rate: 71.00 KByte/s
>> Active Apply Rate: 450.00 KByte/s
>> Maximum Apply Rate: 15.18 MByte/s
>> Real Time Query: OFF
>> Instance(s):
>> its2
>>
>> Properties:
>> DGConnectIdentifier = 'itsdb2'
>> ObserverConnectIdentifier = ''
>> LogXptMode = 'ASYNC'
>> RedoRoutes = ''
>> DelayMins = '0'
>> Binding = 'optional'
>> MaxFailure = '0'
>> MaxConnections = '1'
>> ReopenSecs = '300'
>> NetTimeout = '30'
>> RedoCompression = 'DISABLE'
>> LogShipping = 'ON'
>> PreferredApplyInstance = ''
>> ApplyInstanceTimeout = '0'
>> ApplyLagThreshold = '0'
>> TransportLagThreshold = '0'
>> TransportDisconnectedThreshold = '30'
>> ApplyParallel = 'AUTO'
>> StandbyFileManagement = 'AUTO'
>> ArchiveLagTarget = '0'
>> LogArchiveMaxProcesses = '30'
>> LogArchiveMinSucceedDest = '1'
>> DbFileNameConvert = '+DATA/ITSDB1/, +DATA/ITSDB2/'
>> LogFileNameConvert = '+DATA/ITSDB1/, +DATA/ITSDB2/'
>> FastStartFailoverTarget = ''
>> InconsistentProperties = '(monitor)'
>> InconsistentLogXptProps = '(monitor)'
>> SendQEntries = '(monitor)'
>> LogXptStatus = '(monitor)'
>> RecvQEntries = '(monitor)'
>> StaticConnectIdentifier =
>> '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=pitsdbs02)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=itsdb2_DGMGRL)(INSTANCE_NAME=its2)(SERVER=DEDICATED)))'
>> StandbyArchiveLocation = '/backup/itsdb/archives'
>> AlternateLocation = ''
>> LogArchiveTrace = '0'
>> LogArchiveFormat = '%t_%s_%r.arc'
>> TopWaitEvents = '(monitor)'
>>
>> Database Status:
>> SUCCESS
>>
>>
>>
>> *The standby below is the one that lags.*
>>
>> DGMGRL> show database verbose fitsdb1
>>
>> Database - fitsdb1
>>
>> Role: PHYSICAL STANDBY
>> Intended State: APPLY-ON
>> Transport Lag: 12 minutes 45 seconds (computed 12 seconds ago)
>> Apply Lag: 12 minutes 45 seconds (computed 12 seconds ago)
>> Average Apply Rate: 79.00 KByte/s
>> Active Apply Rate: 60.79 MByte/s
>> Maximum Apply Rate: 61.21 MByte/s
>> Real Time Query: OFF
>> Instance(s):
>> fits1
>>
>> Properties:
>> DGConnectIdentifier = 'fitsdb1'
>> ObserverConnectIdentifier = ''
>> LogXptMode = 'ASYNC'
>> RedoRoutes = ''
>> DelayMins = '0'
>> Binding = 'optional'
>> MaxFailure = '0'
>> MaxConnections = '1'
>> ReopenSecs = '300'
>> NetTimeout = '30'
>> RedoCompression = 'DISABLE'
>> LogShipping = 'ON'
>> PreferredApplyInstance = ''
>> ApplyInstanceTimeout = '0'
>> ApplyLagThreshold = '0'
>> TransportLagThreshold = '0'
>> TransportDisconnectedThreshold = '30'
>> ApplyParallel = 'AUTO'
>> StandbyFileManagement = 'AUTO'
>> ArchiveLagTarget = '0'
>> LogArchiveMaxProcesses = '30'
>> LogArchiveMinSucceedDest = '1'
>> DbFileNameConvert = '+DATA/ITSDB2/, +DATA/FITSDB1/'
>> LogFileNameConvert = '+DATA/ITSDB2/, +DATA/FITSDB1/'
>> FastStartFailoverTarget = ''
>> InconsistentProperties = '(monitor)'
>> InconsistentLogXptProps = '(monitor)'
>> SendQEntries = '(monitor)'
>> LogXptStatus = '(monitor)'
>> RecvQEntries = '(monitor)'
>> StaticConnectIdentifier =
>> '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=fitsdbs1)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=fitsdb1_DGMGRL)(INSTANCE_NAME=fits1)(SERVER=DEDICATED)))'
>> StandbyArchiveLocation = '/backup/itsdb/archives'
>> AlternateLocation = ''
>> LogArchiveTrace = '0'
>> LogArchiveFormat = '%t_%s_%r.arc'
>> TopWaitEvents = '(monitor)'
>>
>> Database Status:
>> SUCCESS
>>
>>
>>
>>
>>
>> I may be missing something, but I didn't see any differences in the
>> config between the two standbys.
>>
>>
>>
>> Sandy
>>
>>
>>
>> On Fri, Jan 4, 2019 at 3:11 AM Neil Chandler <neil_chandler_at_hotmail.com>
>> wrote:
>>
>> Sandy,
>>
>>
>>
>> So if you do a manual "alter system archive log current" on the Primary,
>> it catches up?
>>
>>
>>
>> I would look at the config parameters to compare the configuration. Could
>> you share the configuration info (db/server names suitably redacted):
>>
>>
>>
>> dgmgrl> show configuration verbose
>>
>> dgmgrl> show database verbose <primary|standby2|standby2>
>>
>>
>>
>>
>>
>> regards
>>
>>
>>
>> Neil Chandler
>>
>> Database Guy, Knows Things.
>> ------------------------------
>>
>> *From:* Sandra Becker <sbecker6925_at_gmail.com>
>> *Sent:* 03 January 2019 22:49
>> *To:* John Thomas
>> *Cc:* Andrew Kerber; Neil Chandler; oracle-l
>> *Subject:* Re: One primary with two physical standbys exhibiting
>> different behavior with regard to lag
>>
>>
>>
>> My understanding is the network setup is the same between the standbys.
>> I didn't look at the network right away. :-) I made the changes
>> suggested by Neil, but I'm still seeing the delay. Before I did a manual
>> log switch, the delay was over 30 minutes. Not good for this critical
>> production standby.
>>
>>
>>
>> Sandy
>>
>>
>>
>> On Thu, Jan 3, 2019 at 3:36 PM John Thomas <jt2354_at_gmail.com> wrote:
>>
>> There's no excessive delay between your primary and the lagging standby
>> is there? Smaller pipe? Lots of network retries?
>>
>>
>>
>> Probably the second thing you checked...
>>
>>
>>
>> Regards,
>>
>>
>>
>> John
>>
>>
>>
>> On Thu, 3 Jan 2019 at 22:10, Sandra Becker <sbecker6925_at_gmail.com> wrote:
>>
>> Thanks, Andrew. That's one of the first things I checked. It's the same
>> on both standbys.
>>
>>
>>
>> Sandy
>>
>>
>>
>> On Thu, Jan 3, 2019 at 2:47 PM Andrew Kerber <andrew.kerber_at_gmail.com>
>> wrote:
>>
>> Neil most likely spotted the problem, but you should also check to make
>> sure that the protection mode is the same on both standbys. If the instance
>> that is behind is using maximum performance (async) mode, it can run a
>> ways behind the primary.
>>
>>
>>
>> On Thu, Jan 3, 2019 at 3:36 PM Neil Chandler <neil_chandler_at_hotmail.com>
>> wrote:
>>
>> Sandy,
>>
>>
>>
>> Have you checked the Standby Redo logs? There's a slight (annoying)
>> change from Oracle 12.1 onwards which means that Standby Redo logs get
>> created with Thread 0 instead of Thread 1 by default (for a single-instance
>> database). Redo Apply can only use a Standby Redo log when the threads
>> match. If this is RAC you need Standby Redo logs for each thread - and you
>> must have one more Standby Redo log than Online Redo logs for each thread.
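>> If the thread does turn out to be 0, the usual fix is to drop and
>> recreate the standby logs with the thread stated explicitly (a sketch -
>> the group number and size here are illustrative and should match your
>> online redo logs):

```sql
-- Drop a standby redo log group that was created with thread 0.
ALTER DATABASE DROP STANDBY LOGFILE GROUP 10;

-- Recreate it explicitly on thread 1, same size as the online redo logs.
-- With OMF/ASM the file specification can be omitted, as here.
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 10 SIZE 1G;
```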
>>
>>
>>
>> By coincidence, I wrote a blog post about this 10 minutes ago.
>>
>>
>>
>> https://chandlerdba.com/2019/01/03/data-guard-unexpected-lag/
>>
>>
>>
>> regards
>>
>>
>>
>> Neil Chandler
>>
>> Database Guy. Knows Things.
>>
>>
>> ------------------------------
>>
>> *From:* oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> on
>> behalf of Sandra Becker <sbecker6925_at_gmail.com>
>> *Sent:* 03 January 2019 20:29
>> *To:* oracle-l
>> *Subject:* One primary with two physical standbys exhibiting different
>> behavior with regard to lag
>>
>>
>>
>> Oracle 12.1.0.2
>>
>> RHEL7
>>
>>
>>
>> To begin with, I have not worked much at all with standby databases, so
>> my knowledge is somewhat lacking.
>>
>>
>>
>> For business reasons, we have a primary database with two physical
>> standbys. Everything is configured in dgmgrl and enabled. Monitoring with
>> EM13c is reporting the lag times, so all looks good for basic setup and
>> monitoring. We seem to have significant lag at times on one of the
>> standbys, as much as 20 minutes. When looking at v$managed_standby, we see
>> the status as "WAIT_FOR_LOG". The other standby never seems to be more
>> than a few seconds behind, if at all, and its status is "APPLYING_LOG".
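>> For reference, the status came from v$managed_standby on each standby,
>> along these lines (a sketch; column list trimmed to the interesting ones):

```sql
-- On each standby: what the apply (MRP0) and transport (RFS) processes
-- are doing, and which log sequence/block they are on.
SELECT process, status, thread#, sequence#, block#
FROM   v$managed_standby
WHERE  process IN ('MRP0', 'RFS')
ORDER  BY process;
```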
>>
>>
>>
>> Is this normal? I've been researching, but haven't found an answer yet.
>> I didn't create or start the standby databases, so I don't have any idea
>> what was actually done that could be causing this behavior. Any
>> suggestions would be appreciated.
>>
>>
>>
>> Thank you,
>>
>>
>> --
>>
>> Sandy B.
>>
>>
>>
>> --
>>
>> Andrew W. Kerber
>>
>> 'If at first you dont succeed, dont take up skydiving.'
>>
>>
>>
>> --
>>
>> Sandy B.
>>
>>
>>
>> --
>>
>> Sandy B.
>>
>>
>>
>> --
>>
>> Sandy B.
>>
>>
>>
>> --
>>
>> Sandy B.
>>
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Jan 04 2019 - 23:16:03 CET
