Re: AWS RDS Detecting failover

From: Maris Elsins <elmaris_at_gmail.com>
Date: Tue, 3 Oct 2017 15:27:39 +0300
Message-ID: <CABQhObsF7t68VtWFBQPXraSx1E-XxvqkmDUbMwLMYNm-DxQJGg_at_mail.gmail.com>

Hi,

I think the goal is to find something that works for a real, unplanned failover, so killing sessions prior to the failover isn't an option: the failover itself is controlled by AWS. It would take some additional work, but RDS raises an event notification at the time of failover. You could subscribe to it and use it to clean up or restart the old processes/connections when the failover happens (see
<http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Events.html#USER_Events.Messages>).
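
To make the idea concrete, here is a minimal sketch in Python of the "process it" step, assuming the RDS event subscription publishes to an SNS topic that triggers a Lambda function. The "Source ID" and "Event Message" keys are my assumption about the notification body and should be checked against a real message; failover_source and handler are hypothetical names, and the actual cleanup/restart is left as a placeholder.

```python
import json


def failover_source(sns_message):
    """Return the RDS source identifier if the message reports a failover, else None.

    Assumption: the notification body is JSON with "Source ID" and
    "Event Message" keys. Verify this against a real notification.
    """
    try:
        body = json.loads(sns_message)
    except ValueError:
        return None
    if not isinstance(body, dict):
        return None
    text = str(body.get("Event Message", ""))
    if "failover" not in text.lower():
        return None
    return body.get("Source ID")


def handler(event, context=None):
    """Hypothetical Lambda entry point for an SNS-triggered function."""
    affected = []
    for record in event.get("Records", []):
        source = failover_source(record.get("Sns", {}).get("Message", ""))
        if source is not None:
            affected.append(source)
            # Placeholder: clean up or restart the clients of `source` here.
    return affected
```

Matching on the message text rather than on specific event IDs keeps the sketch independent of the exact ID list, which is best taken from the page linked above.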

---
Maris Elsins
_at_MarisElsins <https://twitter.com/MarisElsins>
www.facebook.com/maris.elsins



On Tue, Oct 3, 2017 at 3:05 PM, Stefan Knecht <knecht.stefan_at_gmail.com>
wrote:



> If you're saying that the connected clients do not realize that the
> database connection has died, have you tried forcefully killing all client
> sessions prior to doing the failover?
>
> That, combined with the appropriate tnsnames settings (e.g. listing both
> the primary and failover sites with appropriate connection timeouts) should
> get you what you need?
>
> On Tue, Oct 3, 2017 at 2:48 AM, Steve T. Baldwin <stbaldwin_at_msts.com>
> wrote:
>
>> Hi all,
>>
>> In my testing, when a client is connected to a multi-az RDS instance and
>> I force failover, that client doesn't 'see' it.  If it makes any DB request
>> after or during the failover it ends up timing out - eventually.
>> Unfortunately this timeout is controlled by the tcp keepalive setting which
>> defaults to 2 hours.  Not very helpful when the actual failover can be
>> complete in a couple of minutes.
>>
>> I'm wondering what other RDS users are doing to handle this scenario.
>>
>> I've tried tinkering with sqlnet.ora params but couldn't find any that
>> would allow a connected client to detect the failover.
>>
>> We may have many connected clients - both on-prem and from AWS -
>> including our on-prem DB servers using DB links.  I'm reluctant to muck
>> with OS-level tcp keepalive params, and in some cases that may not even be
>> possible (e.g. from another RDS instance).
>>
>> My current solution involves using socat
>> (http://www.dest-unreach.org/socat/) as a proxy.  I can easily adjust the
>> tcp keepalive parameters with this and depending on the values I set for
>> those parameters I can detect failover almost immediately.
>>
>> However it means either running socat on every client, or having a
>> dedicated container/ec2 running socat - which I then have to monitor.  If
>> that container/ec2 fails but the DB doesn't, all in-flight connections are
>> lost.
>>
>> I'm thinking there has to be a better way.  I've contacted AWS support
>> but they suggested mucking with the tcp keepalive settings on all clients.
>> Or alternatively using SNS and Lambdas to notify/kill connected clients.
>> The latter wasn't ideal because the Lambda didn't get fired until well
>> after the failover (8-10 mins), and I have a mixture of AWS and on-prem
>> clients, so the notification part is messy.
>>
>> Thanks for any suggestions.
>>
>> Steve
>
> --
> //
> zztat - The Next-Gen Oracle Performance Monitoring and Reaction Framework!
> Visit us at zztat.net | Support our Indiegogo campaign at igg.me/at/zztat
> | _at_zztat_oracle

--
http://www.freelists.org/webpage/oracle-l

Received on Tue Oct 03 2017 - 14:27:39 CEST
