Re: Bank Databases

From: Matthew Zito <matt_at_crackpotideas.com>
Date: Mon, 25 Jun 2012 13:35:42 -0400
Message-ID: <CAJ7936yUrH7HOaCcRL3TZ1e+yq8O+G6Xi0vorpPnCcqyWVXZqg_at_mail.gmail.com>

I think there's often a tendency to blame the outsourced team whenever these kinds of issues crop up and there are contractors, remote teams, or offshored folks involved. But in a failure like this, as I think I said upthread, there's plenty of blame to go around:

If the offshore people were unqualified, why was management allowing them to do this upgrade?
If engineering "got to work on a fix" Wednesday morning, why weren't they involved in the planning to ensure sufficient safeguards?
If this system was so critical, why was the vendor not already involved in the upgrade process?
If the vendor was involved, why the heck did it take days to get a fix for a major international bank?

I work with software far less operationally critical to normal business execution, and I *still* get direct calls from customers that say, "We're planning to upgrade to 8.2 of your software, and I was wondering if you can take a look at our plan and make sure we're not doing anything wrong?"

I can only imagine the planning they would do if a failure in my software could prevent them from allowing people to access their money.

Matt

PS - Full disclosure: I work for BMC Software, which makes a job scheduling product that competes with CA's, though I don't work with it and have never used it or even seen a demo - totally different side of the company. So I have no axe to grind against CA, wish them all the best, and my views are definitely not those of BMC.

On Mon, Jun 25, 2012 at 1:15 PM, Powell, Mark <mark.powell2_at_hp.com> wrote:
> The problem is not the CA-7 software in my opinion but the failure of the out-sourced staff to properly use the software.
>
>
> -----Original Message-----
> From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Matthew Zito
> Sent: Monday, June 25, 2012 7:53 AM
> To: Øyvind Isene
> Cc: howard.latham_at_gmail.com; niall.litchfield_at_gmail.com; oracle-l
> Subject: Re: Bank Databases
>
> Doh - resending as got dinged for overquoting:
>
> Timely enough, the Register is reporting that CA's job scheduler software may be responsible:
>
> http://www.theregister.co.uk/2012/06/25/rbs_natwest_what_went_wrong/
>
> Could certainly mean that Oracle was still involved (or Sybase, or some other database), but the inability to schedule jobs was the root issue.
>
> Matt
>
>
>>>>
>>>> I'm particularly interested as we test our failover every 3 months
>>>> and last time we did so there was a power outage on the standby
>>>> which was running temporarily as primary which we hadn't
>>>> anticipated. The start up script tried to bring what was currently a
>>>> primary db as a standby. I'm trying to automate this and yuk without
>>>> dg broker which has its own set of problems I'm a bit stymied!
>>>> I'm not suggesting Nat West hadn't tested their failover, but I
>>>> imagine it's difficult due to volumes.
>>>> On 25 June 2012 12:08, Matthew Zito <matt_at_crackpotideas.com> wrote:
>>>> > Yes, though I doubt it's anything as simple as an "Oracle issue".
>>>> > From my experience watching large organizations deal with complex
>>>> > crises like this, typically it's a series of cascading failures -
>>>> > so perhaps an Oracle database was involved, but many separate
>>>> > pieces had to fail in order to get to this point.
>>>> >
>>>> > For example, I once saw a major global company's firmwide email
>>>> > system go down for over a day due to a cascading series of:
>>>> > - storage array failure
>>>> > - misconfigured hardware
>>>> > - engineer typo
>>>> > - misunderstood recovery architecture
>>>> >
>>>> > I'm trying to keep it vague intentionally, but if any one of those
>>>> > things hadn't happened, they would have had an hour downtime on
>>>> > their email instead of a 30 hour downtime. I suspect the natwest
>>>> > issue is similar, *though* I do expect that we'll get more info in
>>>> > the coming days/weeks, so maybe we can get some more details then.
>>>> >
>>>> > Matt
>>>> >
>>>> > On Mon, Jun 25, 2012 at 7:01 AM, Howard Latham <howard.latham_at_gmail.com> wrote:
>>>> > >
>>>> > > So Nat west being unable to process transactions for 5 days due to a change
>>>> > > in backup software and fail over could well be an Oracle issue.
>>>> > >
>>>> > > --
>>>> > > Howard A. Latham
> --
> http://www.freelists.org/webpage/oracle-l


Received on Mon Jun 25 2012 - 12:35:42 CDT
