Re: Database backups

From: Howard J. Rogers <dba_at_hjrdba.com>
Date: Sun, 24 Mar 2002 13:27:49 +1100
Message-ID: <a7jdkr$aut$1@lust.ihug.co.nz>
So effectively it comes down to this: switch frequently and have shite performance so as to avoid complete idiots doing idiotic things. I think if you'd put it like that in the first place, we could have avoided this protracted discussion, and left readers to draw their own conclusions.
Some might have suggested that calling your logs log1x.rdo would have been a simpler workaround.
Regards
HJR
--
----------------------------------------------
Resources for Oracle: http://www.hjrdba.com
===============================
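
The naming workaround mentioned above can be illustrated with a minimal Python sketch. The file names are hypothetical (only log1x.rdo and the .arc suffix come from this thread), and the function only reports what a "*.log" housekeeping pattern would match; it deletes nothing.

```python
from fnmatch import fnmatchcase


def matched_by_cleanup(filenames, pattern="*.log"):
    """Return the names a housekeeping pattern would select for deletion."""
    return [name for name in filenames if fnmatchcase(name, pattern)]


# Hypothetical file names; only log1x.rdo and the .arc suffix appear in the thread.
files = ["alert_prod.log", "redo01a.log", "log1x.rdo", "arch_0001.arc"]

# Redo members named *.log are caught by the pattern; log1x.rdo is not.
print(matched_by_cleanup(files))  # ['alert_prod.log', 'redo01a.log']
```

Renaming only keeps the files out of reach of that one pattern; it does nothing for any other cause of losing the current log group.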


"daniel" <test_at_test.com> wrote in message
news:a7jcct$tj1$1_at_newsg1.svr.pol.co.uk...


> > Why not every 1 minute, or every second come to that?
>
> that would be very daft (secure) :O)
>
> > What about a three-way hardware RAID mirror? You're telling me that with
> > this sort of configuration, you'll lose all members of the current group
> > twice in the last 12 months?
>
> Yes,
> first system was one i inherited and thus had shite hardware config (agreed)
> .. ie massive single disk
>
> second was the all singing all dancing mother of systems with nearly every
> eventuality covered. but a clown of a sysop (whilst i was on holiday) thought
> he would do some housekeeping. he searched for all files ending in .log from
> the top down and deleted them... ooops...
>
> my archive logs are .arc and copied off to a nas box as well phew... but i
> still lost all current online redo logs which had more than half a day's data
> in them. since then i changed my approach and reduced time between log
> switches.
>
> > And log switching frequently really doesn't address any of those issues,
> > does it?
>
> well it would have saved the day cos it would have meant half the day's data
> would have been on a remote nas, but i take your point, although we shall
> agree to disagree.
>
> > It suggests that log switches are a mechanism that has a significant role
> > to play in protecting you from data loss
>
> Again we shall agree to disagree
>
> > Yet how does a log switch protect you from losing massive amounts of data
> > if you have a failure of just one data file, and one archive log?
>
> well losing the datafile would not be an issue but yes if you were to lose
> an archive then that would be a pain but you can write to multiple dests. i
> write mine off to a nas as well as i'm sure a lot of dbas do?
>
> > Do log switches protect you from losing archives?
>
> no that was never a point of mine... but they can help in minimizing the
> amount of data loss
>
> > Yet you imply that log switches have something to do with protecting you
> > from loss of data
>
> see example above
>
> if i had a cat and wanted to skin it, how would you suggest?
>
> --
> Regards,
>
> Daniel.
>

> "Howard J. Rogers" <dba_at_hjrdba.com> wrote in message

> news:a7j6ui$3mf$1_at_lust.ihug.co.nz...

> > I think you are missing the point.  If your defence against total loss of
> > the current redo log is a log switch, then your original blunt advice to
> > log switch every 15 minutes is even stranger.  Why stop at 15 minutes?
> > Why not every 1 minute, or every second come to that?
> >
> > I took issue with your 15 minutes advice because there were no qualifiers,
> > no provisos, nothing.  Just "aim to log switch every 15 minutes".  Not even
> > an explanation as to why.  Well, now we know that it's because you are
> > worried about losing all members of your current redo log, and you
> > therefore want to limit the amount of data that would be lost in that
> > eventuality.  What is immediately apparent from this new revelation of your
> > reasoning is that *if* that is the driving issue, even 15 minutes is
> > nonsensical, since some people would think losing that amount of data too
> > much - so again, it's not "aim to log switch every 15 minutes" but "log
> > switch when it is appropriate for your circumstances".
> >
> > But the real point is that log switching to avoid the loss of the current
> > redo log is a daft way to proceed.  Log switches are intended to control
> > Instance Recovery time, not prevent loss of logs.  Of course it's
> > *possible* to use log switches to minimise damage done through loss of
> > logs, but there are much better ways to avoid the loss of logs altogether.
> > For example, 3-way multiplexing (how likely is it that 3 separate disks on
> > three separate controllers will fail simultaneously?).  Then there's
> > RAID 1.  So now you've got three members of your groups mirrored onto yet
> > more separate devices.  What about a three-way hardware RAID mirror?
> > You're telling me that with this sort of configuration, you'll lose all
> > members of the current group twice in the last 12 months?  If that's
> > what's happened to you, then you're either not multiplexing sufficiently,
> > are working with dodgy hardware, or there's poor system configuration
> > going on.  And log switching frequently really doesn't address any of
> > those issues, does it?
> >
> > Bear in mind, too, that if 9i is an option, you've now the ability to
> > transfer redo off-site in real time using Data Guard.  In other words,
> > there are a zillion ways of protecting your online redo without resorting
> > to log switches to accomplish it.
> >
> > Why do I care enough to quibble with your advice?  Because it's
> > misleading.  It suggests that log switches are a mechanism that has a
> > significant role to play in protecting you from data loss.  Yet how does a
> > log switch protect you from losing massive amounts of data if you have a
> > failure of just one data file, and one archive log?  Do log switches
> > protect you from losing archives?  Of course not, as we'd both agree.  Yet
> > you imply that log switches have something to do with protecting you from
> > loss of data... but it's just sending the wrong message.  People will rely
> > on the advice you give them; they'll have a false sense of security ("if I
> > do this, I can only lose 15 minutes of data").  They'll be tempted as a
> > result to assume they don't need to invest in mechanisms such as archive
> > and online redo multiplexing, which really do protect them from data loss
> > (if done properly, that is).  They'll just be chasing a mirage of security
> > when there isn't any from a log switch per se.
> >
> > Look at your last sentence for proof that it can happen (and already has):
> > "i'll take the slight hit in performance to garauntee my company that the
> > worst case scenario is loss of business data <= 15 mins".  But you have
> > done *nothing* to guarantee that by log switching every 15 minutes.  You
> > lose one archive, and one small datafile, and you'll be signing up to much
> > more than just 15 minutes of lost business data.  So what have the
> > frequent log switches done for you and the protection of your data?
> > Nothing.
> >
> > In any event, you carry on using log switches as a method of data loss
> > prevention if you want.  Just don't recommend it as bald, unqualified
> > advice to others, will you?
> >
> > As for the tracefile backup of the controlfile: we agree.  It's just that
> > your original statement made no mention of including it routinely in the
> > backup, but just to do it when the physical database structure changes.
> > Now that you make it clear that it should be routine, it's clear we're in
> > agreement on that point at least.
> >
> > Regards
> > HJR
> > --
> > ----------------------------------------------
> > Resources for Oracle: http://www.hjrdba.com
> > ===============================
> >

> > "daniel" <test_at_test.com> wrote in message

> > news:a7j445$vi5$1_at_newsg2.svr.pol.co.uk...

> > > let's change the tone back to discussion... i'm not trying to persist or
> > > indeed argue
> > > ...
> > >
> > > > Why do you persist in thinking that the rate of log switching has
> > > > anything to do with avoiding the loss of data?
> > >
> > > cos it does... (see your next point)
> > >
> > > > You'll only lose the data in the
> > > > current log if you lose *all* copies of the current log.
> > >
> > > Exactly... So this must be taken into account, dependent on the
> > > criticality of your data... You'll probably think "bollocks" but this has
> > > happened to me twice in the last year... and it's a real pain in the arse!
> > >
> > > and cos of this i generally get quite nervous about it, and protect
> > > myself by going for a shorter gap between log switches (and the resultant
> > > checkpoint)
> > >
> > > yes an hour between log switches would be more performant in a high
> > > throughput oltp environment agreed, however the achilles heel would be
> > > the scenario above ie; loss of all members of the current online redo
> > > group. agreed not the most likely scenario but it *CAN* happen. i guess
> > > observing the performance degradation is the key here and balancing the
> > > tradeoff as you quite rightly state.
> > >
> > > more risk in an ebusiness environment surely, cos with most in house
> > > legacy type systems, i can just go and ask the business to re-key the
> > > data etc etc however when it's a web fronted ebusiness setup then
> > > regenerating the tx's could prove a nightmare... if you could do it at
> > > all...
> > >
> > > i said;
> > > >>every time you change the db structure "alter database backup
> > > >>controlfile to trace"
> > >
> > > you said;
> > > >Equally dodgy advice, I think.  Backup to trace should be routine.
> > > >Every backup should include it.
> > >
> > > agreed it should be part of the backup, but check out what i said again!
> > > ie; you might alter the structure of the db between backups and as such
> > > Oracle recommend a backup of said ctl file immediately...
> > >
> > > Oracle's advice not mine; see link below
> > >
> > > http://docs.oracle.com/cd_database_generic_8.1.7/server.817/a76993/datastru.htm#11039
> > >
> > > you said;
> > > >> And that's the real point: rules of thumb are all very well (though
> > > >> 15 minutes is a poor one),
> > >
> > > in your opinion, which you are more than entitled to...
> > >
> > > >>but in the end it comes down to Instance Recovery time versus
> > > >>performance.
> > >
> > > not forgetting how critical the data is  :O) my earlier point... i'll
> > > take the slight hit in performance to guarantee my company that the worst
> > > case scenario is loss of business data <= 15 mins... you're right others
> > > may choose 30 mins to an hour and hey what's 15 mins between dbas
> > >
> > > --
> > > Regards,
> > >
> > > Daniel.
> > >

> > > "Howard J. Rogers" <dba_at_hjrdba.com> wrote in message

> > > news:a7ivhn$s9c$1_at_lust.ihug.co.nz...

> > > > Why do you persist in thinking that the rate of log switching has
> > > > anything to do with avoiding the loss of data?  You'll only lose the
> > > > data in the current log if you lose *all* copies of the current log.
> > > > They invented multiplexing of redo logs way back in Oracle 7 precisely
> > > > so that you wouldn't lose all copies.  Assuming you multiplex, the rate
> > > > at which you log switch should be governed by the rate at which you
> > > > wish to checkpoint - which has a direct relationship to the length of
> > > > time it takes to perform Instance Recovery, sure enough.  But you don't
> > > > lose any committed transactions in Instance Recovery, so there's no
> > > > loss of data involved.
> > > >
> > > > Out of interest, most DBAs over the years seem to have settled, by way
> > > > of rule of thumb, on a log switch every half hour to an hour, giving a
> > > > reasonable compromise between Instance Recovery time and
> > > > checkpoint-induced performance degradation.  But 15 minutes is (in
> > > > general) way too much checkpointing, and the performance penalty is
> > > > likely to be severe.
> > > >
> > > > On the other hand, I know of one terabyte-sized database where the
> > > > Instance Recovery demands were such that they log switched (and hence
> > > > checkpointed) every 10 minutes.  And those were 500M redo logs.
> > > >
> > > > And that's the real point: rules of thumb are all very well (though 15
> > > > minutes is a poor one), but in the end it comes down to Instance
> > > > Recovery time versus performance.  And there are no hard-and-fast rules
> > > > on that trade-off.  Everyone needs to find their own point on the scale
> > > > where they are satisfied with the compromise involved.
> > > >
> > > > Regards
> > > > HJR
> > > > --
> > > > ----------------------------------------------
> > > > Resources for Oracle: http://www.hjrdba.com
> > > > ===============================
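
The arithmetic behind the 500M, ten-minute example quoted above can be sketched in Python. This is only a rough illustration under two stated assumptions: "500M" is read as 500 MB, and redo is generated at a steady rate with switches triggered solely by the log filling. The function names are made up for the sketch and are not Oracle APIs.

```python
def redo_rate_mb_per_sec(log_size_mb, switch_interval_min):
    """Average redo rate implied by a log size and an observed switch interval."""
    return log_size_mb / (switch_interval_min * 60)


def log_size_for_interval(rate_mb_per_sec, target_interval_min):
    """Log size (MB) that would fill in the target interval at a steady rate."""
    return rate_mb_per_sec * target_interval_min * 60


# 500 MB logs filling every 10 minutes (assumption: steady redo generation).
rate = redo_rate_mb_per_sec(500, 10)
print(round(rate, 2))  # 0.83

# Log sizes the same workload would need for longer switch intervals.
for minutes in (15, 30, 60):
    print(minutes, round(log_size_for_interval(rate, minutes)))
```

At that rate the half-hour to one-hour rule of thumb would need logs of roughly 1.5 GB to 3 GB, which is why the right interval depends on the workload rather than on a fixed number.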

> > > >

> > > >

> > > > "daniel" <test_at_test.com> wrote in message

> > > > news:a7itlm$k25$1_at_newsg1.svr.pol.co.uk...

> > > > > surely you'll always have a trade off between performance and
> > > > > recovery? ie: if I need to never lose more than 15 mins of business
> > > > > data then i need to either log switch or checkpoint? or did i miss
> > > > > that meeting?
> > > > >
> > > > > --
> > > > > Regards,
> > > > >
> > > > > Daniel.

> > > > > "Howard J. Rogers" <dba_at_hjrdba.com> wrote in message

> > > > > news:a7ipq5$mdu$1_at_lust.ihug.co.nz...

> > > > > > "daniel" <test_at_test.com> wrote in message

> > > > > > news:a7hr94$tku$1_at_news5.svr.pol.co.uk...

> > > > > > > >>Actually the "dreadful advice" comment was made in relation to
> > > > > > > >>your assertion that you should aim "to log switch every 15
> > > > > > > >>mins", and had nothing to do with how you do backups.
> > > > > > >
> > > > > > > a log switch every 15 mins means we're gonna checkpoint as well,
> > > > > >
> > > > > > I know. That's why it was dreadful advice.
> > > > > >
> > > > > > >I made no
> > > > > > > recommendation as to the frequency of checkpoints ie inside the
> > > > > > > logswitch time.
> > > > > > >
> > > > > >
> > > > > > And I wasn't suggesting that you had.  It's bad enough
> > > > > > checkpointing every 15 minutes because of the log switches you want
> > > > > > without then adding to your woes by inducing extra checkpointing
> > > > > > within the logs.
> > > > > >
> > > > > > HJR
> > > > > >
> > > > > > > >> "dreadful advice"
> > > > > > > Hmmm is this really necessary?
> > > > > > >
> > > > > > > Daniel...
> > > > > > >

> > > > > > > "Howard J. Rogers" <dba_at_hjrdba.com> wrote in message

> > > > > > > news:a7g5no$26s$1_at_lust.ihug.co.nz...

> > > > > > > > "daniel" <test_at_test.com> wrote in message

> > > > > > > > news:a7g4mb$q7i$1_at_news5.svr.pol.co.uk...

> > > > > > > > > firstly i knew some smart arse would write such a reply,,,, my
> > > > > > > > > reply was trying to be generic!!!!!!
> > > > > > > > >
> > > > > > > >
> > > > > > > > Generic is fine.  Trying is fine.  Failing to be generic,
> > > > > > > > however, isn't.
> > > > > > > >
> > > > > > > > > a cold backup is a consistent backup of a database that has
> > > > > > > > > shutdown normal (minus online redo logs) thus negating the
> > > > > > > > > need to roll forward from such a backup. yes in an archivelog
> > > > > > > > > db you could bring back a df from a cold backup set and roll
> > > > > > > > > forward but my point was u would not normally roll forward
> > > > > > > > > from a complete consistent cold backup even though you could
> > > > > > > > > do....
> > > > > > > > >
> > > > > > > >
> > > > > > > > Rubbish.  Just because you are in archivelog mode does not
> > > > > > > > mandate that you do hot backups.  Plenty of people do cold
> > > > > > > > backups, and take archives.  Archives give you the ability to
> > > > > > > > completely recover your database.  Taking backups (hot or cold)
> > > > > > > > gives you something which can be rolled forward.  There's no
> > > > > > > > other relationship between the two, and there's nothing "normal"
> > > > > > > > or "abnormal" about either type of backup in archivelog mode.
> > > > > > > >
> > > > > > > > In my experience, about 35-40% of people running in archivelog
> > > > > > > > mode take cold backups.  What you say is 'not normal' for them
> > > > > > > > to do, they plan to do routinely.
> > > > > > > >
> > > > > > > > So, whilst I knew the point you were trying to make, it's simply
> > > > > > > > wrong.
> > > > > > > >
> > > > > > > > > regarding log switch the user states it is an ebusiness
> > > > > > > > > environment (oltp!) so we are probably putting tx's through
> > > > > > > > > it. well as we both know worst case scenario is u lose your
> > > > > > > > > current online redo log, thus tx's that may have not
> > > > > > > > > checkpointed (yes i know we can alter the frequency of the
> > > > > > > > > checkpoint) so online redo sized to switch every 15 mins means
> > > > > > > > > worst case scenario is we lose 15 mins of bussiness data....
> > > > > > > > >
> > > > > > > >
> > > > > > > > So, why not checkpoint every second, 'cause that way you only
> > > > > > > > lose 1 second of "bussiness [sic] data"?  Because checkpoints
> > > > > > > > have an overhead.  And that overhead slows down oltp
> > > > > > > > transactional activity.  So to come out with a bald "make it 15
> > > > > > > > minutes" is just meaningless.
> > > > > > > >
> > > > > > > > Checkpointing should be done at a rate that balances possible
> > > > > > > > transaction loss/recovery time with the slowdown in performance
> > > > > > > > that excessive checkpointing induces.  The appropriate advice is
> > > > > > > > to find some point on the spectrum that you feel comfortable
> > > > > > > > with, not come out with some meaningless specific time interval.
> > > > > > > >
> > > > > > > > And *that* is generic advice, whereas 'make it 15 mins' is
> > > > > > > > highly specific, highly misleading, and a thoroughly dreadful
> > > > > > > > piece of advice.
> > > > > > > >
> > > > > > > > > so before we enter into a "my dad's bigger than your dad"
> > > > > > > > > argument there are 15 billion approaches to oracle backups so
> > > > > > > > > don't call it "dreadful advice", it's just another way of
> > > > > > > > > looking at it.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Actually the "dreadful advice" comment was made in relation to
> > > > > > > > your assertion that you should aim "to log switch every 15
> > > > > > > > mins", and had nothing to do with how you do backups.
> > > > > > > >
> > > > > > > > HJR
> > > > > > > >
> > > > > > > > > regards,
> > > > > > > > >
> > > > > > > > > daniel...
> > > > > > > > >

> > > > > > > > > "Howard J. Rogers" <dba_at_hjrdba.com> wrote in message

> > > > > > > > > news:a7el2i$hek$1_at_lust.ihug.co.nz...

> > > > > > > > > > Comments below
> > > > > > > > > > HJR
> > > > > > > > > > --
> > > > > > > > > > ----------------------------------------------
> > > > > > > > > > Resources for Oracle: http://www.hjrdba.com
> > > > > > > > > > ===============================
> > > > > > > > > >
> > > > > > > > > > "daniel" <test_at_test.com> wrote in message
> > > > > > > > > > news:a7deuk$417$1_at_newsg2.svr.pol.co.uk...
> > > > > > > > > > > as a rule of thumb u don't roll forward from a cold backup,
> > > > > > > > > > > (yes, yes I know you can but lets not get into bad
> > > > > > > > > > > habits)....
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > That's simply not true, and it's not a bad habit either.  Of
> > > > > > > > > > course you can roll forward from a cold backup.  And taking
> > > > > > > > > > cold backups is much easier than the hot variety.
> > > > > > > > > >
> > > > > > > > > > > you can use just hot backups no probs and can recover
> > > > > > > > > > > quicker from them
> > > > > > > > > > >
> > > > > > > > > > > pointers;
> > > > > > > > > > >
> > > > > > > > > > > have multiple ctl files spanning physical disks
> > > > > > > > > > > multiplex your online redo logs across physical disks
> > > > > > > > > > > aim to log switch every 15 mins
> > > > > > > > > >
> > > > > > > > > > Dreadful advice.  If you want performance, log switch every
> > > > > > > > > > 24 hours, in the dead of night, when no-one gives a damn
> > > > > > > > > > about the huge amount of I/O the associated checkpoint will
> > > > > > > > > > induce.  If you want Instance Recovery in ten seconds, log
> > > > > > > > > > switch every second or so.  Somewhere in between those two
> > > > > > > > > > extremes will be a happy medium for *you*.
> > > > > > > > > >
> > > > > > > > > > > every time you change the db structure "alter database
> > > > > > > > > > > backup controlfile to trace"
> > > > > > > > > >
> > > > > > > > > > Equally dodgy advice, I think.  Backup to trace should be
> > > > > > > > > > routine.  Every backup should include it.
> > > > > > > > > >
> > > > > > > > > > > after the hot backup take a copy of the ctlfile and
> > > > > > > > > > > "archive log current" and make sure your archive redo logs
> > > > > > > > > > > are protected: stream them off somewhere else every 15 mins
> > > > > > > > > >
> > > > > > > > > > Depends on your log switch rate, of course (see above!!)
> > > > > > > > > >
> > > > > > > > > > > also when you take the hot backup keep a copy locally if
> > > > > > > > > > > enough space, also stream to tape and if recovery time is
> > > > > > > > > > > an issue copy to some network storage
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Agreed.  Keep as much on disk as possible.
> > > > > > > > > >
> > > > > > > > > > Regards
> > > > > > > > > > HJR
> > > > > > > > > >
> > > > > > > > > > > good luck
> > > > > > > > > > >
> > > > > > > > > > > daniel...
> > > > > > > > > > >

> > > > > > > > > > > "Dale DeRemer" <dderemer_at_agmc.org> wrote in message

> > > > > > > > > > > news:a7dcs8$q7u$1_at_malgudi.oar.net...

> > > > > > > > > > > We are new to the Oracle world. We want our ebusiness
> > > > > > > > > > > server to be 7x24. Never, ever down. Meaning... no cold
> > > > > > > > > > > backups. So, our question is this: If we use hot backups
> > > > > > > > > > > (RMAN), and never take a cold backup, will we be able to
> > > > > > > > > > > recover from any failure? Additionally, what is the
> > > > > > > > > > > impact, or difference in recovery time, for a system with
> > > > > > > > > > > no cold backups vs. one with a cold backup done once a
> > > > > > > > > > > week, or once a month? The DB is 75GB and will grow to
> > > > > > > > > > > about 100GB over the next year. It will be updated in
> > > > > > > > > > > batches from our mainframe. Users will not update it.
> > > > > > > > > > > Thanks for your help.

Received on Sat Mar 23 2002 - 20:27:49 CST