Re: Database backups

From: Howard J. Rogers <dba_at_hjrdba.com>
Date: Sun, 24 Mar 2002 11:33:28 +1100
Message-ID: <a7j6ui$3mf$1@lust.ihug.co.nz>

I think you are missing the point. If your defence against total loss of the current redo log is a log switch, then your original blunt advice to log switch every 15 minutes is even stranger. Why stop at 15 minutes? Why not every 1 minute, or every second come to that?

I took issue with your 15 minutes advice because there were no qualifiers, no provisos, nothing. Just "aim to log switch every 15 minutes". Not even an explanation as to why. Well, now we know that it's because you are worried about losing all members of your current redo log, and you therefore want to limit the amount of data that would be lost in that eventuality. What is immediately apparent from this new revelation of your reasoning is that *if* that is the driving issue, even 15 minutes is nonsensical, since some people would think losing that amount of data too much. So again, it's not "aim to log switch every 15 minutes" but "log switch when it is appropriate for your circumstances".

But the real point is that log switching to avoid the loss of the current redo log is a daft way to proceed. Log switches are intended to control Instance Recovery time, not prevent loss of logs. Of course it's *possible* to use log switches to minimise damage done through loss of logs, but there are much better ways to avoid the loss of logs altogether. For example, 3-way multiplexing (how likely is it that three separate disks on three separate controllers will fail simultaneously?). Then there's RAID 1. So now you've got three members of your groups mirrored onto yet more separate devices. What about a three-way hardware RAID mirror? You're telling me that with this sort of configuration, you'd have lost all members of the current group twice in the last 12 months? If that's what's happened to you, then you're either not multiplexing sufficiently, are working with dodgy hardware, or there's poor system configuration going on. And log switching frequently really doesn't address any of those issues, does it?
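
To put a rough number on the multiplexing argument, here is a minimal sketch. It assumes member failures are independent (separate disks, separate controllers), and the 1% figure is purely illustrative, not a measured failure rate. The function name is hypothetical.

```python
def p_lose_all_members(p_single: float, members: int) -> float:
    """Probability that every member of a redo group is lost in the same
    window, ASSUMING independent failures. p_single is an illustrative
    per-member loss probability, not a measured one."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    if members < 1:
        raise ValueError("members must be at least 1")
    # Independent events: multiply the per-member probabilities together.
    return p_single ** members


if __name__ == "__main__":
    # Hypothetical 1% chance of losing any one member in a given window.
    for n in (1, 2, 3):
        print(f"{n} member(s): {p_lose_all_members(0.01, n):.6%}")
```

Under those assumptions each extra member divides the risk by a hundred, which is the point: the protection comes from the number of independent copies, not from how often the log switches. Shared controllers or a shared power supply break the independence assumption and make the real figure worse.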

Bear in mind, too, that if 9i is an option, you've now the ability to transfer redo off-site in real time using Data Guard. In other words, there are a zillion ways of protecting your online redo without resorting to log switches to accomplish it.

Why do I care enough to quibble with your advice? Because it's misleading. It suggests that log switches are a mechanism that has a significant role to play in protecting you from data loss. Yet how does a log switch protect you from losing massive amounts of data if you have a failure of just one data file, and one archive log? Do log switches protect you from losing archives? Of course not, as we'd both agree. Yet you imply that log switches have something to do with protecting you from loss of data... and that's just sending the wrong message. People will rely on the advice you give them; they'll have a false sense of security ("if I do this, I can only lose 15 minutes of data"). They'll be tempted as a result to assume they don't need to invest in mechanisms such as archive and online redo multiplexing, which really do protect them from data loss (if done properly, that is). They'll just be chasing a mirage of security when there isn't any from a log switch per se.

Look at your last sentence for proof that it can happen (and already has): "i'll take the slight hit in performance to garauntee my company that the worst case scenario is loss of business data <= 15 mins". But you have done *nothing* to guarantee that by log switching every 15 minutes. You lose one archive, and one small datafile, and you'll be signing up to much more than just 15 minutes of lost business data. So what have the frequent log switches done for you and the protection of your data? Nothing.
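
The arithmetic behind that claim can be sketched as follows. This is a simplified model, not a description of any Oracle API: it assumes the lost datafile has to be restored from the last backup and rolled forward through every archive in sequence, and that a missing archive ends the roll-forward at the start of the interval it covered. The function and parameter names are hypothetical.

```python
from typing import Optional


def minutes_lost(switch_interval: int, minutes_since_backup: int,
                 missing_archive_index: Optional[int]) -> int:
    """Minutes of work lost when a datafile is restored from the last backup.

    Archive i (0-based) is assumed to cover the interval
    [i * switch_interval, (i + 1) * switch_interval) after the backup.
    missing_archive_index=None means every archive is available.
    """
    if switch_interval <= 0 or minutes_since_backup < 0:
        raise ValueError("intervals must be positive")
    if missing_archive_index is None:
        return 0  # unbroken redo chain: complete recovery
    completed = minutes_since_backup // switch_interval
    if not 0 <= missing_archive_index < completed:
        raise ValueError("no such archive since the backup")
    # Roll-forward stops where the missing archive would have started.
    recovered_up_to = missing_archive_index * switch_interval
    return minutes_since_backup - recovered_up_to


if __name__ == "__main__":
    # 15-minute switches, backup taken 24 hours (1440 minutes) ago.
    print(minutes_lost(15, 1440, 0))     # first archive after the backup is gone
    print(minutes_lost(15, 1440, None))  # nothing missing
```

In this model the loss depends on which archive is missing and how old the backup is. The switch interval only sets the best case, when the missing archive happens to be the most recent one.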

In any event, you carry on using log switches as a method of data loss prevention if you want. Just don't recommend it as bald, unqualified advice to others, will you?

As for the tracefile backup of the controlfile: we agree. It's just that your original statement made no mention of including it routinely in the backup, but just to do it when the physical database structure changes. Now that you make it clear that it should be routine, it's clear we're in agreement on that point at least.

Regards
HJR

--
----------------------------------------------
Resources for Oracle: http://www.hjrdba.com
===============================


"daniel" <test_at_test.com> wrote in message
news:a7j445$vi5$1_at_newsg2.svr.pol.co.uk...
> lets change the tone back to discussion... i'm not trying to persist or
> indeed argue
> ...
>
> > Why do you persist in thinking that the rate of log switching has anything
> > to do with avoiding the loss of data?
>
> cos it does... (see your next point)
>
> >You'll only lose the data in the
> > current log if you lose *all* copies of the current log.
>
> Exactly... So this must be taken into account, dependant on the criticality
> of your data... You'll probably think "bollocks" but this has happened to me
> twice in the last year... and it's a real pain in the arse!
>
> and cos of this i generally get quite nervous about it, and protect myself
> by going for a shorter gap between log switches (and the resultant
> checkpoint)
>
> yes an hour between log switches would be more performant in a high
> throughput oltp environment agreed, however the achiles heel would be the
> scenario above ie; loss of all members of the current online redo group.
> agreed not the most likely scenario but it *CAN* happen. i guess observing
> the performance degradation is the key here and balancing the tradeoff as
> you quite rightly state.
>
> more risk in an ebusiness environment surely, cos most in house legacy type
> systems, i can just go and ask the business to re-key the data etc etc
> however when its a web fronted ebusiness setup then regenerating the tx's
> could prove a nightmare... if you could do it at all...
>
> i said;
> >>every time you change the db structure "alter database backup controlfile
> >>to trace"
>
> you said;
> >Equally dodgy advice, I think.  Backup to trace should be routine.  Every
> >backup should include it.
>
> agreed it should be part of the backup, but check out what i said again! ie;
> you might alter the structure of the db between backups and as such Oracle
> recommend a backup of said ctl file immediately...
>
> Oracle's advice not mine; see link below
>
> http://docs.oracle.com/cd_database_generic_8.1.7/server.817/a76993/datastru.htm#11039
>
> you said;
> >> And that's the real point: rules of thumb are all very well (though 15
> >> minutes is a poor one),
>
> in your opinion, which you are more than entitled to...
>
> >>but in the end it comes down to Instance Recovery time versus performance.
>
> not forgetting how critical the data is  :O) my earlier point... i'll take
> the slight hit in performance to garauntee my company that the worst case
> scenario is loss of business data <= 15 mins... your right others may choose
> 30 mins to an hour and hey whats 15mins between dba's
>
> --
> Regards,
>
> Daniel.
>
>
> "Howard J. Rogers" <dba_at_hjrdba.com> wrote in message
> news:a7ivhn$s9c$1_at_lust.ihug.co.nz...
> > Why do you persist in thinking that the rate of log switching has anything
> > to do with avoiding the loss of data?  You'll only lose the data in the
> > current log if you lose *all* copies of the current log.  They invented
> > multiplexing of redo logs way back in Oracle 7 precisely so that you
> > wouldn't lose all copies.  Assuming you multiplex, the rate at which you log
> > switch should be governed by the rate at which you wish to checkpoint -which
> > has a direct relationship to the length of time it takes to perform Instance
> > Recovery, sure enough.  But you don't lose any committed transactions in
> > Instance Recovery, so there's no loss of data involved.
> >
> > Out of interest, most DBAs over the years seem to have settled, by way of
> > rule of thumb, on a log switch every half hour to an hour, giving a
> > reasonable compromise between Instance Recovery time and checkpoint-induced
> > performance degradation. But 15 minutes is (in general) way too much
> > checkpointing, and the performance penalty is likely to be severe.
> >
> > On the other hand, I know of one terabyte-sized database where the Instance
> > Recovery demands were such that they log switched (and hence checkpointed)
> > every 10 minutes.  And those were 500M redo logs.
> >
> > And that's the real point: rules of thumb are all very well (though 15
> > minutes is a poor one), but in the end it comes down to Instance Recovery
> > time versus performance.  And there are no hard-and-fast rules on that
> > trade-off.  Everyone needs to find their own point on the scale where they
> > are satisfied with the compromise involved.
> >
> > Regards
> > HJR
> > --
> > ----------------------------------------------
> > Resources for Oracle: http://www.hjrdba.com
> > ===============================
> >
> >
> > "daniel" <test_at_test.com> wrote in message
> > news:a7itlm$k25$1_at_newsg1.svr.pol.co.uk...
> > > surely you'll always have a trade off between performance and recovery? ie:
> > > if I need to never loose more than 15 mins of business data then i need to
> > > either log switch or check point? or did i miss that meeting?
> > >
> > > --
> > > Regards,
> > >
> > > Daniel.
> > > "Howard J. Rogers" <dba_at_hjrdba.com> wrote in message
> > > news:a7ipq5$mdu$1_at_lust.ihug.co.nz...
> > > > "daniel" <test_at_test.com> wrote in message
> > > > news:a7hr94$tku$1_at_news5.svr.pol.co.uk...
> > > > > >>Actually the "dreadful advice" comment was made in relation to your
> > > > > >>assertion that you should aim "to log switch every 15 mins", and had nothing
> > > > > >>to do with how you do backups.
> > > > >
> > > > > a log switch every 15 mins means we're gonna checkpoint aswell,
> > > >
> > > > I know. That's why it was dreadful advice.
> > > >
> > > > >I made no
> > > > > recommendation as to the frequency of checkpoints ie inside the logswitch
> > > > > time.
> > > > >
> > > >
> > > > And I wasn't suggesting that you had.  It's bad enough checkpointing every
> > > > 15 minutes because of the log switches you want without then adding to your
> > > > woes by inducing extra checkpointing within the logs.
> > > >
> > > > HJR
> > > >
> > > >
> > > > > >> "dreadful advice"
> > > > > Hmmm is this really neccassary?
> > > > >
> > > > > Daniel...
> > > > >
> > > > >
> > > > > "Howard J. Rogers" <dba_at_hjrdba.com> wrote in message
> > > > > news:a7g5no$26s$1_at_lust.ihug.co.nz...
> > > > > > "daniel" <test_at_test.com> wrote in message
> > > > > > news:a7g4mb$q7i$1_at_news5.svr.pol.co.uk...
> > > > > > > firstly i knew some smart arse would write such a reply,,,, my reply was
> > > > > > > trying to be generic!!!!!!
> > > > > > >
> > > > > >
> > > > > > Generic is fine.  Trying is fine.  Failing to be generic, however, isn't.
> > > > > >
> > > > > > > a cold backup is a consistent backup of a database that has shutdown normal
> > > > > > > (minus online redo logs) thus negating the need to roll forward from such a
> > > > > > > backup. yes in an archivelog db you could bring back a df from a cold backup
> > > > > > > set and roll forward but my point was u would not normally roll forward from
> > > > > > > a complete consistent cold backup even though you could do....
> > > > > > >
> > > > > >
> > > > > > Rubbish.  Just because you are in archivelog mode does not mandate that you
> > > > > > do hot backups.  Plenty of people do cold backups, and take archives.
> > > > > > Archives gives you the ability to completely recover your database.  Taking
> > > > > > backups (hot or cold) gives you something which can be rolled forward.
> > > > > > There's no other relationship between the two, and there's nothing "normal"
> > > > > > or "abnormal" about either type of backup in archivelog mode.
> > > > > >
> > > > > > In my experience, about 35-40% of people running in archivelog mode take
> > > > > > cold backups.  What you say is 'not normal' for them to do, they plan to do
> > > > > > routinely.
> > > > > >
> > > > > > So, whilst I knew the point you were trying to make, it's simply wrong.
> > > > > >
> > > > > > > regarding log switch the user states it is an ebusiness environment (oltp!)
> > > > > > > so we are probably putting tx's through it. well as we both know worst case
> > > > > > > scenario is u lose your current online redo log, thus tx's that may have not
> > > > > > > checkpointed (yes i know we can alter the frequency of the checkpoint) so
> > > > > > > online redo sized to switch every 15 mins means worst case scenario is we
> > > > > > > lose 15 mins of bussiness data....
> > > > > > >
> > > > > >
> > > > > > So, why not checkpoint every second, 'cause that way you only lose 1 second
> > > > > > of "bussiness [sic] data"?  Because checkpoints have an overhead.  And that
> > > > > > overhead slows down oltp transactional activity.  So to come out with a bald
> > > > > > "make it 15 minutes" is just meaningless.
> > > > > >
> > > > > > Checkpointing should be done at a rate that balances possible transaction
> > > > > > loss/recovery time with the slowdown in performance that excessive
> > > > > > checkpointing induces.  The appropriate advice is to find some point on the
> > > > > > spectrum that you feel comfortable with, not come out with some meaningless
> > > > > > specific time interval.
> > > > > >
> > > > > > And *that* is generic advice, whereas 'make it 15 mins' is highly specific,
> > > > > > highly misleading, and a thoroughly dreadful piece of advice.
> > > > > >
> > > > > > > so before we enter into a "my dad's bigger than your dad" argument there are
> > > > > > > 15 billion approches to oracle backups so don't call it "dreadfull advice",
> > > > > > > it's just another way of looking at it.
> > > > > > >
> > > > > >
> > > > > > Actually the "dreadful advice" comment was made in relation to your
> > > > > > assertion that you should aim "to log switch every 15 mins", and had nothing
> > > > > > to do with how you do backups.
> > > > > >
> > > > > > HJR
> > > > > >
> > > > > >
> > > > > > > reagrds,
> > > > > > >
> > > > > > > daniel...
> > > > > > >
> > > > > > > "Howard J. Rogers" <dba_at_hjrdba.com> wrote in message
> > > > > > > news:a7el2i$hek$1_at_lust.ihug.co.nz...
> > > > > > > > Comments below
> > > > > > > > HJR
> > > > > > > > --
> > > > > > > > ----------------------------------------------
> > > > > > > > Resources for Oracle: http://www.hjrdba.com
> > > > > > > > ===============================
> > > > > > > >
> > > > > > > >
> > > > > > > > "daniel" <test_at_test.com> wrote in message
> > > > > > > > news:a7deuk$417$1_at_newsg2.svr.pol.co.uk...
> > > > > > > > > as a rule of thumb u don't roll forward from a cold backup, (yes, yes I know
> > > > > > > > > you can but lets not get into bad habits)....
> > > > > > > > >
> > > > > > > >
> > > > > > > > That's simply not true, and it's not a bad habit either.  Of course you can
> > > > > > > > roll forward from a cold backup.  And taking cold backups is much easier
> > > > > > > > than the hot variety.
> > > > > > > >
> > > > > > > > > you can use just hot backups no probs and can recover quicker from them
> > > > > > > > >
> > > > > > > > > pointers;
> > > > > > > > >
> > > > > > > > > have multiple ctl files spanning physical disks
> > > > > > > > > multiplex your online redo logs across physical disks
> > > > > > > > > aim to log switch every 15 mins
> > > > > > > >
> > > > > > > > Dreadful advice.  If you want performance, log switch every 24 hours, in the
> > > > > > > > dead of night, when no-one gives a damn about the huge amount of I/O the
> > > > > > > > associated checkpoint will induce.  If you want Instance Recovery in ten
> > > > > > > > seconds, log switch every second or so.  Somewhere in between those two
> > > > > > > > extremes will be a happy medium for *you*.
> > > > > > > >
> > > > > > > > > every time you change the db structure "alter database backup controlfile to
> > > > > > > > > trace"
> > > > > > > >
> > > > > > > > Equally dodgy advice, I think.  Backup to trace should be routine.  Every
> > > > > > > > backup should include it.
> > > > > > > >
> > > > > > > > > after the hot backup take a copy of the ctlfile and "archive log current"
> > > > > > > > > and make sure your archive redo logs are protected stream them off somewhere
> > > > > > > > > else every 15 mins
> > > > > > > >
> > > > > > > > Depends on your log switch rate, of course (see above!!)
> > > > > > > >
> > > > > > > > > also when you take the hot backup keep a copy locally if enough space also
> > > > > > > > > stream to tape and if recovery time is an issue copy to some network storage
> > > > > > > > >
> > > > > > > >
> > > > > > > > Agreed.  Keep as much on disk as possible.
> > > > > > > >
> > > > > > > > Regards
> > > > > > > > HJR
> > > > > > > >
> > > > > > > > > good luck
> > > > > > > > >
> > > > > > > > > daniel...
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > "Dale DeRemer" <dderemer_at_agmc.org> wrote in message
> > > > > > > > > news:a7dcs8$q7u$1_at_malgudi.oar.net...
> > > > > > > > > We are new the the Oracle world. We want our ebusiness server to be 7x24.
> > > > > > > > > Never, ever down. Meaning... no cold backups. So, our question is this: If
> > > > > > > > > we use hot backups, (RMAN), and never take a cold backup, will we be able to
> > > > > > > > > recover from any failure. Additionally, what is the impact, or difference in
> > > > > > > > > recovery time for a system with no cold backups, vs. one with a cold backup
> > > > > > > > > done once a week, or once a month? The DB is 75GB and will grow to about
> > > > > > > > > 100GB over the next year. It will be updated in batches from our mainframe.
> > > > > > > > > Users will not update it. Thanks for your help.


Received on Sat Mar 23 2002 - 18:33:28 CST