Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Newsgroup: comp.databases.oracle.server

Re: [Long] How to deal with full file system? Best Practice?

From: sybrandb <sybrandb_at_gmail.com>
Date: 18 Dec 2006 01:33:07 -0800
Message-ID: <1166434387.779055.217450@l12g2000cwl.googlegroups.com>

On Dec 18, 8:15 am, "Peter Smith [gjfc]" <goodjobfast..._at_gmail.com> wrote:
> People,
>
> I'm trying to write a simple document on how a DBA should handle an
> alert that a file system is full.
>
> Do any of you have some common sense you would like to share?
>
> Here are some ideas which come to my mind:
>
> The first reaction should be to ask a set of questions.
>
> The questions should explore two degrees of freedom.
>
> The first degree of freedom is related to the purpose or use of the
> file system.
> The second degree of freedom is related to the pattern of growth.
>
> Questions about the purpose or use of the file system:
>
> - Does this file system have an obvious purpose?
> - Is it supposed to hold a certain type of data or files?
> - Does this file system collect Archived Redo Logs?
> - If yes, is it the sole destination for Archived Redo Logs?
> - Does this file system hold data files for a database?
> - If yes, are these data files configured to autoextend?
> - If yes, what is the historical behavior of these data files?
> - If yes, have they been growing?
> - Is this file system supposed to be a container of online backups?
> - Is this file system used to hold the Flash Recovery Area for a 10g
> database?
> - Is this file system the root "/" file system?
> - Is this file system the /tmp file system?
> - Is this file system the /var file system?
> - Which UNIX user IDs have permission to write to this file system?
>
> Questions about growth patterns:
>
> - What is the historical growth behavior of this file system?
> - How long has it been full?
> - Did it recently fill up via a growth spurt or has the growth been
> gradual?
> - Do any growth patterns correlate with any business processes such as
> month end close, shopping season, large news events, or recent
> marketing campaigns?
> - Is a process currently trying to write to the file system?
> - Is the file system more than 100% full? (On systems that report
> this, it means a root-owned process has filled the blocks reserved
> for root.)
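A minimal sketch of why df can report more than 100%. This is not from the original post. df computes Use% roughly as used / (used + available), where "available" counts only the blocks an unprivileged user may still allocate. The blocks reserved for root are excluded from that figure. When a root-owned process writes into them, systems whose statfs reports a signed available-block count (FreeBSD, for example) show a negative "available" value, and the percentage goes above 100. Linux reports f_bavail as an unsigned value that bottoms out at zero, so use_percent() below will not exceed 100% there. The function names are hypothetical. Rounding up matches GNU df and is an assumption for other df implementations.

```python
import os


def df_use_percent(f_blocks: int, f_bfree: int, f_bavail: int) -> int:
    """Approximate df's Use% column from statvfs-style block counts.

    used  = allocated blocks (total minus free)
    avail = blocks an unprivileged user may still allocate; this can be
            negative on systems with a signed counter once root has
            eaten into the reserved blocks.
    """
    used = f_blocks - f_bfree
    denom = used + f_bavail
    if denom <= 0:
        return 0
    # Integer ceiling division: round the percentage up, as GNU df does.
    return -(-used * 100 // denom)


def use_percent(path: str) -> int:
    """Use% for the file system containing path (POSIX only)."""
    st = os.statvfs(path)
    return df_use_percent(st.f_blocks, st.f_bfree, st.f_bavail)
```

Consider a 1000-block file system with 50 blocks reserved for root. Once ordinary users have filled 950 blocks, df_use_percent(1000, 50, 0) gives 100. If root then writes 30 more blocks, f_bfree drops to 20 and a signed f_bavail becomes -30, so df_use_percent(1000, 20, -30) gives 104.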
>
> The first set contains 14 questions and the second set 6. This means
> we have 14 x 6 = 84 pairings of purpose and growth questions.
>
> Next, once the DBA has some answers to the above questions or some
> other obvious questions, a common sense reaction should emerge.
>
> Some scenarios which might warrant discussion are listed below:
>
> Scenario 1:
> -A file system, which is the sole collector of Archived Redo Logs, has
> just filled up and the database has stopped processing transactions; it
> appears 'frozen'.
>
> Scenario 2:
> -A file system, which contains all the data files for a particular
> 'hot' tablespace, has just filled up. The database still functions, but
> the application which depends on the 'hot' tablespace is malfunctioning.
>
> Scenario 3:
> -A file system, which contains data files for a variety of tablespaces,
> has been growing slowly over the past few months and has just reached
> its size limit.
>
> Scenario 4:
> -A file system, which contains data files for a variety of tablespaces,
> had no growth over the past several months, then experienced a growth
> spurt in the space of several hours and is now full.
>
> Scenario 5:
> -A file system, which contains 'online backups', has been growing slowly
> over the past few months and has just reached its size limit.
>
> Scenario 6:
> -The root file system, which is quite small compared to all the others,
> has just filled up.
>
> Scenario 7:
> -The /tmp file system, which is quite small compared to all the others,
> has just filled up.
>
> Scenario 8:
> -The /var file system, which is quite small compared to all the others,
> has just filled up.
>
> Scenario 1:
> Both the probability of occurrence and the severity of service
> disruption put this scenario at the top of the discussion list. Any
> database in ARCHIVELOG mode is at risk. The DBA's reaction can usually
> be dictated by common sense. However, if he is operating in a complex
> and unfamiliar environment, he faces a significant risk of doing the
> wrong thing.
> One classic response is this:
> - Find another file system which has free space.
> - Move some Archived Redo Logs out of the full file system into the
> other file system.
> - Check the behavior of the database. Once the archiver can write again,
> it should resume processing transactions by itself with no human
> interaction; hopefully dependent applications are just as resilient.
> - At this point you should have some breathing room, but the risk is
> still high that the file system will fill and the database will freeze
> again.
> - So, note the time and start a hot backup; hopefully it will finish
> quickly.
> - Assume the hot backup started at 01:00. Once it has completed
> successfully, you can delete all Archived Redo Logs created before
> 01:00, which will free a large amount of space in the file system.
> Those logs are no longer needed to recover this backup, but they are
> still needed to recover from any older backup. If the older backups
> still matter, copy the logs elsewhere first.
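The classic response above can be sketched in Python as a planning aid. This is a minimal sketch under several assumptions: the archived logs are plain files in one directory, the file modification time stands in for the creation time, and all names are hypothetical. The sketch only decides which logs to move, where to move them, and which logs are older than the backup start. It does not move or delete anything. Once logs have been moved, the control file no longer knows where they are, so any later recovery has to be pointed at the new location.

```python
import os
import shutil
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ArchivedLog:
    path: str
    size: int          # bytes
    created: datetime  # approximated by the file's mtime


def scan_archive_dir(directory: str) -> list:
    """List the regular files in an archive destination as ArchivedLog entries."""
    logs = []
    for entry in os.scandir(directory):
        if entry.is_file():
            st = entry.stat()
            logs.append(ArchivedLog(entry.path, st.st_size,
                                    datetime.fromtimestamp(st.st_mtime)))
    return logs


def pick_destination(candidates, needed_bytes,
                     free_space=lambda p: shutil.disk_usage(p).free):
    """Return the candidate mount point with the most free space, or None
    if no candidate can absorb needed_bytes."""
    best = max(candidates, key=free_space, default=None)
    if best is None or free_space(best) < needed_bytes:
        return None
    return best


def logs_to_move(logs, bytes_to_free):
    """Oldest logs first, until at least bytes_to_free would be released."""
    chosen, freed = [], 0
    for log in sorted(logs, key=lambda log: log.created):
        if freed >= bytes_to_free:
            break
        chosen.append(log)
        freed += log.size
    return chosen


def logs_older_than(logs, backup_start):
    """Logs created before the hot backup started. Delete them only after
    that backup has completed successfully."""
    return [log for log in logs if log.created < backup_start]
```

Suppose the archive destination is /u01/arch and the spare file systems are /u02 and /u03 (all hypothetical paths). You might run logs = scan_archive_dir("/u01/arch"), then batch = logs_to_move(logs, 2 * 1024**3), then pick_destination(["/u02", "/u03"], sum(log.size for log in batch)).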
>
> Scenario 2:
>
> The DBA will encounter this scenario if data files in the full file
> system had been configured to 'autoextend'. This configuration option
> simply means that a data file can grow, so database growth can cause
> a file system to fill up. The DBA has a short-term task and a
> longer-term task. In the short term, he can add a data file for the
> affected tablespace in another file system which has free space.
> Another option is to locate unneeded data in the tablespace resident
> within the full file system. Once it is located, the DBA can remove
> the unneeded data, which will free up space within the tablespace (but
> not within the file system). Another option is to locate an inactive
> data file within the file system and move the file to another file
> system (the idea behind this is simpler than the SQL required to
> implement it).
>
> In the long term the DBA needs to implement capacity planning.
> Obviously, if database growth caused a file system to fill up and a
> dependent application to malfunction, the original capacity planning
> was inadequate. If the DBA has the luxury of being proactive, one 10g
> feature which might help is ASM (Automatic Storage Management). This
> feature allows the DBA to treat available disk space as a large pool
> which requires less management. One advantage of ASM is that it
> automatically stripes file extents across all the disks in a disk
> group to balance I/O. Before ASM, the DBA had to balance I/O by hand.
> Sometimes this need to balance I/O would lead to full file systems,
> because placing data for I/O purposes could conflict with the goal of
> finding enough room for the data. Put another way: when the database
> spans several file systems, the probability that one of them fills up
> is higher than the probability that a single large pool holding the
> same data fills up.
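The pooling argument can be checked with a little arithmetic. In this hypothetical sketch, each file system is described by its free space and its daily growth. With separate file systems, the database is in trouble as soon as the first one fills. A single pool fills only when the combined free space is consumed. The pooled figure, total free / total growth, can never be smaller than the smallest individual free/growth ratio. It is a growth-weighted average of those ratios, plus any free space on file systems that are not growing. The model ignores ASM redundancy overhead and assumes growth can land anywhere in the pool.

```python
import math


def days_until_full(free_gb: float, growth_gb_per_day: float) -> float:
    """Days until free space is exhausted at a constant growth rate."""
    if growth_gb_per_day <= 0:
        return math.inf
    return free_gb / growth_gb_per_day


def first_fs_full(filesystems):
    """Separate file systems: trouble starts when the FIRST one fills.

    filesystems is a list of (free_gb, growth_gb_per_day) pairs."""
    return min(days_until_full(free, growth) for free, growth in filesystems)


def pooled_full(filesystems):
    """One pool: it fills only when the combined free space is used up."""
    total_free = sum(free for free, _ in filesystems)
    total_growth = sum(growth for _, growth in filesystems)
    return days_until_full(total_free, total_growth)
```

Take three file systems with 100 GB free each, growing at 10, 1 and 0 GB per day. The first one fills in 10 days, while the same space as one pool would last about 27 days.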
>
> Another degree of freedom the DBA has to explore is the compression of
> data. Oracle offers a number of ways to pack data more tightly. The
> oldest method is to cram as many table rows into each data block as
> possible through the use of a storage parameter named PCTFREE. This
> method tends to work best if the block size is a large value such as
> 16K or 32K (32K is the largest block size Oracle supports). Setting
> PCTFREE to a small value and the block size to a large value would
> probably come with a performance cost. For example, a low PCTFREE
> leaves little room for rows to grow when they are updated, which
> invites row migration. The DBA can also compress data within the
> database. This comes with a CPU cost, since the kernel will be tasked
> with extra compression and decompression chores.
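Here is a rough, hypothetical estimate of how PCTFREE and block size affect packing. It makes three assumptions:
- rows are fixed-length;
- each block has a flat overhead (the 100-byte default is a placeholder, not Oracle's real figure, which varies with version and INITRANS);
- inserts stop once free space falls to PCTFREE percent of the block.

The sketch rejects block sizes outside 2K to 32K, since Oracle does not support 64K blocks.

```python
# Oracle DB_BLOCK_SIZE values (32K is not supported on every platform).
VALID_BLOCK_SIZES = (2048, 4096, 8192, 16384, 32768)


def rows_per_block(block_size: int, pctfree: int, row_bytes: int,
                   overhead_bytes: int = 100) -> int:
    """Rough count of fixed-length rows that fit in one block.

    overhead_bytes is a hypothetical flat allowance for the block header
    and row directory.
    """
    if block_size not in VALID_BLOCK_SIZES:
        raise ValueError(f"unsupported block size: {block_size}")
    if not 0 <= pctfree <= 99:
        raise ValueError("PCTFREE must be between 0 and 99")
    if row_bytes <= 0:
        raise ValueError("row_bytes must be positive")
    # Space left for inserts once PCTFREE percent is held back for updates.
    usable = (block_size - overhead_bytes) * (100 - pctfree) // 100
    return usable // row_bytes


def rows_per_mb(block_size: int, pctfree: int, row_bytes: int) -> int:
    """Rows stored per megabyte of data file under the same assumptions."""
    return rows_per_block(block_size, pctfree, row_bytes) * (1024 * 1024 // block_size)
```

With 200-byte rows, the results are:
- rows_per_mb(8192, 10, 200) is 4608;
- rows_per_mb(8192, 0, 200) is 5120;
- rows_per_mb(32768, 10, 200) is 4704.

In this model, lowering PCTFREE buys more than a bigger block does. The bigger block gains only from having fewer block headers and less leftover space per megabyte.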
>
> Scenario 4:
> This scenario is troublesome. Based on the description, it's probable
> that some malfunction or unforeseen business process has recently
> pumped a large amount of data into the database. The DBA is then
> tasked with becoming a data detective who needs to gain a quick
> understanding of this new data. He may be faced with a difficult
> decision to either keep the new data or jettison it. The scenario is
> too general to say more, except that either decision is probably full
> of risks. It's easy to suggest that the DBA work to become familiar
> with the data in his database. Of course, if the business processes
> which depend on the database are undergoing chaotic or exponential
> growth, the DBA can only stay ahead of the curve if he is closely
> connected with the inner workings of both the business and the
> applications which are filling the database with data.
>
> Scenario 6:
> -The root file system, which is quite small compared to all the others,
> has just filled up.
>
> This scenario suggests a malfunction in the operating system or a human
> error made by someone with the root password. One obvious cause of
> this scenario is a two-step error condition:
> 1. A file system becomes unmounted (due to a disk error, perhaps).
> 2. A process dependent on that file system tries to write a large
> amount of data. Since the file system is gone, the data lands in the
> bare mount-point directory, which typically lives on the root file
> system, and root quickly fills.
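One defensive habit against this two-step failure is for scripts to check that their target really is a mounted file system before writing a lot of data. The sketch below uses Python's os.path.ismount(). The function names are hypothetical. The check only protects writers that call it, not arbitrary processes.

```python
import os


class NotMountedError(RuntimeError):
    """Raised when a directory that should be a mount point is not mounted."""


def require_mounted(mount_point: str) -> None:
    """Refuse to proceed unless mount_point is really a mounted file system.

    If the file system has been unmounted, mount_point is just a directory
    on the parent (often root) file system, and anything written there
    lands on that parent instead.
    """
    if not os.path.ismount(mount_point):
        raise NotMountedError(f"{mount_point} is not mounted; refusing to write")


def safe_write(mount_point: str, relative_name: str, data: bytes) -> str:
    """Write data under mount_point only if it is currently mounted."""
    require_mounted(mount_point)
    target = os.path.join(mount_point, relative_name)
    with open(target, "wb") as fh:
        fh.write(data)
    return target
```

Take a hypothetical mount point /u05/exports. If it had been unmounted, safe_write("/u05/exports", "full.dmp", data) would raise an error instead of silently filling /.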
>
> Scenario 7:
> -The /tmp file system, which is quite small compared to all the others,
> has just filled up.
>
> The cause of this scenario is often similar to Scenario 6, but /tmp is
> more likely to reach 100% because it is publicly writable. Removing
> large amounts of data from /tmp (to remedy the situation) is probably
> safer than removing large amounts of data from /. On Solaris, /tmp is
> usually a tmpfs file system backed by virtual memory (RAM plus swap)
> rather than by disk. Filling /tmp on a Solaris machine therefore also
> eats into the memory available to processes, and it is probably more
> harmful to availability than it would be on other UNIX variants. When
> dealing with Oracle software, consider it a best practice to set the
> appropriate environment variables (TMP and TMPDIR) so that Oracle
> writes temporary files somewhere other than /tmp when possible.
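Below is a minimal sketch of the TMP/TMPDIR advice for a wrapper that launches Oracle utilities. It copies the current environment, points both variables at a dedicated directory, and runs the command with that environment. The directory /u01/app/oracle/tmp in the usage note is hypothetical. Check each Oracle tool's documentation to confirm that it honors these variables.

```python
import os
import subprocess


def oracle_env(tmp_dir: str, base=None) -> dict:
    """Copy of the environment with TMP and TMPDIR pointing at tmp_dir."""
    if not os.path.isdir(tmp_dir):
        raise FileNotFoundError(f"{tmp_dir} does not exist")
    if not os.access(tmp_dir, os.W_OK):
        raise PermissionError(f"{tmp_dir} is not writable")
    env = dict(os.environ if base is None else base)
    env["TMP"] = tmp_dir
    env["TMPDIR"] = tmp_dir
    return env


def run_with_private_tmp(cmd: list, tmp_dir: str) -> subprocess.CompletedProcess:
    """Run cmd with a private temporary directory; raise if it fails."""
    return subprocess.run(cmd, env=oracle_env(tmp_dir), check=True)
```

For example, run_with_private_tmp(["sqlplus", "-V"], "/u01/app/oracle/tmp") would run SQL*Plus with its temporary files steered away from /tmp.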
>
> Scenario 8:
> -The /var file system, which is quite small compared to all the others,
> has just filled up.
>
> The purpose of the /var file system is to collect 'variable' data
> generated by a wide variety of applications such as web servers and
> e-mail agents. It's considered best to let this data accumulate in
> /var rather than in / or /usr, so /var is naturally more likely to
> fill up than / or /usr. When /var does fill up, however, the
> availability of some applications will suffer. Fortunately, most
> applications which can fill /var also provide utilities to remove
> unneeded files. A good DBA/SysAdmin is aware of these utilities and
> how to use them.
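Before running an application's own cleanup utility, it helps to know which files are actually consuming /var. Here is a minimal sketch, similar in spirit to du -x. It walks a directory tree without crossing into other mounted file systems and reports the largest regular files. The names are hypothetical. The sketch only reports; removal is left to each application's own tools.

```python
import heapq
import os
import stat


def _on_device(path: str, dev: int) -> bool:
    """True if path exists and lives on the file system with device id dev."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def largest_files(root: str, count: int = 10) -> list:
    """Return up to `count` (size_bytes, path) pairs, largest first, for
    regular files under root that live on root's own file system."""
    root_dev = os.lstat(root).st_dev
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune subdirectories that are mount points of other file systems.
        dirnames[:] = [d for d in dirnames
                       if _on_device(os.path.join(dirpath, d), root_dev)]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable
            if stat.S_ISREG(st.st_mode):
                found.append((st.st_size, path))
    return heapq.nlargest(count, found)
```

For example, largest_files("/var", 20) lists the twenty largest files on /var itself.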
>
> So, these are some general ideas and thoughts on how a DBA should
> react when he receives an alert that a file system is full. I realize
> that a lot more could be written about this topic.
>
> Are there any obvious or large ideas that I've missed that the
> proactive DBA should be thinking about?
>
> --
> Peter Smith
> GoodJobFast...@gmail.com
> http://GoodJobFastCar.com

Sure. The issue needs to be addressed within the Call to Fix time in the SLA, which doesn't allow for answering 84 questions, nor for making a backup. Compressing or even throwing away some archives can be a viable solution, especially if the customer didn't make backup arrangements (I'm not kidding).

Also, your post is strange, because by the time the alert fires, your proactive DBA is already way too late.
Monitoring utilities should provide early warnings and monitor growth rates.
They shouldn't only warn after disaster has already struck.
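The point about early warnings can be made concrete with a minimal sketch. It assumes used-space samples are collected periodically (for example from shutil.disk_usage) and that growth is roughly linear. The sketch does three things:
- it fits a least-squares line through the samples;
- it projects when the file system will be full;
- it warns while there is still lead time left (the 72-hour default is hypothetical).

A real monitor would also need to catch sudden spurts like the one in Scenario 4, which a long-window linear fit will smooth over.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Sample:
    t_hours: float   # time of the sample, in hours since some origin
    used: float      # space used at that time (any consistent unit)


def growth_rate(samples: Sequence[Sample]) -> float:
    """Least-squares slope of used space over time, in units per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(s.t_hours for s in samples) / n
    mean_u = sum(s.used for s in samples) / n
    var_t = sum((s.t_hours - mean_t) ** 2 for s in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((s.t_hours - mean_t) * (s.used - mean_u) for s in samples)
    return cov / var_t


def hours_until_full(samples: Sequence[Sample], capacity: float) -> Optional[float]:
    """Project when the file system fills; None if it is not growing."""
    rate = growth_rate(samples)
    if rate <= 0:
        return None
    latest = max(samples, key=lambda s: s.t_hours)
    return max(0.0, (capacity - latest.used) / rate)


def should_warn(samples: Sequence[Sample], capacity: float,
                lead_time_hours: float = 72.0) -> bool:
    """Early warning: fire while there is still time to act, not at 100%."""
    remaining = hours_until_full(samples, capacity)
    return remaining is not None and remaining <= lead_time_hours
```

Suppose the samples are 100, 200 and 300 GB taken a day apart on a 1000 GB file system. hours_until_full() then projects 168 hours (seven days). That is still outside the 72-hour window, so should_warn() stays quiet until the projection drops inside it.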

-- 
Sybrand Bakker
Senior Oracle  DBA
Received on Mon Dec 18 2006 - 03:33:07 CST

