
[C.D.O.S][Long] How to deal with full file system? Best Practice?

From: Peter Smith [gjfc] <goodjobfastcar_at_gmail.com>
Date: 17 Dec 2006 23:15:29 -0800
Message-ID: <1166426129.567642.170110@n67g2000cwd.googlegroups.com>


People,

I'm trying to write a simple document on how a DBA should handle an alert that a file system is full.

Do any of you have some common sense you would like to share?

Here are some ideas which come to my mind:

The first reaction should be the issuance of a set of questions.

The questions should explore two degrees of freedom.

The first degree of freedom is related to the purpose or use of the file system.
The second degree of freedom is related to the pattern of growth.

Questions about the purpose or use of the file system:

Questions about growth patterns:

The first set contains 14 questions and the second contains 6, which gives
14 x 6 = 84 possible combinations of answers to work through.

Next, once the DBA has answers to the questions above (or to other obvious ones), a common-sense reaction should emerge.

Some scenarios which might warrant discussion are listed below:

Scenario 1:
- A file system, which is the sole collector of Archived Redo Logs, has just
  filled up, and the database has stopped processing transactions; it appears
  'frozen'.

Scenario 2:
- A file system, which contains all the data files for a particular 'hot'
  tablespace, has just filled up. The database still functions, but the
  application which depends on the 'hot' tablespace is malfunctioning.

Scenario 3:
- A file system, which contains data files for a variety of tablespaces, has
  been slowly growing over the past few months and has just reached the limit
  of its size.

Scenario 4:
- A file system, which contains data files for a variety of tablespaces, had
  no growth over the past several months and then, in the space of several
  hours, experienced a growth spurt and is now full.

Scenario 5:
- A file system, which contains 'online backups', has been slowly growing
  over the past few months and has just reached the limit of its size.

Scenario 6:
- The root file system, which is quite small compared to all the others, has
  just filled up.

Scenario 7:
- The /tmp file system, which is quite small compared to all the others, has
  just filled up.

Scenario 8:
- The /var file system, which is quite small compared to all the others, has
  just filled up.

Scenario 1:
Both the probability of occurrence and the severity of service disruption put this scenario
at the top of the discussion list. Any database in Archivelog mode is at risk.
The reaction of the DBA to this situation can usually be dictated by common sense,
but if the DBA is operating in a complex and unfamiliar environment, he faces a
significant risk of doing the wrong thing.
One classic response is this:
- Find another file system which has free space.
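
For instance, if another file system has free space, the DBA can point
archiving at it so the archiver can drain its backlog; once the archiver can
write again, the stalled sessions normally resume on their own. A minimal
sketch, assuming archiving uses a single local destination set through
LOG_ARCHIVE_DEST_1 and that /u05/arch is a hypothetical file system with room
to spare:

  -- Redirect archiving to a file system with free space (path is
  -- hypothetical). SCOPE=MEMORY changes only the running instance; make
  -- the change permanent once the emergency is over.
  ALTER SYSTEM SET log_archive_dest_1 = 'LOCATION=/u05/arch' SCOPE=MEMORY;

  -- Nudge the archiver so the backlog of un-archived redo logs is written.
  ALTER SYSTEM ARCHIVE LOG ALL;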

Scenario 2:

The DBA will encounter this scenario if data files in the full file system
have been configured to 'autoextend'. This configuration option simply means
that a data file can grow, so database growth can cause a file system to fill
up. The DBA has a short-term task and a longer-term task. In the short term,
he has a few options:

- Configure the appropriate tablespace to grow in another file system which
  has free space.
- Locate un-needed data in the tablespace resident within the full file
  system; removing it will free up space within the tablespace (but not the
  file system).
- Locate an inactive data file within the file system and move the file to
  another file system (the idea behind this is simpler than the SQL required
  to implement it).
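Hedged sketches of the first and third options, using hypothetical tablespace
and path names (hot_ts, /u03, /u06):

  -- Option 1: let the tablespace grow in a file system that has free space.
  ALTER TABLESPACE hot_ts
    ADD DATAFILE '/u06/oradata/PROD/hot_ts02.dbf'
    SIZE 2G AUTOEXTEND ON NEXT 256M MAXSIZE 8G;

  -- Option 3: relocate an inactive data file to another file system.
  -- Note: the tablespace is unavailable to the application while offline.
  ALTER TABLESPACE hot_ts OFFLINE NORMAL;
  -- ...copy the data file at the OS level from /u03 to /u06, then:
  ALTER TABLESPACE hot_ts RENAME DATAFILE
    '/u03/oradata/PROD/hot_ts01.dbf' TO '/u06/oradata/PROD/hot_ts01.dbf';
  ALTER TABLESPACE hot_ts ONLINE;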

In the long term, the DBA needs to implement capacity planning. Obviously, if
database growth caused a file system to fill up and a dependent application to
malfunction, the original capacity planning was inadequate. If the DBA has the
luxury of being proactive, one feature in 10g which might help is ASM
(Automatic Storage Management). ASM allows the DBA to treat available disk
space as one large pool which requires less hands-on management. One advantage
of ASM is that it automates the placement of database files for the purpose of
balancing I/O. Before ASM, the DBA had to balance I/O by hand, and that need
would sometimes lead to full file systems because placing data for I/O
purposes could contend with the goal of simply finding enough room for the
data. Put another way: when a database is spread across several separate file
systems, the chance that some individual file system fills up is higher than
the chance that one large shared pool becomes full.
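
A hedged illustration of the ASM point, assuming the disks have already been
presented to ASM as a disk group named DATA (the disk group and tablespace
names here are hypothetical):

  -- With ASM the DBA names a disk group instead of picking a file system
  -- and a path; ASM spreads the data across all disks in the group.
  CREATE TABLESPACE app_data
    DATAFILE '+DATA' SIZE 10G
    AUTOEXTEND ON NEXT 1G MAXSIZE UNLIMITED;

The space question then collapses from "which of my file systems is about to
fill up?" to "is the DATA disk group big enough?".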

Another degree of freedom the DBA can explore is the compression of data.
Oracle offers a number of ways to pack data more tightly. The oldest method is
to cram as many table rows into each data block as possible by setting the
storage parameter PCTFREE to a small value. This method works best if the
block size is a large value such as 16K or 32K; setting PCTFREE to a small
value and the block size to a large value would probably come with a
performance cost, since tightly packed rows leave little room for updates and
can migrate. The DBA can also compress data within the database itself. This
obviously comes with a performance cost, since the kernel will be tasked with
extra compression and decompression chores.
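
A hedged sketch of both approaches, with hypothetical object names:

  -- Reserve only 5% of each block for future updates so more rows fit per
  -- block (suits data that is rarely updated).
  CREATE TABLE audit_hist (
    audit_id   NUMBER,
    audit_date DATE,
    detail     VARCHAR2(200)
  ) PCTFREE 5 TABLESPACE hist_ts;

  -- 10g segment compression, most effective for bulk-loaded, rarely updated
  -- data. MOVE rewrites the table, so its indexes must be rebuilt afterwards.
  ALTER TABLE audit_hist MOVE COMPRESS;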

Scenario 4:
This scenario is troublesome. Based on the description, it's probable that some
malfunction or unforeseen business process has recently pumped a large amount
of data into the database. The DBA is then tasked with becoming a data detective
who needs to gain a quick understanding of this new data. He may be faced
with a difficult decision to either keep the new data or jettison it. The
scenario is too general to say more except that either decision is probably full
of risks. It's easy to suggest that the DBA work to become familiar with
the data in his database. Of course if the business processes which depend
on the database are undergoing chaotic or exponential growth, the DBA can only
stay ahead of the curve if he is closely connected with the inner workings of both
the business and the applications which are filling the database with data.
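
As a hedged first step in that detective work, the data dictionary can show
which segments are eating the space and which objects appeared recently (the
tablespace name and the seven-day window below are hypothetical):

  -- Largest segments in a tablespace that lives on the full file system.
  SELECT owner, segment_name, segment_type, ROUND(bytes/1024/1024) AS mb
    FROM dba_segments
   WHERE tablespace_name = 'APP_DATA'
   ORDER BY bytes DESC;

  -- Objects created recently, which may point at the new business process.
  SELECT owner, object_name, object_type, created
    FROM dba_objects
   WHERE created > SYSDATE - 7
   ORDER BY created DESC;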

Scenario 6:
- The root file system, which is quite small compared to all the others, has
  just filled up.

This scenario suggests a malfunction in the operating system or a human error
caused by the hand of a person with the root password. One obvious cause of
this scenario is a two-step error condition:

1. A file system becomes unmounted (due to a disk error, perhaps).
2. A process dependent on that file system tries to write a large amount of
   data. Since the file system is gone, the root file system receives the
   data and quickly fills.

Scenario 7:
- The /tmp file system, which is quite small compared to all the others, has
  just filled up.

The cause of this scenario is often similar to Scenario 6. The probability of /tmp filling to 100% is higher, since it is world-writable. Removing large amounts of data from /tmp (to remedy the situation) is probably safer than removing large amounts of data from /. On Solaris, /tmp is actually a memory-resident (swap-backed) file system rather than one on disk, so filling /tmp on a Solaris machine is probably more harmful to availability than it would be on other UNIX variants. When dealing with Oracle software, consider it a best practice to set the appropriate environment variables (TMP and TMPDIR) so that Oracle writes its temporary files somewhere other than /tmp when possible.

Scenario 8:
- The /var file system, which is quite small compared to all the others, has
  just filled up.

The purpose of the /var file system is to collect 'variable' data generated by a wide variety of applications such as web servers and e-mail agents. It's considered best to allow this data to accumulate in /var rather than in / or /usr, so the probability that /var will fill up is naturally higher than the probability that / or /usr will. When /var does fill up, however, the availability of some applications will suffer. Fortunately, most applications which have the ability to fill /var also provide utilities to remove un-needed files. A good DBA/SysAdmin is aware of these utilities and how to use them.

So, these are some general ideas and thoughts concerning how a DBA should react when he receives an alert that a file system is full. I realize that a lot could be written about this topic.

Are there any obvious or large ideas that I've missed that the proactive DBA should be thinking about?

--

Peter Smith
GoodJobFastCar_at_gmail.com
http://GoodJobFastCar.com
