Re: Failure Modes and Ranges of Operation
Date: 3 Feb 2007 11:26:24 -0800
Message-ID: <1170530784.938933.208330_at_k78g2000cwa.googlegroups.com>
On Feb 3, 7:26 am, Bob Badour <bbad..._at_pei.sympatico.ca> wrote:
> Neo's recent troll started me down a meandering path leading to these
> two separate but somewhat related (engineering) concerns.
>
> An operating range describes the conditions under which one of our
> devices will operate without failure. Failure modes are what happens
> when one tries to operate a device beyond that range.
>
> The sci.logic sort of folks grapple with the problems one encounters
> when one tries to have an infinite range of operation. And yet one can
> never fully escape the failure mode problem because division by zero
> generally fails.
>
> Failure mode analysis is very important in engineering. I wonder whether
> it has any supporting theory? Certainly, one can think of general
> principles. Likewise, beyond statistical analysis and empirical
> measurement, does any theory exist regarding ranges of operation?
>
> Any thoughts?
One sound theoretical aspect of the issue you're mentioning is the
totality/partiality of functions. "Total functions" are functions whose
"operating range" is 100% of their inputs. Partial functions are those
for which there is some sort of issue or another with some set of
inputs. Current terminology doesn't distinguish between two importantly
different kinds of failure: undefined results, and results whose
computational cost is infinite. This may sound ridiculous on the
surface, but I believe it is important, if nothing else, to consider
the two cases separately, because they behave differently on current
hardware. Consider that these broken functions are broken in very
different ways:

f(x) = x/0
f(x) = f(f(x))

(I'll go out on a limb here and say that we may someday be able to do
something with the second kind, the kind that requires infinite
resources to compute. Consider that wild speculation, though.)

In general, we find that "forward" computations, those that are built
out of successive lower-level computations, do not cause us problems.
Given the ordered naturals, we can speak of the successor function;
given successor, we can construct addition, which gives rise to
multiplication, which gives rise to exponentiation, which gives rise
to tetration, ad infinitum. Each of these functions is total. Each
successive one has an inverse, and it seems the inverses are a
significant source of problems.

The predecessor function is defined on every natural number except 0.
Oops! Subtraction is defined only on those pairs of natural numbers
where the minuend is >= the subtrahend. The integers solve this issue
and make subtraction total. Multiplication is still total when we
extend it to the integers, but most pairs of integers, when divided,
don't yield an integer. So we get the rationals, but we still have a
problem at zero. At the next level, the forward function,
exponentiation, is fine, but here we have *two* inverse functions,
because exponentiation is not commutative. We get logarithms and
roots, and the rationals are closed under neither.

But it is entirely possible to restrict ourselves to functions that we
know are total, or to functions restricted to defined inputs. A simple
example: this code will fault whenever y is zero:

int x, y;
/* x and y assigned elsewhere */
return x/y;

However, this code will always work, independent of the values of
x and y:

int x, y;
/* x and y assigned elsewhere */
if (y != 0)
    return x/y;
else
    return 0;
Most compilers aren't smart enough to know that the first return cannot fault, but there are a few research compilers that can figure this out, and I think it's just a matter of time until this capability becomes more commonplace.
There is also resource usage to consider. Given a finite machine and a given program, how can we be sure it is free of stack overflows, or out of memory errors? It turns out to be possible to know this with an effects system. The principle is simple: for every effect one wishes to consider (maximum memory usage, nontermination, etc.) one needs to have an algebra for every kind of node in the language's abstract syntax that can propagate the state of this effect based on its child nodes.
This *necessarily* comes at the cost of reduced expressiveness. There are perfectly correct programs that cannot be proven correct under any such system. However, the expressiveness of provably correct programs is still very high, and for the right application the cost in expressiveness and proof obligations is worth paying. Medical devices and avionics are often mentioned at this point.
For example, recursion makes many of these calculations harder, so there are systems in which it is simply not allowed. You pay a cost in expressiveness, and you get a benefit: without recursion, termination analysis is much easier. If you eliminate recursion and general looping, termination is guaranteed; no analysis is necessary.
The vehicle within the compiler that does this work is the type system. In a very real sense, when the compiler issues a type error, it means that it was unable to construct a requested correctness proof. This is not a metaphor but a literal correspondence.
http://en.wikipedia.org/wiki/Curry_Howard
The type systems of most modern languages are on the lame side. You can't prove much with the Java type system past the absence of type errors. But with JML, you can annotate Java code with nontrivial property requirements. And there are other systems such as Epigram, where the type system is the Calculus of Inductive Constructions. Coq is a theorem prover also using CIC; Epigram is a general purpose programming language. Those are both research projects, but SPARK/Ada is a commercial system that has similar goals:
http://www.praxis-his.com/sparkada/
There are hard limits to these sorts of approaches. We can prove our software conforms to our specs, but we cannot prove that our specs are correct for our application. We can prove that our software will work within the confines of our idealized machine, but we can't prove the machine won't lose power, or have a chip failure, or whatever. Nonetheless it seems the worst problems in software are the results of exactly the kinds of things that these proof techniques can fix: buffer overruns, security model failures, crashing on novel inputs, etc.
There is also an entire can of worms relating to communication and concurrency. We face a host of failures there: deadlock, race conditions, etc. There exist formal calculi for proving the absence of these failures as well: Pi calculus, flow calculus, CSP, etc.
Apologies for the rambling nature of this post; hope it was interesting anyway.
Marshall Received on Sat Feb 03 2007 - 20:26:24 CET