Re: Failure Modes and Ranges of Operation

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Sat, 03 Feb 2007 20:43:11 GMT
Message-ID: <zt6xh.2016$R71.29193_at_ursa-nb00s0.nbnet.nb.ca>


Marshall wrote:

> On Feb 3, 7:26 am, Bob Badour <bbad..._at_pei.sympatico.ca> wrote:
>

>>Neo's recent troll started me down a meandering path leading to these
>>two separate but somewhat related (engineering) concerns.
>>
>>An operating range describes the conditions under which one of our
>>devices will operate without failure. Failure modes are what happens
>>when one tries to operate a device beyond that range.
>>
>>The sci.logic sort of folks grapple with the problems one encounters
>>when one tries to have an infinite range of operation. And yet one can
>>never fully escape the failure mode problem because division by zero
>>generally fails.
>>
>>Failure mode analysis is very important in engineering. I wonder whether
>>it has any supporting theory? Certainly, one can think of general
>>principles. Likewise, beyond statistical analysis and empirical
>>measurement, does any theory exist regarding ranges of operation?
>>
>>Any thoughts?

>
>
> One sound theoretical aspect of the issue you're mentioning is
> the totality/partiality of functions. "Total functions" are
> functions whose "operating range" is 100% of their inputs.
> Partial functions are those that have a problem of one sort or
> another with some set of inputs. Current terminology doesn't
> distinguish between two importantly different kinds of failure:
> undefined results, and results whose computational cost is
> infinite. That may sound ridiculous on the surface, but the two
> cases deserve separate treatment, if only because they behave
> differently on current hardware. Consider that these broken
> functions are broken in very different ways:
>
> f(x) = x/0
>
> f(x) = f(f(x))
>
> (I'll go out on a limb here and say that we may someday
> be able to do something with the second kind, the kind
> that requires infinite resources to compute. Consider that
> wild speculation, though.)
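
To see how differently those two fail on a concrete machine, here is a
minimal Java sketch (the names f1 and f2 are mine):

  static int f1(int x) {
      return x / 0;      // faults immediately: ArithmeticException at runtime
  }

  static int f2(int x) {
      return f2(f2(x));  // never returns; each call spawns two more, and a
                         // real JVM eventually dies with StackOverflowError
  }

The first is undefined at every input; the second diverges, and the
hardware turns that divergence into resource exhaustion.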
>
> In general, we find that "forward" computations, those
> that are built out of successive lower level computations,
> do not cause us problems. Given the ordered naturals,
> we can speak of the successor function; given successor,
> we can construct addition, which gives rise to multiplication,
> which gives rise to exponentiation, which gives rise to
> tetration, ad infinitum. Each of these functions is total.
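
To make the "forward" tower concrete, here is a toy Java sketch of the
chain (assuming unbounded integers, which Java's long is not; the names
are mine):

  static long succ(long n) { return n + 1; }

  static long add(long a, long b) {     // addition = iterated successor
      long r = a;
      for (long i = 0; i < b; i++) r = succ(r);
      return r;
  }

  static long mul(long a, long b) {     // multiplication = iterated addition
      long r = 0;
      for (long i = 0; i < b; i++) r = add(r, a);
      return r;
  }

  static long pow(long a, long b) {     // exponentiation = iterated multiplication
      long r = 1;
      for (long i = 0; i < b; i++) r = mul(r, a);
      return r;
  }

On a machine with true naturals, every function in this chain is total.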

Successor is only total if one has an infinite set. For a finite computational model, successor fails at some point, and the question then becomes: How does it fail? Likewise for addition, multiplication, exponentiation, factorial etc.
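
At the smallest scale, one can at least make the choice of failure mode
explicit. A sketch (the names are mine):

  static int succChecked(int n) {       // trap: refuse to give a wrong answer
      if (n == Integer.MAX_VALUE)
          throw new ArithmeticException("successor overflows");
      return n + 1;
  }

  static int succSaturating(int n) {    // saturate: clamp at the top of the range
      return n == Integer.MAX_VALUE ? n : n + 1;
  }

  // Plain n + 1 silently wraps to Integer.MIN_VALUE: a third failure mode.

Trap, saturate, or wrap: three different answers to "How does it fail?"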

> Each successive one has an inverse, and it seems the
> inverses are a significant source of problems.

It's not only the inverses that cause problems for those of us with finite resources.

> The predecessor function is defined on every natural number
> except 0. Oops! Subtraction is only defined on every pair of
> natural numbers where the minuend is >= the subtrahend.

For finite sets, the successor function is defined on every natural number except the largest one can represent. Addition is only defined on every pair of natural numbers where the sum is less than or equal to the largest representable natural.

> The integers solve this issue and make subtraction total.
> Multiplication remains total when we extend it to the integers,
> but dividing most pairs of integers doesn't yield an integer.
> So we get the rationals, but we still have a problem at zero.
> At the next level, the forward function, exponentiation, is
> fine, but here we have *two* inverse functions, because
> exponentiation is not commutative. We get logs and roots,
> and the rationals are closed under neither.
>
> But it is entirely possible to restrict ourselves to functions
> that we know are total, or functions restricted to defined
> inputs.

Assuming we have infinite resources.

> A simple example:
>
> In this code, I don't know for sure that it will work until
> runtime, because I don't know that y != 0:
>
> int f(int x, int y) {
>     return x / y;   // faults at runtime when y == 0
> }
>
> However, this code will always work, independent of
> the values of x and y:
>
> int f(int x, int y) {
>     if (y != 0)
>         return x / y;
>     else
>         return 0;   // total: defined for every pair of inputs
> }

And in a way, this is the point where failure mode analysis begins. What will happen when it returns zero instead of some huge number?

> Most compilers aren't smart enough to know that the
> first return cannot fault, but there are a few research
> compilers that can figure this out, and I think it's just
> a matter of time until this capability becomes more
> commonplace.

I suspect the automated provers already handle this sort of thing. After all, anyone would object to dividing both sides of an equation by x without simultaneously restricting the domain to exclude x=0. I think it would be extremely useful to have a proof checker that catches such things.
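
Indeed, in a modern proof assistant the cancellation step simply will
not go through without the side condition. A minimal sketch in Lean 4
with Mathlib (assuming its lemma mul_right_cancel₀):

  import Mathlib

  -- Cancelling x from both sides of a*x = b*x is only accepted
  -- once we supply the hypothesis hx : x ≠ 0.
  example (a b x : ℚ) (hx : x ≠ 0) (h : a * x = b * x) : a = b :=
    mul_right_cancel₀ hx h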

> There is also resource usage to consider. Given a
> finite machine and a given program, how can we
> be sure it is free of stack overflows, or out of memory
> errors? It turns out to be possible to know this with
> an effects system. The principle is simple: for every
> effect one wishes to consider (maximum memory usage,
> nontermination, etc.) one needs to have an algebra
> for every kind of node in the language's abstract syntax
> that can propagate the state of this effect based on
> its child nodes.
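
A minimal sketch of such an algebra, in Java, for one effect
(worst-case stack depth) over a toy AST; all the names are made up:

  /** Toy AST: each node kind knows how to bound its own stack effect. */
  abstract class Node {
      final Node[] children;
      Node(Node... children) { this.children = children; }
      /** Upper bound on stack frames this subtree can consume. */
      abstract int maxStackDepth();
  }

  class Literal extends Node {
      int maxStackDepth() { return 0; }   // a literal costs no frames
  }

  class Call extends Node {
      Call(Node... args) { super(args); }
      int maxStackDepth() {
          // Arguments are evaluated one at a time, so the bound is the
          // worst child plus one frame for the call itself.
          int worst = 0;
          for (Node c : children) worst = Math.max(worst, c.maxStackDepth());
          return worst + 1;
      }
  }

Given such an algebra for every node kind, the compiler can compare the
root's bound against the machine's limit before the program ever runs.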

Nontermination? Isn't that one a stretch? Or are you suggesting the language restricts one to algorithms for which one can determine halting?

> This *necessarily* comes at the cost of reduced
> expressiveness. There are perfectly correct programs
> that cannot be proven to be correct under any system.
> However the expressiveness of provably correct
> programs is very high, and for the right application the
> cost in expressiveness and proof obligations is worth
> paying. Medical devices and avionics are often mentioned
> at this point.
>
> For example, recursion makes many of these calculations
> harder, so there are systems in which it is simply not
> allowed. You pay a cost in expressiveness, and you get
> a benefit: without recursion, termination analysis is much
> easier.

Is it? It seems to me a Turing machine doesn't exactly support recursion, and didn't somebody once prove that one cannot decide, in general, whether an arbitrary algorithm halts on one?

> If you eliminate recursion and general looping,
> termination is guaranteed; no analysis is necessary.

Agreed.
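
For example, in a language whose only loop is a counted loop with a
bound fixed on entry, termination is a syntactic fact. A Java-flavoured
sketch (the method name is mine; Java itself doesn't enforce the
restriction, of course, the point is the shape of the program):

  static int sumOfSquares(int n) {
      int total = 0;
      for (int i = 0; i < n; i++)   // bound fixed before the loop is entered
          total += i * i;
      return total;                 // guaranteed to be reached
  }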

> Guaranteeing that resource usage falls within limits
> can be done by an effect that establishes an
> upper bound on resource usage per abstract syntax
> node.
>
> The vehicle within the compiler that does this work
> is the type system. In a very real sense, when the
> compiler issues a type error, it means that it was
> unable to construct a requested correctness proof.
> This is not a metaphor but a literal correspondence.
>
> http://en.wikipedia.org/wiki/Curry_Howard
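
The smallest example of the correspondence: modus ponens and function
application are the same term. In Lean 4:

  -- The proof of "P, and P implies Q, therefore Q" is literally
  -- the program that applies the function to the argument.
  example (P Q : Prop) : P → (P → Q) → Q :=
    fun p f => f p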
>
> The type systems of most modern languages are
> on the lame side. You can't prove much with the
> Java type system past the absence of type errors.
> But with JML, you can annotate Java code with
> nontrivial property requirements. And there are
> other systems such as Epigram, where the type
> system is the Calculus of Inductive Constructions.
> Coq is a theorem prover also using CIC; Epigram
> is a general purpose programming language. Those
> are both research projects, but SPARK/Ada is a
> commercial system that has similar goals:
>
> http://www.praxis-his.com/sparkada/
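
For a flavour of JML, a division contract might look roughly like this
(a sketch; the method is made up, the requires/ensures/\result syntax
is JML's):

  //@ requires y != 0;
  //@ ensures \result == x / y;
  public static int div(int x, int y) {
      return x / y;
  }

A static checker can then discharge the precondition at every call
site, or complain where it can't.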
>
> There are hard limits to these sorts of approaches.
> We can prove our software conforms to our specs,
> but we cannot prove that our specs are correct
> for our application. We can prove that our software
> will work within the confines of our idealized machine,
> but we can't prove the machine won't lose power,
> or have a chip failure, or whatever. Nonetheless
> it seems the worst problems in software are the
> results of exactly the kinds of things that these proof
> techniques can fix: buffer overruns, security model
> failures, crashing on novel inputs, etc.
>
> There is also an entire can of worms relating to
> communication and concurrency. We face a host
> of failures there: deadlock, race conditions, etc.
> There exist formal calculi for proving the absence
> of these failures as well: Pi calculus, flow calculus,
> CSP, etc.
>
> As an aside, I'll also mention Erlang, which, while
> not on the same formal basis as the other stuff
> I've mentioned, nonetheless has an excellent
> reliability record on the basis of an unusual error
> handling philosophy called "Let it Crash." The basic
> idea is that any kind of failure, divide by zero, abrupt
> socket closure, etc., is handled by the death of the process
> in which the fault occurred. No try/catch here! Instead there
> is a clean system of supervisor processes that monitor for
> faults in the processes they supervise and respond according
> to policy.
> In practice this works exceedingly well; Erlang software
> has some of the highest uptime in the industry.

Uptime is one thing. But what happens to a physical device under the control of a process that just disappears?

> Apologies for the rambling nature of this post; hope
> it was interesting anyway.

Very.
