Parallel Exception Handling

2013-02-22

Updated: 2014-11-15

An issue I haven't touched on yet is how to handle exceptions when executing work units in parallel, for example in a work set.

Let's say we have a work set and execute two work units in it. Both fail with exceptions:

WorkSet<Object> ws = new WorkSet<Object> ();
ws.execute (new Callable<Object> () {
    public Object call () throws Exception {
        throw new IllegalStateException ();
    }
});

ws.execute (new Callable<Object> () {
    public Object call () throws Exception {
        throw new NullPointerException ();
    }
});

At a later point in the program, we join() the work set - that is, we use the calling thread to help execute any remaining work units and get the results:

ws.join ();

Since the work set has units that failed with exceptions, we can expect join () to signal this somehow. As it stands, it does so by rethrowing the exception from the first work unit that failed, in this case an IllegalStateException. The NullPointerException is lost.
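To make the "first exception wins" behaviour concrete, here is a minimal sketch built on the standard java.util.concurrent classes. It is not the WorkSet implementation: the class FirstFailureJoin and the method runAndJoin are hypothetical names, and "first" is assumed to mean first in submission order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the behaviour described above; this is
// not the WorkSet implementation.
class FirstFailureJoin {

    // Runs all units on a small pool and returns their results. If any
    // unit failed, rethrows the failure of the first failed unit in
    // submission order (an assumption); later failures are dropped.
    static <T> List<T> runAndJoin (List<Callable<T>> units)
        throws Throwable {
        ExecutorService pool = Executors.newFixedThreadPool (2);
        try {
            List<Future<T>> futures = new ArrayList<Future<T>> ();
            for (Callable<T> unit : units) {
                futures.add (pool.submit (unit));
            }

            List<T> results = new ArrayList<T> ();
            Throwable first = null;
            for (Future<T> f : futures) {
                try {
                    results.add (f.get ());
                } catch (ExecutionException e) {
                    if (first == null) {
                        // Keep the unit's own exception rather than
                        // the ExecutionException wrapper.
                        first = e.getCause () != null ? e.getCause () : e;
                    }
                }
            }
            if (first != null) {
                throw first;
            }
            return results;
        } finally {
            pool.shutdown ();
        }
    }
}
```

Given the two failing units from the example, a caller of runAndJoin would see only the IllegalStateException.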

1. Why?

The simple answer: Because it's easier that way.

The longer answer is that as long as exceptions are:

Exceptional: They are very rare.
Fatal: There is no reasonable way to recover from them.
Not user-actionable: There is nothing a user can do with the exception information either.

...we're better off treating the whole work set as a single method call and accepting the first exceptional state encountered as the result. The rationale for this is as follows:

There is no way we can limit the work set to stop processing after the first exception. Since the whole point of a parallel work set is to run work units in parallel, we will always risk having two work units throwing in parallel. We can therefore view the case of two exceptions as the base case to solve.

One way of handling this is to define a "result" class, and use it to return multiple results that each can have an exception:

public class Result
    <ResultType,ExceptionType extends Throwable> {
    
    public final ResultType result;
    public final ExceptionType exception;
    ...
}

The client code would look something like this:

List<Result<Object,Exception>> results = workSet.join ();
for (Result<Object,Exception> r : results) {
    if (r.hasException ()) {
        // Handle the exception
    } else {
        // Aggregate the result
    }
}
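For reference, the Result class that this client code relies on could be completed roughly as follows. Only the two fields and hasException () appear in the text; the private constructor and the success and failure factory methods are assumptions added for illustration.

```java
// Sketch of a result holder; everything except the two fields and
// hasException () is a hypothetical addition.
class Result<ResultType, ExceptionType extends Throwable> {

    public final ResultType result;
    public final ExceptionType exception;

    private Result (ResultType result, ExceptionType exception) {
        this.result = result;
        this.exception = exception;
    }

    // Wraps the value of a work unit that completed normally.
    static <R, E extends Throwable> Result<R, E> success (R result) {
        return new Result<R, E> (result, null);
    }

    // Wraps the exception of a work unit that failed.
    static <R, E extends Throwable> Result<R, E> failure (E exception) {
        return new Result<R, E> (null, exception);
    }

    // True if the work unit ended with an exception.
    public boolean hasException () {
        return exception != null;
    }
}
```

Even in this small form, every caller has to remember to check hasException () before touching result.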

Pretty soon, however, the result inspection code becomes very tedious and very difficult to write. Since we often use work sets for CPU-intensive tasks, our exceptions are exceptional and the code is rarely executed[1]. When it is executed, there is no way to handle the exceptions: if we could, we would have done it right in the work units. Thus, the code that handles the exception usually ends up balking:

List<Result<Object,Exception>> results = workSet.join ();
for (Result<Object,Exception> r : results) {
    if (r.hasException ()) {
        throw new Exception ("Something broke.");
    }
}

Some ambitious few define "multi-exceptions" that can wrap more than one cause:

List<Result<Object,Exception>> results = workSet.join ();
List<Exception> exceptions = ...;
for (Result<Object,Exception> r : results) {
    if (r.hasException ()) {
        exceptions.add (r.exception);
    } else {
        ...
    }
}

if (!exceptions.isEmpty ()) {
    throw new MultiException (exceptions);
}

Unfortunately, the caller receiving the MultiException is just as clueless as to what to do with it. Which causes should be acted upon? The first? The most severe? Which combinations? What if one of the causes is another MultiException? How far should we unwrap the tree of causes? The end result of all this is that we write a lot of code that does very little. It does very little because it violates a principle of exceptions: That they be specific.
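The unwrapping question can be made concrete with a sketch. The MultiException below is hypothetical (the text names the class but not its API), and flatten () is just one possible answer to "how far should we unwrap": all the way down to the leaves.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical multi-exception; the getCauses () and flatten ()
// methods are assumptions made for this illustration.
class MultiException extends Exception {

    private final List<Exception> causes;

    MultiException (List<? extends Exception> causes) {
        super (causes.size () + " work unit(s) failed");
        this.causes = Collections.unmodifiableList (
            new ArrayList<Exception> (causes));
    }

    List<Exception> getCauses () {
        return causes;
    }

    // Recursively unwraps nested MultiExceptions into a flat list
    // of the underlying exceptions, in depth-first order.
    static List<Exception> flatten (Exception e) {
        List<Exception> leaves = new ArrayList<Exception> ();
        collect (e, leaves);
        return leaves;
    }

    private static void collect (Exception e, List<Exception> leaves) {
        if (e instanceof MultiException) {
            for (Exception cause : ((MultiException) e).getCauses ()) {
                collect (cause, leaves);
            }
        } else {
            leaves.add (e);
        }
    }
}
```

After all that work, the caller holds a flat list of unrelated exceptions and still has to decide which of them matters.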

The truth is that we usually just want to know if the work set completed. If it did, we're all right. If not, we just want to bail out and signal something. But why should we use the first exception for that? Think about a single-threaded method call, which is what we're trying to approximate with a work set. In such a call, many things can go wrong, but we only ever find out about the first one that failed hard:

private long last = -1;

public void doStuff () throws Exception {
    long now = System.currentTimeMillis ();
    try {
        now = AtomicTimer.currentTimeMillis ();
    } catch (AtomicTimerException ate) {
        // Just use system time for this call
    }
    
    ...
    
    if (now < last) {
        throw new IllegalStateException (
            "Clock cannot run backwards");
    }
    
    ...

    if (now < 0) {
        throw new IllegalStateException (
            "Millisecond time cannot be negative");
    }
    
    last = now;
}

In the method above, we catch and recover from an AtomicTimerException, but we don't signal that. What we do signal is if the mismatch between the system and the atomic timer results in time "running backwards". If that happens, we never get as far as checking whether the value of now is valid, so we don't signal that either.

An exception is, and always will be, the first event that caused abrupt termination, even if many things went wrong. Having the work set use the first encountered exception in the same way works great in practice.

2. Java 7 Update

Java 7 added the Throwable.addSuppressed(Throwable) and Throwable.getSuppressed() methods, which let you "throw exceptions without really throwing them". Throwing the first exception, with the remaining ones attached as suppressed, is a great idea:

Throwable throwable = null;
for (Future<T> f : ...) {
    try {
        res.add (f.get ());
    } catch (Throwable t) {
        // Unwrap ExecutionExceptions because
        // we really want to know what caused
        // them.
        if (t instanceof ExecutionException) {
            t = t.getCause ();
        }
        if (throwable == null) {
            throwable = t;
        } else {
            throwable.addSuppressed (t);
        }
    }
}
if (throwable != null) {
    throw throwable;
}
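A self-contained variant of the same idea, including what the caller can do with the suppressed exceptions, might look like the sketch below. The names SuppressedJoin, runAndJoin and describe are hypothetical, and the units are assumed to be joined in submission order. The sketch also skips a cause that is the same object as the primary exception, since addSuppressed rejects self-suppression with an IllegalArgumentException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper; not part of the work set API.
class SuppressedJoin {

    // Runs all units and returns their results. If any unit failed,
    // throws the first failure (in submission order) with all later
    // failures attached as suppressed exceptions.
    static <T> List<T> runAndJoin (List<Callable<T>> units)
        throws Throwable {
        ExecutorService pool = Executors.newFixedThreadPool (2);
        try {
            List<Future<T>> futures = new ArrayList<Future<T>> ();
            for (Callable<T> unit : units) {
                futures.add (pool.submit (unit));
            }

            List<T> res = new ArrayList<T> ();
            Throwable first = null;
            for (Future<T> f : futures) {
                try {
                    res.add (f.get ());
                } catch (ExecutionException e) {
                    Throwable cause =
                        e.getCause () != null ? e.getCause () : e;
                    if (first == null) {
                        first = cause;
                    } else if (cause != first) {
                        // Self-suppression is not permitted.
                        first.addSuppressed (cause);
                    }
                }
            }
            if (first != null) {
                throw first;
            }
            return res;
        } finally {
            pool.shutdown ();
        }
    }

    // Summarises the primary exception and its suppressed ones.
    static String describe (Throwable t) {
        StringBuilder sb =
            new StringBuilder (t.getClass ().getSimpleName ());
        for (Throwable s : t.getSuppressed ()) {
            sb.append (" +").append (s.getClass ().getSimpleName ());
        }
        return sb.toString ();
    }
}
```

The caller still handles a single, specific exception, just as with a plain method call, and the remaining failures are there for logging if anyone wants them.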


#Java, #multicore, #tech
