(Not logged on) | Log On
View  History

  

Chaining Mappings, Filters, and Bounds

7/28/2008 8:28 PM
You can subscribe to this wiki article using an RSS feed reader.

[Up to jsr166y.forkjoin Examples]

A mapping is an operation which operates on one or two elements and returns a new element. A filter is a boolean predicate which indicates whether a particular element is included in the result. A bounds operation selects an index based subarray of the input.

The ParallelArray API is designed so that these operations can be chained together. That is, the methods for applying these operations are not void methods but rather return objects which may be further processed. Thus, you can apply multiple mappings; the output of applying one mapping may be used as the input to the next mapping. Similarly, the output of one filter can feed into the next filter, and the output of a bounds can feed the next bounds.

Starting with a ParallelArray:For example:

ParallelDoubleArray intermediate, result;
intermediate = ParallelDoubleArray.create(size, ParallelDoubleArray.defaultExecutor())
        .replaceWithGeneratedValue(myGenerator)
        .withMapping(myMapping)
        .withMapping(myOtherMapping)
        .all();
result = intermediate.withFilter(myFilter)
        .withMapping(myLastMapping)
        .all();

This example starts with with a ParallelDoubleArray which is filled with values emitted by the myGenerator instance. The mapping myMapping then alters the contents; a second mapping takes the output of the first mapping as input and outputs a new double value. The result is then stored in the intermediate ParallelDoubleArray. To compute the result, the intermediate results are filtered so that only those values for which myPredicate.op(double) returns true are included, and then those values are passed through the final mapping, myLastMapping.

The following is not (yet) allowed by the API because one cannot apply a filter to the output of a mapping:

ParallelDoubleArray result;
reslult = ParallelDoubleArray.create(size, ParallelDoubleArray.defaultExecutor)
        .replaceWithGeneratedValue(myGenerator)
        .withMapping(myMapping)
        .withMapping(myOtherMapping)
        .withFilter(myFilter) // Note: this filter operation is not allowed
        .withMapping(myLastMapping)
        .all();

Doug Lea explains this restriction:
The current internal support is set up to allow at most one (parallel) selection per expression, which I think is a good rule because it provides an understandable (and efficient) performance model -- it doesn't secretly require multiple temporary workspaces or multiple per-element evaluations of mapping functions. Bear in mind that fast parallel filtering is inherently a two-phase operation (sometimes with internal pipelined overlap): Find indices of all matching elements, and then when you know how many there are, create and put them in new array. (That's one downside of arrays as opposed to other data structures -- you need to know size before you can create.)
Note that the above case differs in that the all() call must evaluate mappings, and then put elements somewhere to collect into returned array in case they pass filter. It is possible to either do slower filtering to avoid multiple calls to mapping functions that would otherwise be needed in this case, or to create extra internal workspace to hold evaluated elements. But in keeping with desire for simple performance model, since there are two options I cruelly make you be explicit about it. To get the evaluate-and-store version, use all(). To do the slower-filter version, rewrite the filter so that it internally does the mapping needed to decide whether to keep the element.

There might be a good always-acceptable approach for this case though, which would make for a better argument for allowing it. I'll explore the possibilities.

Further complicating this story is that if all you want is a reduction, then these concerns don't even apply, and we might as well support it, but doing this leads to an even less regular API, so it is currently out just on those grounds.
and Tim Peierls adds
It is by design. The return types of the methods restrict you to chains that can be performed efficiently in one recursive fork-join "pass". The calls to specify bounds, filters, or mappings just set things up; the real work starts when you call an "action" method like all, reduce, replace, etc.

But in your example, since you aren't keeping the original array, you could avoid the intermediate allocation by using replaceWithMapping. In other words, convert this code:

  pa.withBounds(b).withMapping(m1).all()  // allocates a new PA
    .withFilter(f).withMapping(m2).all(); // allocates another new PA

into

  pa.withBounds(b).replaceWithMapping(m1) // no new allocation
    .withFilter(f).withMapping(m2).all(); // allocates a new PA

The latter allocates only one array besides pa, where the former allocates two additional.
--  David Biesack