This is another matrix multiplication example, following on from this article, using ParallelArrays both for the outer column loop and for the inner-product loops. It is much, much slower than the version that parallelizes only the outer loop.
// Computes c = a * b, where a is m x n, b is n x p, and c is a
// preallocated m x p array.
void parallelMatrixMultiply(final double[][] a,
                            final double[][] b,
                            final double[][] c) {
    final int m = a.length;
    final int n = b.length;
    if (m == 0 || n == 0) return;
    final int p = b[0].length;
    final ForkJoinExecutor pool = ParallelArray.defaultExecutor();
    final BinaryDoubleOp product = new BinaryDoubleOp() {
        public double op(double x, double y) { return x * y; }
    };
    // Outer column loop: b[0] is handed off only for its index range
    // 0..p-1; its element values are ignored.
    ParallelDoubleArray
        .createUsingHandoff(b[0], pool)
        .withIndexedFilter(new IntAndDoublePredicate() {
            public boolean op(final int j, double ignored) {
                // Load column j of b (a row of b transposed) in parallel.
                ParallelDoubleArray Bcolj = ParallelDoubleArray.create(n, pool);
                Bcolj.replaceWithMappedIndex(new IntToDouble() {
                    public double op(int k) { return b[k][j]; }
                });
                // Inner-product loops: c[i][j] = a[i] . Bcolj, summed in parallel.
                for (int i = 0; i < m; i++) {
                    c[i][j] = ParallelDoubleArray
                        .createUsingHandoff(a[i], pool)
                        .withMapping(product, Bcolj).sum();
                }
                // All work is a side effect; nothing passes the filter.
                return false;
            }
        }).apply(new DoubleProcedure() {
            public void op(double ignored) {} // never called
        });
}
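It also uses a slightly different trick to avoid creating an array for the outer column loop: b[0], which already has one element per result column, is handed off to a ParallelDoubleArray purely for its index range, and the per-column work is done as a side effect of an indexed filter that always returns false. The transposed column Bcolj is also loaded in parallel.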
The only way I could see this approach being practical is when the number of processors greatly exceeds the number of columns in the result.
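For readers without the jsr166y.forkjoin library at hand, the two shapes being compared can be roughly sketched with the standard java.util.stream API instead. This is not the ParallelArray code above, nor the code from the earlier article: the class and method names (ColumnParallelMultiply, multiplyOuterOnly, multiplyNested) are made up for illustration, and the sketch assumes c is preallocated as an m-by-p array. It makes no timing claims; it only shows where the parallelism sits in each variant.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical illustration class; not part of jsr166y or the original example.
final class ColumnParallelMultiply {

    // Outer-only: one parallel task per result column, sequential dot products.
    static void multiplyOuterOnly(double[][] a, double[][] b, double[][] c) {
        final int m = a.length;
        final int n = b.length;
        if (m == 0 || n == 0) return;
        final int p = b[0].length;
        IntStream.range(0, p).parallel().forEach(j -> {
            // Copy column j of b once so the inner loops read a flat array.
            double[] bColJ = new double[n];
            for (int k = 0; k < n; k++) bColJ[k] = b[k][j];
            for (int i = 0; i < m; i++) {
                double[] aRowI = a[i];
                double sum = 0.0;
                for (int k = 0; k < n; k++) sum += aRowI[k] * bColJ[k];
                c[i][j] = sum;
            }
        });
    }

    // Nested: the column load and every dot product are parallel as well,
    // which mirrors the shape of the ParallelArray version.
    static void multiplyNested(double[][] a, double[][] b, double[][] c) {
        final int m = a.length;
        final int n = b.length;
        if (m == 0 || n == 0) return;
        final int p = b[0].length;
        IntStream.range(0, p).parallel().forEach(j -> {
            double[] bColJ = IntStream.range(0, n).parallel()
                    .mapToDouble(k -> b[k][j]).toArray();
            for (int i = 0; i < m; i++) {
                double[] aRowI = a[i];
                c[i][j] = IntStream.range(0, n).parallel()
                        .mapToDouble(k -> aRowI[k] * bColJ[k]).sum();
            }
        });
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {4, 5, 6}};
        double[][] b = {{7, 8}, {9, 10}, {11, 12}};
        double[][] c1 = new double[2][2];
        double[][] c2 = new double[2][2];
        multiplyOuterOnly(a, b, c1);
        multiplyNested(a, b, c2);
        System.out.println(Arrays.deepToString(c1)); // [[58.0, 64.0], [139.0, 154.0]]
        System.out.println(Arrays.deepToString(c2)); // [[58.0, 64.0], [139.0, 154.0]]
    }
}
```

In the nested variant each of the p column tasks spawns further parallel work of size n, so the extra splitting can only help when there are idle processors left over after the columns have been handed out.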
--Tim Peierls < tim at peierls dot net >