|
|
The following descriptions were taken from discussions on the concurrency-interest mailing list.
jsr166y adds a lightweight task framework known as ForkJoin (FJ) to the Java platform. It is targeted for Java 7. ForkJoin was originally developed by Doug Lea; see Fork/Join Framework.
Parallel*Array (often referred to as PA) and its planned follow-ons for sets and maps, provide an easier/better way of routinely programming to take advantage of dozens to hundreds of processors/cores: If you can think about a programming problem in terms of aggregate operations on collections of elements, then we can automate parallel execution. This generally pays off if either you have lots of elements, (in which case, it works well even if the operations are small/cheap), or if each of the operations are time consuming (in which case it works well even if there are not a lot of elements). To take advantage of this though, the aggregate processing must have a regular structure, which means that you must be able to express things in terms of apply, reduce, filter, map, cumulate, sort, uniquify, paired mappings, and so on. So that's what Parallel*Array does. Some people (especially those with experience with either database programming or functional programming) like this style anyway, and might use this API even without the parallelism benefits. But in turn, the fact that this style can be awkward to write out has led to all the language extension controversies about closures, function types, and related syntactic support.
Like all lightweight-task frameworks, ForkJoin (FJ) does not explicitly cope with blocked IO: If a worker thread blocks on IO, then (1) it is not available to help process other tasks (2) Other worker threads waiting for the task to complete (i.e., to join it) may run out of work and waste CPU cycles. Neither of these issues completely eliminates potential use, but they do require a lot of explicit care. For example, you could place tasks that may experience sustained blockages in their own small ForkJoinPools. (The Fortress folks do something along these lines mapping fortress "fair" threads onto forkjoin.) You can also dynamically increase worker pool sizes (we have methods to do this) when blockages may occur. All in all though, the reason for the restrictions and advice are that we do not have good automated support for these kinds of cases, and don't yet know of the best practices, or whether it is a good idea at all.
The main FJtechniques used in PA split problems into many more tasks than you have processors, and rely on the underlying mechanics to load-balance etc. So in this sense, worker threads are always "doing other things". And normally, this works about equally well regardless of nesting. (The "about" here is shorthand for a long story with little practical impact in most cases.)
However, it is often the case that you can replace nested parallelism with more efficient non-nested algorithms. (Matrix multiply happens to be a fairly well known example of this; see the old FJTask version) To implement this here, you would need to move from PA-level to FJ-level programming, which I think might be a natural progression for people trying to fine-tune performance. PA does most common things more efficiently than most people could do themselves using ForkJoinTasks, and does so without forcing them to learn how to do parallel recursive decomposition. For those things PA does not do, we provide the underlying FJ tools. (OK, so they are underdocumented tools at the moment, but hopefully that will improve.)
As someone who has written a lot of fork-join programs over the past decade, I think they are fun and natural to program. But even I prefer using PA for common apply-to-all constructions and the like rather than manually setting up little recursive tasks to do it.
--Doug Lea