In Java 8, the Collection interface was extended with two methods that return Stream<E>: stream(), which returns a sequential stream, and parallelStream(), which returns a possibly-parallel stream. Stream itself also has a parallel() method that returns an equivalent parallel stream (either mutating the current stream to be parallel or creating a new stream).
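For concreteness, here is a minimal sketch of the three ways to obtain a stream (the class name and sample data are mine, for illustration only):

import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class ParallelStreamDemo {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c");

        Stream<String> s1 = words.stream();            // sequential stream
        Stream<String> s2 = words.parallelStream();    // "possibly parallel" stream
        Stream<String> s3 = words.stream().parallel(); // sequential, then switched to parallel

        System.out.println(s1.isParallel()); // false
        System.out.println(s2.isParallel()); // true with the default implementation,
                                             // though the spec only promises "possibly parallel"
        System.out.println(s3.isParallel()); // true
    }
}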
The duplication has obvious disadvantages:
It's confusing. A question asks whether calling parallelStream().parallel() is necessary to be sure the stream is parallel, given that parallelStream() may return a sequential stream. Why does parallelStream() exist if it can't make a guarantee? The other way around is also confusing -- if parallelStream() returns a sequential stream, there's probably a reason (e.g., an inherently sequential data structure for which parallel streams are a performance trap); what should Stream.parallel() do for such a stream? (UnsupportedOperationException is not allowed by parallel()'s specification.)
Adding methods to an interface risks conflicts if an existing implementation has a similarly-named method with an incompatible return type. Adding parallelStream() in addition to stream() doubles the risk for little gain. (Note that parallelStream() was at one point just named parallel(), though I don't know if it was renamed to avoid name clashes or for another reason.)
Why does Collection.parallelStream() exist when calling Collection.stream().parallel() does the same thing?
The Javadocs for Collection.(parallelS|s)tream() and Stream itself don't answer the question, so it's off to the mailing lists for the rationale. I went through the lambda-libs-spec-observers archives and found one thread specifically about Collection.parallelStream() and another thread that touched on whether java.util.Arrays should provide parallelStream() to match (or actually, whether it should be removed). There was no once-and-for-all conclusion, so perhaps I've missed something from another list or the matter was settled in private discussion. (Perhaps Brian Goetz, one of the principals of this discussion, can fill in anything missing.)
The participants made their points well, so this answer is mostly just an organization of the relevant quotes, with a few clarifications in [brackets], presented in order of importance (as I interpret it).
parallelStream() covers a very common case
Brian Goetz in the first thread, explaining why Collection.parallelStream() is valuable enough to keep even after other parallel stream factory methods have been removed:
We do not have explicit parallel versions of each of these [stream factories]; we did
originally, and to prune down the API surface area, we cut them on the
theory that dropping 20+ methods from the API was worth the tradeoff of
the surface yuckiness and performance cost of .intRange(...).parallel().
But we did not make that choice with Collection.
We could either remove the Collection.parallelStream(), or we could add
the parallel versions of all the generators, or we could do nothing and
leave it as is. I think all are justifiable on API design grounds.
I kind of like the status quo, despite its inconsistency. Instead of
having 2N stream construction methods, we have N+1 -- but that extra 1
covers a huge number of cases, because it is inherited by every
Collection. So I can justify to myself why having that extra 1 method
is worth it, and why accepting the inconsistency of going no further is
acceptable.
Do others disagree? Is N+1 [Collection.parallelStream() only] the practical choice here? Or should we go
for the purity of N [rely on Stream.parallel()]? Or the convenience and consistency of 2N [parallel versions of all factories]? Or is
there some even better N+3 [Collection.parallelStream() plus other special cases], for some other specially chosen cases we
want to give special support to?
Brian Goetz stands by this position in the later discussion about Arrays.parallelStream():
I still really like Collection.parallelStream; it has huge
discoverability advantages, and offers a pretty big return on API
surface area -- one more method, but provides value in a lot of places,
since Collection will be a really common case of a stream source.
parallelStream() is more performant
Brian Goetz:
Direct version [parallelStream()] is more performant, in that it requires less wrapping (to
turn a stream into a parallel stream, you have to first create the
sequential stream, then transfer ownership of its state into a new
Stream.)
In response to Kevin Bourrillion's skepticism about whether the effect is significant, Brian again:
Depends how seriously you are counting. Doug counts individual object
creations and virtual invocations on the way to a parallel operation,
because until you start forking, you're on the wrong side of Amdahl's
law -- this is all "serial fraction" that happens before you can fork
any work, which pushes your breakeven threshold further out. So getting
the setup path for parallel ops fast is valuable.
Doug Lea follows up, but hedges his position:
People dealing with parallel library support need some attitude
adjustment about such things. On a soon-to-be-typical machine,
every cycle you waste setting up parallelism costs you say 64 cycles.
You would probably have had a different reaction if it required 64
object creations to start a parallel computation.
That said, I'm always completely supportive of forcing implementors
to work harder for the sake of better APIs, so long as the
APIs do not rule out efficient implementation. So if killing
parallelStream is really important, we'll find some way to
turn stream().parallel() into a bit-flip or somesuch.
Indeed, the later discussion about Arrays.parallelStream() takes note of the lower cost of Stream.parallel().
stream().parallel() statefulness complicates the future
At the time of the discussion, switching a stream from sequential to parallel and back could be interleaved with other stream operations. Brian Goetz, on behalf of Doug Lea, explains why sequential/parallel mode switching may complicate future development of the Java platform:
I'll take my best stab at explaining why: because it (like the stateful
methods (sort, distinct, limit), which you also don't like) moves us
incrementally farther from being able to express stream pipelines in
terms of traditional data-parallel constructs, which further constrains
our ability to map them directly to tomorrow's computing substrate,
whether that be vector processors, FPGAs, GPUs, or whatever we cook up.
Filter-map-reduce map[s] very cleanly to all sorts of parallel computing
substrates; filter-parallel-map-sequential-sorted-limit-parallel-map-uniq-reduce
does not.
So the whole API design here embodies many tensions between making it
easy to express things the user is likely to want to express, and doing
it in a manner that we can predictably make fast with transparent cost
models.
This mode switching was removed after further discussion. In the current version of the library, a stream pipeline is either sequential or parallel; the last call to sequential()/parallel() wins. Besides side-stepping the statefulness problem, this change also improved the performance of using parallel() to set up a parallel pipeline from a sequential stream factory.
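A minimal sketch of the current last-call-wins behavior (example mine):

import java.util.stream.IntStream;

public class ModeSwitchDemo {
    public static void main(String[] args) {
        // Executed sequentially: the final sequential() call wins,
        // regardless of the earlier parallel() call.
        int sum = IntStream.rangeClosed(1, 100)
                .parallel()
                .sequential()
                .sum();
        System.out.println(sum); // 5050
    }
}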
exposing parallelStream() as a first-class citizen improves programmer perception of the library, leading them to write better code
Brian Goetz again, in response to Tim Peierls's argument that Stream.parallel() allows programmers to understand streams sequentially before going parallel:
I have a slightly different viewpoint about the value of this sequential
intuition -- I view the pervasive "sequential expectation" as one of the
biggest challenges of this entire effort; people are constantly
bringing their incorrect sequential bias, which leads them to do stupid
things like using a one-element array as a way to "trick" the "stupid"
compiler into letting them capture a mutable local, or using lambdas as
arguments to map that mutate state that will be used during the
computation (in a non-thread-safe way), and then, when it's pointed out
what they're doing, shrug it off and say "yeah, but I'm not doing
it in parallel."
We've made a lot of design tradeoffs to merge sequential and parallel
streams. The result, I believe, is a clean one and will add to the
library's chances of still being useful in 10+ years, but I don't
particularly like the idea of encouraging people to think this is a
sequential library with some parallel bags nailed on the side.
Related
So I am pretty new to Haskell and would like to know: if synchronisation is used to prevent corruption when multithreading in Java, how is this done in Haskell? I've only found useless or overly complicated responses on Google.
Your question is a bit ambiguous since one may use multithreading for either concurrency or parallelism, which are distinct problems with distinct solutions.
In both cases, you'll need to make sure your programs are compiled with SMP support and run using multiple RTS threads: see the GHC manual's section about concurrency.
Concurrency
As others have pointed out, synchronization will be a non-problem in the vast majority of your code, since you'll mostly be dealing with pure functions. This is true in any language, provided you religiously avoid mutable state unless it is properly wrapped behind a pure API. Concurrency is an area where Haskell shines because purity is the default: impure operations are described in the types, making it dead easy to spot code where some sort of synchronization might be needed.
Typically, your application's state will be backed by a transactional database which will handle synchronization and persistence for you. You will not need any additional synchronization at all if your concurrent application does not have additional state.
In other cases, Haskell has a handy Software Transactional Memory implementation. It allows you to write and compose code in an imperative-looking style, without explicit locking, while getting atomicity and guarantees against deadlocks. It is the foolproof(tm) way to write concurrent code.
Lastly, there are some low-level primitives available in base: plain old mutable references with IORef, semaphores, and MVars, which can be used as if they were variables protected by a mutex.
There are also channels in base, but beware: they are unbounded!
Parallelism
This is also an area where Haskell shines because of its non-strict semantics. Non-strictness allows you to write code that expresses your logic in a straightforward manner while not getting committed to a specific evaluation order.
As a consequence, you can describe a parallel evaluation strategy separately from the business logic. Writing parallel code is then just a matter of placing the right annotation in the right spot.
Here is an example that was/is used in production at Bdellium:
map outputParticipant parts `using` parListChunk 10 rdeepseq
^^^^^ business logic ^^^^^^ ^^^^ eval. strategy ^^^^
The code can be understood as follows: Parallel workers will fully evaluate the results of mapping the outputParticipant function to individual items in the parts list, distributing the work in chunks of 10 elements.
This answer pertains to functional languages in general: no synchronisation is needed. Functions in functional programming have no side effects; they accept a value and return a value, and there is no mutable state. Such functions are inherently thread-safe.
In the last few days I ran some tests with external iteration, streams, and parallel streams in Java 8 and measured execution times. I also read about the warm-up time I have to take into account. But one question still remains.
The first time I call stream() or parallelStream() on a collection, the execution time is higher than for external iteration. I already know that when I call stream() or parallelStream() repeatedly on the same collection and average the execution times, parallelStream() is indeed faster than external iteration. But since in practice a collection is often iterated only once, I see only a disadvantage in using streams or parallel streams.
So my question is:
If I iterate a collection only once, is it a good idea to use stream() or parallelStream(), or will the execution time always be higher than for external iteration?
Entirely coincidentally (apparently), Doug Lea, Brian Goetz, and several other folks have written a document called Stream Parallel Guidance. (This is only a draft.) It does have some useful discussion about when to use parallel vs. sequential streams.
A brief summary: a parallel stream is more expensive to start up than a sequential stream. If your workload is splittable, and you have multiple CPU cores that can be brought to bear on the problem, and if the per-element cost isn't unreasonably small, you'll get a parallel speedup with a sufficiently large workload. (How's that for a lot of conditionals?) Oh, and you also have to be careful about benchmarking.
StackOverflow is littered with questions that attempt to add up a few integers in parallel and then claim that parallel streams are no good because they don't provide any speedup. I won't even bother linking to them.
Now, you had asked about "external iteration" (basically a for-loop) vs streams, parallel or sequential. I think it's important to consider parallel vs sequential streams, as I've done above. This will help inform further decisions. Clearly, if there is a possibility you'll need to run things in parallel, then you should probably go with streams, even if you initially start sequentially.
Even if you don't intend to go parallel, there are still a number of considerations between for-loops and sequential streams. There is a certain amount of overhead of streams compared to conventional loops -- especially for-loops over an array. But this is usually amortized over the workload. Even if the collection is iterated only once, amortization of the setup can occur if the number of elements in the collection is sufficiently large. For example, if the collection has 10 elements, the extra setup cost of a stream probably isn't worth it. If the collection has 10,000 elements, it might be a different story.
For-loops over arrays are particularly fast because the only "setup" is initializing loop counters and limit values in registers. JIT compilers can bring many loop optimizations to bear as well. It's rare for sequential streams to beat a for-loop over an array, though it can happen.
For-loops over collections usually involve creating an iterator and thus have somewhat more overhead than array-based loops. In particular, each iteration on an iterator involves method calls to hasNext and next, whereas a stream can get each element with a single method call. For this reason there are times a sequential stream can beat an iterator-based loop (given the right per-element workload, a sufficiently large number of elements, etc.). So even though there is some setup cost for a stream, there is also the possibility that it might end up running faster than a conventional for-loop.
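To make the comparison concrete, here is a sketch of the two styles side by side (class name and data are mine; any actual timings would depend on workload, JIT warm-up, and hardware):

import java.util.ArrayList;
import java.util.List;

public class IterationStyles {
    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            values.add(i);
        }

        // External iteration: iterator-based loop, hasNext()/next() per element.
        long loopSum = 0;
        for (int v : values) {
            loopSum += v;
        }

        // Internal iteration: a sequential stream pushes elements through the pipeline.
        long streamSum = values.stream().mapToLong(Integer::longValue).sum();

        System.out.println(loopSum == streamSum); // true
    }
}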
Finally, performance isn't the only consideration. There is also readability and maintainability. The streams and lambda stuff may initially be new and unfamiliar, but it has great potential to simplify and clean up code. See my answer to another question, for example.
I have been using Java 7's Fork/Join concurrency framework, and it works well. I finally got around to reading the API's javadoc for ForkJoinTask, and it contains this paragraph:
ForkJoinTasks should perform relatively small amounts of computation. Large tasks should be split into smaller subtasks, usually via recursive decomposition. As a very rough rule of thumb, a task should perform more than 100 and less than 10000 basic computational steps, and should avoid indefinite looping. If tasks are too big, then parallelism cannot improve throughput. If too small, then memory and internal task maintenance overhead may overwhelm processing.
I understand all that, except for one phrase: what exactly is a "basic computational step"? Is it a generic computer science term, or is it specific to Java? How does it relate to byte code, lines of source code, source code statements, etc.?
An example might be as useful as a formal definition. Can anyone conjure up a few lines of Java and then break them down into the associated "basic computational steps"?
A basic step is one that can be directly evaluated, as opposed to an instruction that requires you to solve ten other things before you can evaluate it; in other words, the simplest unit of work. I would guess that, in literal terms, "computational steps" refers to simple instructions in Java bytecode. As the article below explains, it's just a general way of describing how big a chunk of work is:
From http://coopsoft.com/ar/CalamityArticle.html
A Java™ Fork-Join Calamity
Do you wonder why > 100, < 10k computational steps?
100 has to do with the work stealing problem. All forked Tasks go into the same deque making other threads search for work. When the threads encounter contention they back off and look somewhere else. Since there is no work anywhere else they try the same deque again, and again, and again until the forking thread finally finishes the work all by itself. You can see the proof by downloading the source code for Class LongSum.java below. Hence, run slow or there will be no parallelism.
10k has to do with the join() problem. Since the F/J framework cannot do pure Task Management (see Faulty Task Manager, above) with Tasks actually waiting independently of threads when they call join(), the framework has to create “continuation threads” to avoid a halt. There can only be a limited time before it all falls apart. Hence, run fast or die.
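As a rough illustration (my own sketch, not from the article or the javadoc): in the summing task below, each leaf performs roughly one array read and one addition per element, so a threshold of a few thousand elements keeps each task within the 100-10,000 step rule of thumb.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class LongSumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 5_000; // ~5,000 additions per leaf task

    private final long[] array;
    private final int lo, hi;

    LongSumTask(long[] array, int lo, int hi) {
        this.array = array;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            // Leaf: each loop iteration is roughly one "basic computational step".
            long sum = 0;
            for (int i = lo; i < hi; i++) {
                sum += array[i];
            }
            return sum;
        }
        // Recursive decomposition: split the range in half and fork one side.
        int mid = (lo + hi) >>> 1;
        LongSumTask left = new LongSumTask(array, lo, mid);
        LongSumTask right = new LongSumTask(array, mid, hi);
        left.fork();
        return right.compute() + left.join();
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        long sum = new ForkJoinPool().invoke(new LongSumTask(data, 0, data.length));
        System.out.println(sum); // 499999500000
    }
}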
According to the javadoc for the Phaser class,
Phasers may be tiered (i.e., constructed in tree structures) to reduce contention. Phasers with large numbers of parties that would otherwise experience heavy synchronization contention costs may instead be set up so that groups of sub-phasers share a common parent. This may greatly increase throughput even though it incurs greater per-operation overhead.
Could anybody clarify this statement? It's given me a bit of confusion.
Balanced trees work well in recursively decomposing programs. One example of that is the Fork/Join framework in Java 7. I imagine tiering was added to Phasers as another way to use this framework, but at an awful cost. When a phaser must wait for arrival, the framework creates another thread to take its place. For a large number of waiters, this can be a disaster. You can see it work yourself by downloading the example software from this article I wrote two years ago.
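A minimal sketch of the tiering the javadoc describes (example mine): sub-phasers register with a common parent, so each group of parties contends only on its own sub-phaser, and the parent sees one arrival per group rather than one per party.

import java.util.concurrent.Phaser;

public class TieredPhaserDemo {
    public static void main(String[] args) {
        Phaser root = new Phaser(); // parent; children register with it on construction
        int groups = 4;
        int partiesPerGroup = 8;

        for (int g = 0; g < groups; g++) {
            // Phaser(parent, parties) links the child into the tree.
            Phaser child = new Phaser(root, partiesPerGroup);
            for (int p = 0; p < partiesPerGroup; p++) {
                new Thread(() -> {
                    // Parties contend only on their own child phaser; the child
                    // arrives at the root once all of its parties have arrived.
                    child.arriveAndAwaitAdvance();
                }).start();
            }
        }
    }
}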