I have a problem regarding running multiple threads executing a pipeline in Java.
Say the 'input' of the pipeline is:
5 instructions namely: I1, I2, I3, I4, I5
If I1 has been fetched, it will now be ready for decoding, but the fetch operation will not wait for the decode task to finish. After handing the fetched instruction to the decode stage, the fetch operation will get the next instruction, I2, and so on.
It is pipeline scheduling with five stages.
How do I simulate such a machine using java multithreading?
Assuming you want to know how to implement such a thing: It is called a 'pipeline pattern'. If this is not homework, you could reuse an existing implementation of this pattern. One is available at:
http://code.google.com/p/pipelinepattern/
If this is homework, then your teacher might be expecting you to write it on your own from scratch. A good starting point is this two stage pipeline (where one thread reads lines from a file, and the other thread prints the lines):
http://rosettacode.org/wiki/Synchronous_concurrency#Java
In the above example the two stages communicate via a BlockingQueue (i.e. stage 1 writes to the queue, and stage 2 reads from it). If stage 1 is consistently faster than stage 2, the queue will grow quite large. You can force the stages to operate in lockstep by using a SynchronousQueue instead.
If you need a five-stage pipeline, you extend this to 5 threads with 4 queues between them:
in -> [s1] -> q12 -> [s2] -> q23 -> [s3] -> q34 -> [s4] -> q45 -> [s5] -> out
Above, each [s*] represents a stage (a thread) and each qAB represents a queue that [sA] enqueues to and [sB] dequeues from.
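If you want to roll a stage-to-stage hand-off yourself, a minimal two-stage sketch could look like this (the instruction names come from the question; a SynchronousQueue, or the capacity-1 queue used here, keeps the stages roughly in lockstep):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class TwoStagePipeline {
    public static void main(String[] args) throws InterruptedException {
        // q12 connects stage 1 (fetch) to stage 2 (decode); "EOF" acts as a poison pill.
        BlockingQueue<String> q12 = new ArrayBlockingQueue<>(1);

        Thread fetch = new Thread(() -> {
            try {
                for (String instr : new String[]{"I1", "I2", "I3", "I4", "I5"}) {
                    q12.put(instr);   // hand the fetched instruction to the decode stage
                }
                q12.put("EOF");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread decode = new Thread(() -> {
            try {
                for (String instr = q12.take(); !instr.equals("EOF"); instr = q12.take()) {
                    System.out.println("decoding " + instr);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        fetch.start();
        decode.start();
        fetch.join();
        decode.join();
    }
}

Extending this to five stages means five such threads and four queues, exactly as in the diagram above.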
I did research but didn't find an adequate answer to this question.
Why do we need more stages rather than one stage?
One Thread -> One Big Task(A,B,C,D)
VS
CompletableFuture with the stages A, B, C, D
So my answer would be the following:
If I have more stages, I can split the task over different methods and classes.
If I have more stages, executing the whole task is fairer relative to other whole tasks. What do I mean by that? Let's say our system has only one thread. If I execute it as One Big Task(A,B,C,D), then my next big task (W,X,Y,Z) only gets the chance to be executed after the first big task is finished. With CompletionStages it is fairer, because A,W,B,C,X,Y,Z,D could be the execution order.
For my last point, are there any metrics/rules for how small I should split the big task into sub-tasks?
Is my last point a point for the stages in CompletableFutures?
Is my first point a point for the stages in CompletableFutures?
Are there other points for using the stages of CompletableFutures?
When you have the choice, like with
CompletableFuture.supplyAsync(() -> method1())
                 .thenApply(o1 -> method2(o1))
                 .thenApply(o2 -> method3(o2))
                 .thenAccept(o3 -> method4(o3));
and
CompletableFuture.runAsync(() -> {
    var o1 = method1();
    var o2 = method2(o1);
    var o3 = method3(o2);
    method4(o3);
});
or
CompletableFuture.runAsync(() -> method4(method3(method2(method1()))));
there is no advantage in using multiple stages. In fact, the first variant is much harder to debug than the alternatives.
Things are different when the chaining does not happen at the same place. Think of a library having a future-returning method, encapsulating something like supplyAsync(() -> method1()), another library calling that method, chaining another operation and returning the composition to the application, which will chain yet another operation.
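A rough sketch of that scenario (the class and method names here are made up purely for illustration):

import java.util.concurrent.CompletableFuture;

public class ChainAcrossLibraries {
    // Library A exposes a future-returning method.
    static CompletableFuture<String> libraryAFetch() {
        return CompletableFuture.supplyAsync(() -> "  raw data  ");
    }

    // Library B calls it, chains its own operation and returns the composition.
    static CompletableFuture<String> libraryBParse() {
        return libraryAFetch().thenApply(String::trim);
    }

    public static void main(String[] args) {
        // The application chains yet another operation onto the composition.
        libraryBParse().thenAccept(System.out::println).join();
    }
}

No single place in the code sees the whole chain, which is why it cannot simply be written as one sequential block.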
Expressing the same in a single stage would only be possible when the methods invoked in the functions are still provided by each library’s API and have a sequential nature, i.e. we’re not talking about thenCompose(…) kind of stages.
But such chains are still hard to debug and project Loom is trying to solve this. Then, you’d express the operation as a call sequence, exactly like in the second or third variant even when the methods are potentially blocking, but run it in a virtual thread which will release the underlying native thread each time it would block.
Then, we have even less use for a linear chain of stages.
A remaining use case for creating a linear chain of dependent stages is to have different executors. For example
CompletableFuture.supplyAsync(() -> fetchFromDb(), MY_BACKGROUND_EXECUTOR)
                 .thenAcceptAsync(data -> updateSwingModel(data), EventQueue::invokeLater)
                 .whenCompleteAsync((x, thrown) ->
                     updateStatusBar(jobID, thrown), EventQueue::invokeLater);
Here, writing the operation as a single block is not an option…
I am still trying to understand the difference between the Reactor map() and flatMap() methods.
First I took a look at the API, but it isn't really helpful; it confused me even more.
Then I googled a lot, but it seems like nobody has an example that makes the differences understandable, if there are any differences.
Therefore I tried to write two tests to see the different behaviour of each method.
But unfortunately it isn't working as I hoped it would...
First test method is testing the reactive flatMap() method:
@Test
void fluxFlatMapTest() {
    Flux.just(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
        .window(2)
        .flatMap(fluxOfInts -> fluxOfInts.map(this::processNumber).subscribeOn(Schedulers.parallel()))
        .doOnNext(System.out::println)
        .subscribe();
}
The output is as expected, explainable, and looks like this:
9 - parallel-2
1 - parallel-1
4 - parallel-1
25 - parallel-3
36 - parallel-3
49 - parallel-4
64 - parallel-4
81 - parallel-5
100 - parallel-5
16 - parallel-2
The second method should test the output of the map() method to compare with the above results of the flatMap() method.
@Test
void fluxMapTest() {
    final int start = 1;
    final int stop = 100;
    Flux.just(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
        .window(2)
        .map(fluxOfInts -> fluxOfInts.map(this::processNumber).subscribeOn(Schedulers.parallel()))
        .doOnNext(System.out::println)
        .subscribe();
}
This test method has output I didn't expect at all; it looks like this:
FluxSubscribeOn
FluxSubscribeOn
FluxSubscribeOn
FluxSubscribeOn
FluxSubscribeOn
There is a little helper method which looks like this:
private String processNumber(Integer x) {
    String squaredValueAsString = String.valueOf(x * x);
    return squaredValueAsString.concat(" - ").concat(Thread.currentThread().getName());
}
Nothing special here.
I am using Spring Boot 2.3.4 with Java 11 and the reactor implementation for Spring.
Do you have a good explanatory example, or do you know how to change the above tests so that they make sense? If so, please help me out with that.
Thanks a lot in advance!
Reactor, which is the underlying library in WebFlux, is built around something called the event loop, which in turn I believe is based on an architecture called the LMAX Architecture.
This means that the event loop is a single-threaded event processor. Everything up to the event loop can be multithreaded, but the events themselves are processed by a single thread: the event loop.
Regular Spring Boot applications are usually run on the Tomcat or Undertow server, while WebFlux is by default run on the event-driven server Netty, which in turn uses this event loop to process events for us.
So now that we understand what is underneath everything, we can start talking about map and flatMap.
Map
If we look at the API docs, there is a marble diagram for map, and the API text says:
Transform the items emitted by this Flux by applying a synchronous function to each item.
Which is pretty self-explanatory. We have a Flux of items, and each time map asks for an item to process, it won't ask for another one until it has finished processing the first. Hence synchronous.
The marble diagram shows that the green circle needs to be converted into a green square before we can ask for the yellow circle to be converted into a yellow square, and so on.
Here is a code example:
Flux.just("a", "b", "c")
.map(value -> value.toUppercase())
.subscribe(s -> System.out.println(s + " - " + Thread.currentThread().getName()));
// Output
A - main
B - main
C - main
Each is run on the main thread and processed one after the other, synchronously.
flatMap
If we look at the API docs, there is again a marble diagram, and the text says:
Transform the elements emitted by this Flux asynchronously into Publishers, then flatten these inner publishers into a single Flux through merging, which allow them to interleave.
It does this using basically three steps:
Generation of inners and subscription: this operator is eagerly subscribing to its inners.
Ordering of the flattened values: this operator does not necessarily preserve original ordering, as inner element are flattened as they arrive.
Interleaving: this operator lets values from different inners interleave (similar to merging the inner sequences).
So what does this mean? Well, it basically means that:
It will take each item in the Flux and transform it into an individual publisher (a Mono) with one item in each.
It orders the items as they get processed; flatMap does NOT preserve order, as items can take different amounts of time to process on the event loop.
It merges all the processed items back into a Flux for further processing down the line.
Here is a code example:
Flux.just("a", "b", "c")
.flatMap(value -> Mono.just(value.toUpperCase()))
.subscribe(s -> System.out.println(s + " - " + Thread.currentThread().getName()));
// Output
A - main
B - main
C - main
Wait, flatMap is printing the same thing as map!
Well, it all comes back to the threading model we talked about earlier. Actually there is only one thread called the event loop that handles all events.
Reactor is concurrency agnostic, meaning that any worker can schedule jobs to be handled by the event loop.
So what is a worker? Well, a worker is something a Scheduler can spawn. One important thing is that a worker doesn't have to be a thread; it can be, but it doesn't have to be.
In the above code cases, the main thread subscribes to our Flux, which means that the main thread will process this for us and schedule work for the event loop to handle.
In a server environment this doesn't necessarily have to be the case. The important thing to understand here is that Reactor can switch workers (i.e. possibly threads) whenever it wants, if it needs to.
In my above code examples there is only a main thread, so there is no need to run things on multiple threads or have parallel execution.
If I wish to force it, I can use one of the different Schedulers, which all have their uses. With Netty, the server will start up with the same number of event loop threads as there are cores on your machine, so there it can switch workers and cores freely if needed at heavy loads, to maximize the usage of all event loops.
flatMap being async does NOT mean parallel; it means that it will schedule all things to be processed by the event loop at the same time, but it's still only one thread executing the tasks.
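One way to actually see interleaving from flatMap is to make the inner publishers asynchronous themselves, as the first test in the question does, by giving them their own scheduler. A rough sketch (thread names and ordering in the output will vary):

Flux.just("a", "b", "c")
    .flatMap(value -> Mono.just(value)
                          .map(String::toUpperCase)
                          .subscribeOn(Schedulers.parallel())) // each inner Mono is subscribed on a parallel worker
    .doOnNext(s -> System.out.println(s + " - " + Thread.currentThread().getName()))
    .blockLast(); // block only so the demo doesn't exit early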
Parallel execution
If I really want to execute something in parallel, I can for instance place it on a parallel Scheduler. This means it will guarantee multiple workers on multiple cores. But remember there is a setup time for this when your program runs, and this is usually only beneficial if you have heavy computational work which in turn needs a lot of single-core CPU power.
Code example:
Flux.just("a", "b", "c")
.flatMap(value -> value -> Mono.just(value.toUpperCase()))
.subscribeOn(Schedulers.parallel())
.subscribe(s -> System.out.println(s + " - " + Thread.currentThread().getName()));
// Output
A - parallel-1
B - parallel-1
C - parallel-1
Here we are still running on just one thread, because subscribeOn means that when a thread subscribes, the Scheduler will pick one thread from the scheduler pool and then stick with it throughout execution.
If we absolutely feel the need to force execution on multiple threads, we can for instance use a ParallelFlux.
Flux.range(1, 10)
    .parallel(2)
    .runOn(Schedulers.parallel())
    .subscribe(i -> System.out.println(Thread.currentThread().getName() + " -> " + i));
// Output
parallel-3 -> 2
parallel-2 -> 1
parallel-3 -> 4
parallel-2 -> 3
parallel-3 -> 6
parallel-2 -> 5
parallel-3 -> 8
parallel-2 -> 7
parallel-3 -> 10
parallel-2 -> 9
But remember this is in most cases not necessary. There is a setup time, and this type of execution is usually only beneficial if you have a lot of CPU-heavy tasks. Otherwise, using the default single-threaded event loop will in most cases "probably" be faster.
Dealing with a lot of I/O tasks is usually more about orchestration than raw CPU power.
Most of the information here is taken from the Flux and Mono API docs.
The Reactor documentation is an amazing and interesting source of information.
Simon Baslé's blog series Flight of the Flux is also a wonderful and interesting read; it also exists in YouTube format.
There are probably some faults here and there, and I have made some assumptions too, especially when it comes to the inner workings of Reactor. But hopefully this will at least clear up some thoughts.
If someone feels things are direct faulty, feel free to edit.
I want to use an accumulator to gather some stats about the data I'm manipulating in a Spark job. Ideally, I would do that while the job computes the required transformations, but since Spark would re-compute tasks in various cases, the accumulators would not reflect true metrics. Here is how the documentation describes this:
For accumulator updates performed inside actions only, Spark guarantees that each task's update to the accumulator will only be applied once, i.e. restarted tasks will not update the value. In transformations, users should be aware of that each task's update may be applied more than once if tasks or job stages are re-executed.
This is confusing since most actions do not allow running custom code (where accumulators can be used); they mostly take the results from previous transformations (lazily). The documentation also shows this:
val acc = sc.accumulator(0)
data.map { x => acc += x; f(x) }
// Here, acc is still 0 because no actions have caused the `map` to be computed.
But if we add data.count() at the end, would this be guaranteed to be correct (have no duplicates) or not? Clearly acc is not used "inside actions only", as map is a transformation. So it should not be guaranteed.
On the other hand, discussion on related Jira tickets talks about "result tasks" rather than "actions". For instance here and here. This seems to indicate that the result would indeed be guaranteed to be correct, since we are using acc immediately before an action and it should thus be computed as a single stage.
I'm guessing that this concept of a "result task" has to do with the type of operations involved, the last one being the one that includes an action, like in this example, which shows how several operations are divided into stages (shown in magenta in the image linked there).
So hypothetically, a count() action at the end of that chain would be part of the same final stage, and I would be guaranteed that accumulators used in the last map will not include any duplicates?
Clarification around this issue would be great! Thanks.
To answer the question "When are accumulators truly reliable?"
Answer: when they are present in an action operation.
As per the documentation, for tasks in an action, even if restarted tasks are present, Spark will update the accumulator only once.
For accumulator updates performed inside actions only, Spark guarantees that each task’s update to the accumulator will only be applied once, i.e. restarted tasks will not update the value. In transformations, users should be aware of that each task’s update may be applied more than once if tasks or job stages are re-executed.
And actions do allow running custom code.
For example:
val accNotEmpty = sc.accumulator(0)
ip.foreach(x => {
  if (x != "") {
    accNotEmpty += 1
  }
})
But why are Map + Action (i.e. result task) operations not reliable for an accumulator?
A task fails due to some exception in the code: Spark will try 4 times (the default number of tries). If the task fails every time, it will give an exception. If by chance it succeeds, Spark will continue and just update the accumulator value for the successful run; accumulator values from failed runs are ignored. Verdict: handled properly.
Stage failure: an executor node crashes, through no fault of the user but a hardware failure, and the node goes down during a shuffle stage. As shuffle output is stored locally, if a node goes down, that shuffle output is gone. So Spark goes back to the stage that generated the shuffle output, looks at which tasks need to be rerun, and executes them on one of the nodes that is still alive. After the missing shuffle output is regenerated, the stage which generated the map output has executed some of its tasks multiple times. Spark counts accumulator updates from all of them. Verdict: not handled; the accumulator will give wrong output.
If a task is running slowly, Spark can launch a speculative copy of that task on another node. Verdict: not handled; the accumulator will give wrong output.
An RDD that is cached is huge and can't stay in memory, so whenever the RDD is used the Map operation is re-run to recompute it, and the accumulator is updated by it again. Verdict: not handled; the accumulator will give wrong output.
So the same function may run multiple times on the same data, and Spark does not provide any guarantee for accumulators updated from a Map operation.
So it is better to use accumulators in action operations in Spark.
To know more about accumulators and their issues, refer to this blog post by Imran Rashid.
Accumulator updates are sent back to the driver when a task is successfully completed. So your accumulator results are guaranteed to be correct when you are certain that each task will have been executed exactly once and each task did as you expected.
I prefer relying on reduce and aggregate instead of accumulators because it is fairly hard to enumerate all the ways tasks can be executed.
An action starts tasks.
If an action depends on an earlier stage and the results of that stage are not (fully) cached, then tasks from the earlier stage will be started.
Speculative execution starts duplicate tasks when a small number of slow tasks are detected.
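To illustrate the aggregate alternative mentioned above, here is a rough sketch (using the Java API and a hypothetical JavaSparkContext jsc) of counting non-empty strings through the action itself rather than through an accumulator, so re-executed tasks cannot double-count:

import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;

// Hypothetical input; jsc is an existing JavaSparkContext.
JavaRDD<String> ip = jsc.parallelize(Arrays.asList("a", "", "b", "c", ""));
long nonEmpty = ip.aggregate(0L,
        (count, s) -> s.isEmpty() ? count : count + 1,  // fold within a partition
        Long::sum);                                      // merge partition results
System.out.println(nonEmpty); // 3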
That said, there are many simple cases where accumulators can be fully trusted.
val acc = sc.accumulator(0)
val rdd = sc.parallelize(1 to 10, 2)
val accumulating = rdd.map { x => acc += 1; x }
accumulating.count
assert(acc.value == 10)
Would this be guaranteed to be correct (have no duplicates)?
Yes, if speculative execution is disabled. The map and the count will be a single stage, so like you say, there is no way a task can be successfully executed more than once.
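For reference, speculative execution is controlled by the spark.speculation setting, which is false by default; a minimal sketch of making that assumption explicit (Java API, hypothetical app name):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf()
        .setAppName("accumulator-demo")      // hypothetical
        .set("spark.speculation", "false");  // no speculative duplicate tasks
JavaSparkContext jsc = new JavaSparkContext(conf);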
But an accumulator is updated as a side-effect. So you have to be very careful when thinking about how the code will be executed. Consider this instead of accumulating.count:
// Same setup as before.
accumulating.mapPartitions(p => Iterator(p.next)).collect
assert(acc.value == 2)
This will also create one task for each partition, and each task will be guaranteed to execute exactly once. But the code in map will not get executed on all elements, just the first one in each partition.
The accumulator is like a global variable. If you share a reference to the RDD that can increment the accumulator then other code (other threads) can cause it to increment too.
// Same setup as before.
val x = new X(accumulating) // We don't know what X does.
                            // It may trigger the calculation
                            // any number of times.
accumulating.count
assert(acc.value >= 10)
I think Matei answered this in the referred documentation:
As discussed on https://github.com/apache/spark/pull/2524 this is pretty hard to provide good semantics for in the general case (accumulator updates inside non-result stages), for the following reasons:
An RDD may be computed as part of multiple stages. For example, if you update an accumulator inside a MappedRDD and then shuffle it, that might be one stage. But if you then call map() again on the MappedRDD, and shuffle the result of that, you get a second stage where that map is pipelined. Do you want to count this accumulator update twice or not?
Entire stages may be resubmitted if shuffle files are deleted by the periodic cleaner or are lost due to a node failure, so anything that tracks RDDs would need to do so for long periods of time (as long as the RDD is referenceable in the user program), which would be pretty complicated to implement.
So I'm going to mark this as "won't fix" for now, except for the part for result stages done in SPARK-3628.
Consider the following Flux
Flux.range(1, 5)
    .parallel(10)
    .runOn(Schedulers.parallel())
    .map(i -> "https://www.google.com")
    .flatMap(uri -> Mono.fromCallable(new HttpGetTask(httpClient, uri)))
HttpGetTask is a Callable whose actual implementation is irrelevant in this case; it makes an HTTP GET call to the given URI and returns the content if successful.
Now, I'd like to slow down the emission by introducing an artificial delay, such that up to 10 threads are started simultaneously, but each one doesn't complete as soon as HttpGetTask is done. For example, say no thread must finish before 3 seconds. How do I achieve that?
If the requirement is really "not less than 3s", you could add a delay of 3 seconds to the Mono inside the flatMap by using Mono.fromCallable(...).delayElement(Duration.ofSeconds(3)).
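Applied to the snippet from the question, that would look roughly like this (a sketch; note that delayElement delays the result after the call completes, so each element takes at least 3 seconds overall):

Flux.range(1, 5)
    .parallel(10)
    .runOn(Schedulers.parallel())
    .map(i -> "https://www.google.com")
    .flatMap(uri -> Mono.fromCallable(new HttpGetTask(httpClient, uri))
                        .delayElement(Duration.ofSeconds(3))); // emit no earlier than 3s after the GET returns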
What are all the similarities and differences between them? It looks like Java parallel Stream has some of the elements available in RxJava; is that right?
Rx is an API for creating and processing observable sequences. The Streams API is for processing iterable sequences. Rx sequences are push-based; you are notified when an element is available. A Stream is pull-based; it "asks" for items to process. They may appear similar because they both support similar operators/transforms, but the mechanics are essentially opposites of each other.
Stream is pull based. Personally I feel it is Oracle's answer to C# IEnumerable<>, LINQ and their related extension methods.
RxJava is push based; I am not sure whether .NET's Reactive Extensions were released first or the Rx project went live first.
Conceptually they are totally different and their applications are also different.
If you are implementing a text-searching program on a text file that's so large that you can't load everything into memory, you would probably want to use Stream, since you can easily determine whether more lines are available by keeping track of your iterator, and scan line by line.
Another application of Stream would be parallel calculations on a collection of data. Nowadays every machine has multiple cores, but you won't easily know exactly how many cores are available on your client machine. It would be hard to pre-configure the number of threads to use. So we use a parallel stream and let the JVM determine that for us (supposed to be more optimal).
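A minimal sketch of that idea (the data and the computation are just placeholders):

import java.util.List;

List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
long sumOfSquares = numbers.parallelStream()       // the JVM's common pool picks the parallelism
                           .mapToLong(n -> (long) n * n)
                           .sum();
System.out.println(sumOfSquares); // 385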
On the other hand, if you are implementing a program that takes a user input string and searches for available videos on the web, you would use Rx, since you won't even know when the program will start getting any results (or whether it will receive an error or a network timeout). To make your program responsive you have to let the program "subscribe" to network updates and complete signals.
Another common application of Rx is in GUIs, to "detect that the user finished input" without requiring the user to click a button to confirm. For example, you want a text field where, whenever the user stops typing, you start searching without waiting for a "Search" button click. In this case you use Rx to create an observable on "KeyEvent" and "throttle" (e.g. at 500 ms), so that whenever the user has stopped typing for 500 ms you receive an onNext() to "start searching".
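In RxJava the operator for this is debounce (also available as throttleWithTimeout). A rough sketch, assuming RxJava 3 and a hypothetical Observable<String> of text-field contents called textChanges():

import java.util.concurrent.TimeUnit;
import io.reactivex.rxjava3.core.Observable;

Observable<String> keyEvents = textChanges();      // hypothetical source: emits the field text on every key stroke
keyEvents
    .debounce(500, TimeUnit.MILLISECONDS)          // only emit after 500 ms of silence
    .distinctUntilChanged()                        // skip if the text didn't actually change
    .subscribe(query -> startSearching(query));    // hypothetical search trigger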
There is also a difference in threading.
Stream#parallel splits the sequence into parts, and each part is processed in a separate thread.
Observable#subscribeOn and Observable#observeOn both 'move' execution to another thread, but don't split the sequence.
In other words, for any particular processing stage:
parallel Stream may process different elements on different threads
Observable will use one thread for the stage
E.g. we have an Observable/Stream of many elements and two processing stages:
Observable.create(...)
    .observeOn(Schedulers.io())
    .map(x -> stage1(x))
    .observeOn(Schedulers.io())
    .map(y -> stage2(y))
    .forEach(...);
Stream.generate(...)
    .parallel()
    .map(x -> stage1(x))
    .map(y -> stage2(y))
    .forEach(...);
Observable will use no more than 2 additional threads (one per stage), so no two x's or y's are accessed by different threads. Stream, on the contrary, may span each stage across several threads.