Java 8 lambda api - java

I'm working to migrate from RxJava to Java 8 lambdas. One example I can't find is a way to buffer requests. For example, in RxJava, I can say the following.
Observable.create(getIterator()).buffer(20, 1000, TimeUnit.MILLISECONDS).doOnNext(list -> doWrite(list));
This buffers 20 elements into a list, or times out after 1000 milliseconds, whichever happens first.
Observables in Rx are "push" style, whereas Streams are pull-based. Would this be possible by implementing my own map operation in streams, or does the inability to emit cause problems, since doOnNext would have to poll the previous element?

One way to do it would be to use a BlockingQueue and Guava. Using Queues.drain, you can fill a Collection that you could then call stream() on and do your transformations (see the Guava documentation for Queues.drain).
And here's a quick example:
public void transform(BlockingQueue<Something> input)
{
List<Something> buffer = new ArrayList<>(20);
Queues.drain(input, buffer, 20, 1000, TimeUnit.MILLISECONDS);
doWrite(buffer);
}
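If pulling in Guava is not an option, the same "full batch or timeout, whichever comes first" behaviour can be approximated with the JDK alone. The following is a minimal sketch, not Guava's implementation; the class name BufferSketch and the method drain are made up for illustration, and it assumes the producer puts elements onto a BlockingQueue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BufferSketch {

    /**
     * Collects up to maxSize elements from the queue, waiting at most
     * `timeout` in total. Returns whatever arrived before the deadline.
     */
    static <T> List<T> drain(BlockingQueue<T> queue, int maxSize, long timeout, TimeUnit unit) {
        List<T> batch = new ArrayList<>(maxSize);
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        try {
            while (batch.size() < maxSize) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    break; // timed out: return a partial batch
                }
                T next = queue.poll(remaining, TimeUnit.NANOSECONDS);
                if (next == null) {
                    break; // nothing arrived before the deadline
                }
                batch.add(next);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag, return what we have
        }
        return batch;
    }

    public static void main(String[] args) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 25; i++) {
            queue.add(i);
        }
        // First call fills a complete batch; second call times out with the remainder.
        System.out.println(drain(queue, 20, 1000, TimeUnit.MILLISECONDS).size());
        System.out.println(drain(queue, 20, 100, TimeUnit.MILLISECONDS).size());
    }
}
```

Calling drain in a loop and passing each batch to doWrite would give roughly the effect of the buffer(20, 1000, TimeUnit.MILLISECONDS) call in the question.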

simple-react has similar operators, but not this exact one. It's pretty extensible though, so it should be possible to write your own. With the caveat that I haven't written this in an IDE or tested it, roughly a buffer by size with timeout operator for simple-react would look something like this
import com.aol.simple.react.async.Queue;
import com.aol.simple.react.stream.traits.LazyFutureStream;
import com.aol.simple.react.async.Queue.ClosedQueueException;
import com.aol.simple.react.util.SimpleTimer;
import java.util.concurrent.TimeUnit;
static LazyFutureStream batchBySizeAndTime(LazyFutureStream stream,int size,long time, TimeUnit unit) {
Queue queue = stream.toQueue();
Function<Supplier<U>, Supplier<Collection<U>>> fn = s -> {
return () -> {
SimpleTimer timer = new SimpleTimer();
List<U> list = new ArrayList<>();
try {
do {
if(list.size()==size)
return list;
list.add(s.get());
} while (timer.getElapsedNanoseconds()<unit.toNanos(time));
} catch (ClosedQueueException e) {
throw new ClosedQueueException(list);
}
return list;
};
};
return stream.fromStream(queue.streamBatch(stream.getSubscription(), fn));
}

Related

Processing changing source data in Java Akka streams

2 threads are started. dataListUpdateThread adds the number 2 to a List. processFlowThread sums the values in the same List and prints the summed list to the console. Here is the code:
import akka.NotUsed;
import akka.actor.ActorSystem;
import akka.stream.javadsl.Sink;
import akka.stream.javadsl.Source;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutionException;
import static java.lang.Thread.sleep;
public class SourceExample {
private final static ActorSystem system = ActorSystem.create("SourceExample");
private static void delayOneSecond() {
try {
sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
private static void printValue(CompletableFuture<Integer> integerCompletableFuture) {
try {
System.out.println("Sum is " + integerCompletableFuture.get().intValue());
} catch (ExecutionException | InterruptedException e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
final List<Integer> dataList = new ArrayList<>();
final Thread dataListUpdateThread = new Thread(() -> {
while (true) {
dataList.add(2);
System.out.println(dataList);
delayOneSecond();
}
});
dataListUpdateThread.start();
final Thread processFlowThread = new Thread(() -> {
while (true) {
final Source<Integer, NotUsed> source = Source.from(dataList);
final Sink<Integer, CompletionStage<Integer>> sink =
Sink.fold(0, (agg, next) -> agg + next);
final CompletionStage<Integer> sum = source.runWith(sink, system);
printValue(sum.toCompletableFuture());
delayOneSecond();
}
});
processFlowThread.start();
}
}
I've tried to create the simplest example to frame the question. dataListUpdateThread could be populating the List from a REST service or Kafka topic instead of just adding the value 2 to the List. Instead of using Java threads how should this scenario be implemented? In other words, how to share dataList to the Akka Stream for processing?
Mutating the collection passed to Source.from is only ever going to accomplish this by coincidence: if the collection is ever exhausted, Source.from will complete the stream. This is because it's intended for finite, strictly evaluated data (the use cases are basically: a) simple examples for the docs and b) situations where you want to bound resource consumption when performing an operation for a collection in the background (think a list of URLs that you want to send HTTP requests to)).
NB: I haven't written Java to any great extent since the Java 7 days, so I'm not providing Java code, just an outline of approaches.
As mentioned in a prior answer, Source.queue is probably the best option (besides using something like Akka HTTP or an Alpakka connector). In a case such as this, the stream's materialized value is a future that won't be completed until the stream completes, and Source.queue will never complete the stream on its own (because there's no way for it to know that your reference is the only reference). Introducing a KillSwitch and propagating it through viaMat and toMat would let you decide from outside the stream when to complete it.
An alternative to Source.queue is Source.actorRef, which lets you complete the stream by sending a distinguished message (akka.Done.done() in the Java API is pretty common for this). That source materializes as an ActorRef to which you can tell messages, and those messages (at least those which match the type of the stream) will be available for the stream to consume.
With both Source.queue and Source.actorRef, it's often useful to prematerialize them. The alternative, in a situation like your example where you also want the materialized value of the sink, is to make heavy use of the Mat operators to customize materialized values. In Scala, tuples at least simplify combining multiple materialized values, but in Java, once you get beyond a pair (as you would with queue), I'm pretty sure you'd have to define a class just to hold the three materialized values (queue, kill switch, and future for the completed value).
It's also worth noting that, since Akka Streams run on actors in the background (and thus get scheduled as needed onto the ActorSystem's threads), there's almost never a reason to create a thread on which to run a stream.

Why is `parallelStream` faster than the `CompletableFuture` implementation?

I wanted to increase the performance of my backend REST API on a certain operation that polled multiple different external APIs sequentially and collected their responses and flattened them all into a single list of responses.
Having just recently learned about CompletableFutures, I decided to give it a go, and compare that solution with the one that involved simply changing my stream for a parallelStream.
Here is the code used for the benchmark-test:
package com.foo;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
public class ConcurrentTest {
static final List<String> REST_APIS =
Arrays.asList("api1", "api2", "api3", "api4", "api5", "api6", "api7", "api8");
MyTestUtil myTest = new MyTestUtil();
long millisBefore; // used to benchmark
@BeforeEach
void setUp() {
millisBefore = System.currentTimeMillis();
}
@AfterEach
void tearDown() {
System.out.printf("time taken : %.4fs\n",
(System.currentTimeMillis() - millisBefore) / 1000d);
}
@Test
void parallelSolution() { // 4s
var parallel = REST_APIS.parallelStream()
.map(api -> myTest.collectOneRestCall())
.flatMap(List::stream)
.collect(Collectors.toList());
System.out.println("List of responses: " + parallel.toString());
}
@Test
void futureSolution() throws Exception { // 8s
var futures = myTest.collectAllResponsesAsync(REST_APIS);
System.out.println("List of responses: " + futures.get()); // only blocks here
}
@Test
void originalProblem() { // 32s
var sequential = REST_APIS.stream()
.map(api -> myTest.collectOneRestCall())
.flatMap(List::stream)
.collect(Collectors.toList());
System.out.println("List of responses: " + sequential.toString());
}
}
class MyTestUtil {
public static final List<String> RESULTS = Arrays.asList("1", "2", "3", "4");
List<String> collectOneRestCall() {
try {
TimeUnit.SECONDS.sleep(4); // simulating the await of the response
} catch (Exception io) {
throw new RuntimeException(io);
} finally {
return MyTestUtil.RESULTS; // always return something, for this demonstration
}
}
CompletableFuture<List<String>> collectAllResponsesAsync(List<String> restApiUrlList) {
/* Collecting the list of all the async requests that build a List<String>. */
List<CompletableFuture<List<String>>> completableFutures = restApiUrlList.stream()
.map(api -> nonBlockingRestCall())
.collect(Collectors.toList());
/* Creating a single Future that contains all the Futures we just created ("flatmap"). */
CompletableFuture<Void> allFutures = CompletableFuture.allOf(completableFutures
.toArray(new CompletableFuture[restApiUrlList.size()]));
/* When all the Futures have completed, we join them to create merged List<String>. */
CompletableFuture<List<String>> allCompletableFutures = allFutures
.thenApply(future -> completableFutures.stream()
.filter(Objects::nonNull) // we filter out the failed calls
.map(CompletableFuture::join)
.flatMap(List::stream) // creating a List<String> from List<List<String>>
.collect(Collectors.toList())
);
return allCompletableFutures;
}
private CompletableFuture<List<String>> nonBlockingRestCall() {
/* Manage the Exceptions here to ensure the wrapping Future returns the other calls. */
return CompletableFuture.supplyAsync(() -> collectOneRestCall())
.exceptionally(ex -> {
return null; // gets managed in the wrapping Future
});
}
}
There is a list of 8 (fake) APIs. Each response takes 4 seconds to execute and returns a list of 4 entities (Strings, in our case, for the sake of simplicity).
The results:
stream : 32 seconds
parallelStream : 4 seconds
CompletableFuture : 8 seconds
I'm quite surprised and expected the last two to be almost identical. What exactly is causing that difference? As far as I know, they are both using the ForkJoinPool.commonPool().
My naive interpretation would be that parallelStream, since it is a blocking operation, uses the actual MainThread for its workload and thus has an extra active thread to work with, compared to the CompletableFuture which is asynchronous and thus cannot use that MainThread.
CompletableFuture.supplyAsync() will end up using a ForkJoinPool initialized with a parallelism of Runtime.getRuntime().availableProcessors() - 1 (JDK 11 source).
So it looks like you have an 8-processor machine, which means there are 7 threads in the pool.
There are 8 API calls, so only 7 can run at a time on the common pool. And for the completable futures test, there will be 8 tasks running with your main thread blocking until they all complete. 7 will be able to execute at once meaning one has to wait for 4 seconds.
parallelStream() also uses this same thread pool, however the difference is that the first task will be executed on main thread that is executing the stream's terminal operation, leaving 7 to be distributed to the common pool. So there are just enough threads to run everything in parallel in this scenario. Try increasing the number of tasks to 9 and you will get the 8 second run-time for your test.
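If the goal is for all eight blocking calls to overlap regardless of the common pool's size, one option is to pass a dedicated executor to supplyAsync. The sketch below is only an illustration under that assumption: the class DedicatedPoolSketch and the method allRunConcurrently are hypothetical names, and a latch stands in for the simulated REST call so that overlap can be checked directly instead of being inferred from timings.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class DedicatedPoolSketch {

    /** Starts `tasks` blocking tasks on a pool with one thread per task; true if all of them overlapped. */
    static boolean allRunConcurrently(int tasks) {
        ExecutorService executor = Executors.newFixedThreadPool(tasks);
        CountDownLatch allStarted = new CountDownLatch(tasks);
        try {
            List<CompletableFuture<Boolean>> futures = IntStream.range(0, tasks)
                .mapToObj(i -> CompletableFuture.supplyAsync(() -> {
                    allStarted.countDown();
                    try {
                        // Returns true only if every other task has started as well.
                        return allStarted.await(5, TimeUnit.SECONDS);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return false;
                    }
                }, executor)) // second argument: run on our pool, not the common pool
                .collect(Collectors.toList());
            return futures.stream().allMatch(CompletableFuture::join);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("common pool parallelism: " + ForkJoinPool.commonPool().getParallelism());
        System.out.println("all 8 ran concurrently: " + allRunConcurrently(8));
    }
}
```

For I/O-bound work like the REST calls in the question, sizing a separate pool to the number of outstanding calls is usually a more predictable choice than relying on the CPU-sized common pool.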

Blocking in java Streams

Is there a way to make a Stream block and wait for data to become available in the data source, and then close it when I know there is nothing else to wait for? I already tried making the data source a blocking one, such as a BlockingQueue, but that obviously didn't work, because I am looping over the stream with forEach rather than calling the take or poll methods that block.
Streams are designed around Spliterator as the ultimate source of their elements. You could implement the tryAdvance() method to test whether another element exists, blocking until the result is known.
You mentioned a BlockingQueue, which is useful in concurrent processing. If you are "producing" elements in some threads, and trying to "consume" them in others, you might find that a CompletionService fits your application better than a custom Stream.
Spliterator is a fairly simple interface in terms of its operations, but implementing it correctly requires a good understanding of spliterator "characteristics". I would consider it an advanced topic, and while there are cases where a custom implementation is useful, it might also be a warning sign that you are looking at the wrong approach—you don't have to use Stream for everything.
(Create a Stream from a Spliterator with StreamSupport.stream().)
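A blocking tryAdvance() along those lines could look roughly like the sketch below. It is an assumption-laden illustration rather than a vetted implementation: the class BlockingStreamSketch, the method streamUntil, and the idea of ending the stream with a dedicated marker object (compared by identity) are all choices made for this example.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class BlockingStreamSketch {

    /** Streams elements from the queue, blocking in tryAdvance until one arrives or `end` is seen. */
    static <T> Stream<T> streamUntil(BlockingQueue<T> queue, T end) {
        Spliterator<T> spliterator = new Spliterators.AbstractSpliterator<T>(
                Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
            private boolean done;

            @Override
            public boolean tryAdvance(Consumer<? super T> action) {
                if (done) {
                    return false;
                }
                try {
                    T next = queue.take(); // blocks until the producer supplies an element
                    if (next == end) {     // identity check against the end marker
                        done = true;
                        return false;
                    }
                    action.accept(next);
                    return true;
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    done = true;
                    return false; // treat interruption as end of stream
                }
            }
        };
        return StreamSupport.stream(spliterator, false); // sequential stream
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        String end = new String("END"); // distinct instance used only as the end marker
        new Thread(() -> {
            queue.add("a");
            queue.add("b");
            queue.add(end);
        }).start();
        List<String> seen = streamUntil(queue, end).collect(Collectors.toList());
        System.out.println(seen);
    }
}
```

The size estimate of Long.MAX_VALUE signals an unknown size, and the spliterator deliberately omits the SIZED characteristic because the number of elements is not known in advance.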
I guess this should work:
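BlockingQueue<T> queue = ...;
T sentinel = ...; // BlockingQueue rejects null elements, so signal the end with a dedicated marker instance.
Iterable<T> collection = () -> new Iterator<T>() {
    private T current; // element fetched by hasNext(), or null if nothing is buffered
    public boolean hasNext() {
        if (current == null) {
            try {
                current = queue.take(); // blocks until an element is available
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                current = sentinel; // treat interruption as end of stream
            }
        }
        return current != sentinel;
    }
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = current;
        current = null;
        return result;
    }
};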
StreamSupport.stream(collection.spliterator(), false)...
There may well be better ways.

Generate infinite sequence of Natural numbers using RxJava

I am trying to write a simple program using RxJava to generate an infinite sequence of natural numbers. So far I have found two ways to generate a sequence of numbers: Observable.timer() and Observable.interval(). I am not sure if these functions are the right way to approach this problem. I was expecting a simple function like the one we have in Java 8 to generate infinite natural numbers.
IntStream.iterate(1, value -> value +1).forEach(System.out::println);
I tried using IntStream with Observable, but that does not work correctly. It sends an infinite stream of numbers only to the first subscriber. How can I correctly generate an infinite sequence of natural numbers?
import rx.Observable;
import rx.functions.Action1;
import java.util.stream.IntStream;
public class NaturalNumbers {
public static void main(String[] args) {
Observable<Integer> naturalNumbers = Observable.<Integer>create(subscriber -> {
IntStream stream = IntStream.iterate(1, val -> val + 1);
stream.forEach(naturalNumber -> subscriber.onNext(naturalNumber));
});
Action1<Integer> first = naturalNumber -> System.out.println("First got " + naturalNumber);
Action1<Integer> second = naturalNumber -> System.out.println("Second got " + naturalNumber);
Action1<Integer> third = naturalNumber -> System.out.println("Third got " + naturalNumber);
naturalNumbers.subscribe(first);
naturalNumbers.subscribe(second);
naturalNumbers.subscribe(third);
}
}
The problem is that on naturalNumbers.subscribe(first);, the OnSubscribe you implemented is called and performs a forEach over an infinite stream on the calling thread. That call never returns, so the second and third subscriptions are never reached and your program never terminates.
One way you could deal with it is to asynchronously subscribe them on a different thread. To easily see the results I had to introduce a sleep into the Stream processing:
Observable<Integer> naturalNumbers = Observable.<Integer>create(subscriber -> {
IntStream stream = IntStream.iterate(1, i -> i + 1);
stream.peek(i -> {
try {
// Added to visibly see printing
Thread.sleep(50);
} catch (InterruptedException e) {
}
}).forEach(subscriber::onNext);
});
final Subscription subscribe1 = naturalNumbers
.subscribeOn(Schedulers.newThread())
.subscribe(first);
final Subscription subscribe2 = naturalNumbers
.subscribeOn(Schedulers.newThread())
.subscribe(second);
final Subscription subscribe3 = naturalNumbers
.subscribeOn(Schedulers.newThread())
.subscribe(third);
Thread.sleep(1000);
System.out.println("Unsubscribing");
subscribe1.unsubscribe();
subscribe2.unsubscribe();
subscribe3.unsubscribe();
Thread.sleep(1000);
System.out.println("Stopping");
Observable.Generate is exactly the operator to solve this class of problem reactively. I also assume this is a pedagogical example, since using an iterable for this is probably better anyway.
Your code produces the whole stream on the subscriber's thread. Since it is an infinite stream the subscribe call will never complete. Aside from that obvious problem, unsubscribing is also going to be problematic since you aren't checking for it in your loop.
You want to use a scheduler to solve this problem - certainly do not use subscribeOn since that would burden all observers. Schedule the delivery of each number to onNext - and as a last step in each scheduled action, schedule the next one.
Essentially this is what Observable.generate gives you - each iteration is scheduled on the provided scheduler (which defaults to one that introduces concurrency if you don't specify it). Scheduler operations can be cancelled and avoid thread starvation.
Rx.NET solves it like this (actually there is an async/await model that's better, but not available in Java afaik):
static IObservable<int> Range(int start, int count, IScheduler scheduler)
{
return Observable.Create<int>(observer =>
{
return scheduler.Schedule(0, (i, self) =>
{
if (i < count)
{
Console.WriteLine("Iteration {0}", i);
observer.OnNext(start + i);
self(i + 1);
}
else
{
observer.OnCompleted();
}
});
});
}
Two things to note here:
The call to Schedule returns a subscription handle that is passed back to the observer
The Schedule is recursive - the self parameter is a reference to the scheduler used to call the next iteration. This allows for unsubscription to cancel the operation.
Not sure how this looks in RxJava, but the idea should be the same. Again, Observable.generate will probably be simpler for you as it was designed to take care of this scenario.
When creating infinite sequences, care should be taken to:
subscribe and observe on different threads; otherwise you will only serve single subscriber
stop generating values as soon as subscription terminates; otherwise runaway loops will eat your CPU
The first issue is solved by using subscribeOn(), observeOn() and various schedulers.
The second issue is best solved by using library provided methods Observable.generate() or Observable.fromIterable(). They do proper checking.
Check this:
Observable<Integer> naturalNumbers =
Observable.<Integer, Integer>generate(() -> 1, (s, g) -> {
logger.info("generating {}", s);
g.onNext(s);
return s + 1;
}).subscribeOn(Schedulers.newThread());
Disposable sub1 = naturalNumbers
.subscribe(v -> logger.info("1 got {}", v));
Disposable sub2 = naturalNumbers
.subscribe(v -> logger.info("2 got {}", v));
Disposable sub3 = naturalNumbers
.subscribe(v -> logger.info("3 got {}", v));
Thread.sleep(100);
logger.info("unsubscribing...");
sub1.dispose();
sub2.dispose();
sub3.dispose();
Thread.sleep(1000);
logger.info("done");

What is the best / most elegant way to limit the number of concurrent evaluation (like with a fixedThreadPool) in parallel streams

Assume a lambda expression consumes a certain amount of a limited resource (like memory), so the number of concurrent executions has to be limited (example: if the lambda temporarily consumes 100 MB of local memory and we want to stay within 1 GB, we must not allow more than 10 concurrent evaluations).
What is the best way to limit the number of concurrent execution, say for example in
IntStream.range(0, numberOfJobs).parallel().forEach( i -> { /*...*/ });
?
Note: An obvious option is to perform a nesting like
double jobsPerThread = (double)numberOfJobs / numberOfThreads;
IntStream.range(0, numberOfThreads).parallel().forEach( threadIndex ->
IntStream.range((int)(threadIndex * jobsPerThread), (int)((threadIndex+1) * jobsPerThread)).sequential().forEach( i -> { /*...*/ }));
Is this the only way? It is not that elegant. Actually, I would like to have something like
IntStream.range(0, numberOfJobs).parallel(numberOfThreads).forEach( i -> { /*...*/ });
The Streams use a ForkJoinPool for parallel operations. By default they are using the ForkJoinPool.commonPool() which does not allow changing the concurrency afterwards. However, you can use your own ForkJoinPool instance. When you execute the stream code within the context of your own ForkJoinPool this context pool will be used for the stream operations. The following example illustrates this by executing the same operation once using default behavior and once using a custom pool with a fixed concurrency of 2:
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;
public class InterfaceStaticMethod {
public static void main(String[] arg) throws Exception {
Runnable parallelCode=() -> {
// Thread-safe set: several worker threads add to it concurrently.
Set<String> allThreads=ConcurrentHashMap.newKeySet();
IntStream.range(0, 1_000_000).parallel().filter(i->{
allThreads.add(Thread.currentThread().getName()); return false;}
).min();
System.out.println("executed by "+allThreads);
};
System.out.println("default behavior: ");
parallelCode.run();
System.out.println("specialized pool:");
ForkJoinPool pool=new ForkJoinPool(2);
pool.submit(parallelCode).get();
}
}
Depending on your use case, using the CompletableFuture utility methods may be easier:
import static java.util.concurrent.CompletableFuture.runAsync;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
ExecutorService executor = Executors.newFixedThreadPool(10); // max 10 threads
for (int i = 0; i < numberOfJobs; i++) {
    final int job = i; // a lambda can only capture effectively final variables
    runAsync(() -> { /* do something with job */ }, executor);
}
// or with a stream:
IntStream.range(0, numberOfJobs)
    .forEach(i -> runAsync(() -> { /* do something with i */ }, executor));
The main difference with your code is that the parallel forEach will only return after the last job is over, whereas runAsync will return as soon as all the jobs have been submitted. There are various ways to change that behaviour if required.
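One of those ways is to keep the futures returned by runAsync and wait on CompletableFuture.allOf. The sketch below assumes that approach; the class BoundedJobsSketch, the method runAndWait, and the sleep standing in for real work are hypothetical, and the peak counter is there only to show that the fixed pool caps concurrency.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class BoundedJobsSketch {

    /** Runs numberOfJobs jobs on at most maxThreads threads, waits for all of them, returns the peak concurrency seen. */
    static int runAndWait(int numberOfJobs, int maxThreads) {
        ExecutorService executor = Executors.newFixedThreadPool(maxThreads);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            CompletableFuture<?>[] jobs = IntStream.range(0, numberOfJobs)
                .mapToObj(i -> CompletableFuture.runAsync(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(10); // stand-in for the real, resource-hungry work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                }, executor))
                .toArray(CompletableFuture[]::new);
            CompletableFuture.allOf(jobs).join(); // block until every job has finished
            return peak.get();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        int peak = runAndWait(50, 10);
        System.out.println("peak concurrency was at most 10: " + (peak <= 10));
    }
}
```

With the join() in place, the call behaves like the parallel forEach in that it returns only after the last job is over, while the pool size bounds how many jobs hold the limited resource at once.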
