Mix explicit and implicit parallelism with Java 8 streams

In the past I have written some Java programs using two threads.
The first thread (producer) read data from an API (a C library), created a Java object, and sent the object to the other thread.
The C API delivers an (infinite) event stream.
The threads used a LinkedBlockingQueue as a pipeline to exchange the objects (put, poll).
The second thread (consumer) dealt with the objects.
(I also found the code more readable with two threads: the first thread deals with the C API stuff and produces
proper Java objects, while the second thread is free from C API handling and deals only with the data.)
Now I'm interested in how I can realize the scenario above with the new stream API coming in Java 8,
assuming I want to keep the two threads (producer/consumer):
the first thread writes into the stream, and the second thread reads from the stream.
I also hope that with this technique I keep a better explicit parallelism (producer/consumer),
while within the stream I can use some implicit parallelism (e.g. stream.parallel()).
I don't have much experience with the new stream API,
so I experimented with the code below to try out the idea above.
I use generate to access the C API and feed the values into the Java stream.
In the consumer thread I used .parallel() to test and handle implicit parallelism. It looks fine, but see below.
Questions:
1. Is generate the best way to implement the producer in this scenario?
2. I have trouble understanding how to terminate/close the stream in the producer
when the API reports an error AND I want to shut down the whole pipeline.
Do I use stream.close() or throw an exception?
2.1 I tried stream.close(), but generate keeps running after the close.
The only way I found to terminate the generate part is to throw an exception.
The exception travels through the stream and the consumer receives it
(which is fine for me: the consumer can recognize it and terminate).
But by the time the exception arrives, the producer has produced more items than the consumer has processed.
2.2 If the consumer uses implicit parallelism via stream.parallel(), the producer runs even further ahead.
So I don't see any solution for this problem (access the C API, check the error code, make a decision).
2.3 The exception thrown in the producer arrives at the consumer's stream, but not all of the objects that were already inserted get processed.
Once more: the idea is to have explicit parallelism with the two threads,
but internally use the new features and apply parallel processing where possible.
Thanks for thinking about this problem, too.
package sandbox.test;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.LongStream;
public class MyStream {
private volatile LongStream stream = null;
private AtomicInteger producerCount = new AtomicInteger(0);
private AtomicInteger consumerCount = new AtomicInteger(0);
private AtomicInteger apiError = new AtomicInteger(0);
public static void main(String[] args) throws InterruptedException {
MyStream appl = new MyStream();
appl.create();
}
private static void sleep(long sleep) {
try {
Thread.sleep(sleep);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
private static void apiError(final String pos, final int iteration) {
RuntimeException apiException = new RuntimeException("API error pos=" + pos + " iteration=" + iteration);
System.out.println(apiException.getMessage());
throw apiException;
}
final private int simulateErrorAfter = 10;
private Thread produce() {
Thread thread = new Thread(new Runnable() {
@Override
public void run() {
System.out.println("Producer started");
stream = LongStream.generate(() -> {
int localCount;
// Detect error, while using stream.parallel() processing
int error = apiError.get();
if ( error > 0 )
apiError("1", error);
// ----- Accessing the C API here -----
localCount = producerCount.incrementAndGet(); // C API access; delegate for accessing the C API
// ----- Accessing the C API here -----
// Checking error code from C API
if ( localCount > simulateErrorAfter ) { // Simulate an API error
producerCount.decrementAndGet();
stream.close();
apiError("2", apiError.incrementAndGet());
}
System.out.println("P: " + localCount);
sleep(200L);
return localCount;
});
System.out.println("Producer terminated");
}
});
thread.start();
return thread;
}
private Thread consume() {
Thread thread = new Thread(new Runnable() {
@Override
public void run() {
try {
stream.onClose(new Runnable() {
@Override
public void run() {
System.out.println("Close detected");
}
}).parallel().forEach(l -> {
sleep(1000);
System.out.println("C: " + l);
consumerCount.incrementAndGet();
});
} catch (Exception e) {
// Capturing the stream end
System.out.println(e);
}
System.out.println("Consumer terminated");
}
});
thread.start();
return thread;
}
private void create() throws InterruptedException {
Thread producer = produce();
while ( stream == null )
sleep(10);
Thread consumer = consume();
producer.join();
consumer.join();
System.out.println("Produced: " + producerCount);
System.out.println("Consumed: " + consumerCount);
}
}

You need to understand some fundamental points about the Stream API:
All operations applied to a stream are lazy and won't do anything before the terminal operation is applied. There is no point in creating the stream on a "producer" thread, as that thread won't do anything. All actions are performed within your "consumer" thread and the background threads started by the Stream implementation itself. The thread that created the Stream instance is completely irrelevant.
Closing a stream has no relevance for the stream operation itself, i.e. it does not shut down threads. It is meant to release additional resources, e.g. closing the file associated with the stream returned by Files.lines(…). You can schedule such cleanup actions using onClose, and the Stream will invoke them when you call close, but that's it. For the Stream class itself, close has no further meaning.
Streams do not model a scenario like "one thread is writing and another one is reading". Their model is "one thread calls your Supplier and then your Consumer, another thread does the same, and so do x other threads…".
If you want to implement a producer/consumer scheme with distinct producer and consumer threads, you are better off using plain threads or an ExecutorService together with a thread-safe queue.
But you can still use Java 8 features. For example, there is no need to implement Runnables as inner classes; you can use lambda expressions for them.
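As a rough illustration of that advice, here is a minimal sketch (my own, not from the answer) that uses plain threads, a LinkedBlockingQueue and a sentinel value to signal end-of-stream, e.g. after an API error. The loop producing numbers merely stands in for the question's C API:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ProducerConsumerSketch {
    private static final long POISON = -1L; // sentinel: "no more data" (e.g. after an API error)

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Long> queue = new LinkedBlockingQueue<>(100);

        // Producer thread: the only place that talks to the (simulated) C API.
        Thread producer = new Thread(() -> {
            try {
                for (long i = 1; i <= 10; i++) {   // stands in for the infinite C event loop
                    queue.put(i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                try {
                    queue.put(POISON);             // always tell the consumer to stop, even after an error
                } catch (InterruptedException ignored) {
                }
            }
        }, "producer");

        // Consumer thread: free of any C API handling.
        Thread consumer = new Thread(() -> {
            try {
                for (Long item = queue.take(); item != POISON; item = queue.take()) {
                    System.out.println("C: " + item); // process the item
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "consumer");

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}

If single items are cheap but the per-item work is heavy, the consumer could still collect batches and process each batch with a parallel stream internally, which is roughly the "explicit outside, implicit inside" split the question is after.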

Related

Reading from blocking queue with multiple threads

I have a producer-consumer model using a blocking queue: 4 producer threads read files from a directory and put them into the blocking queue, and 4 consumer threads read from the blocking queue.
My problem is that every time only one consumer reads from the BlockingQueue; the other 3 consumer threads are not reading:
final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>(QUEUE_SIZE);
CompletableFuture<Void> completableFutureProducer = produceUrls(files, queue, checker);
// not providing code for produceUrls; it works fine, with all 4 threads writing to the blocking queue.
Here is the consumer code.
private CompletableFuture<Validator> consumeData(
final Response checker,
final CompletableFuture<Void> urls
) {
return CompletableFuture.supplyAsync(checker, 4)
.whenComplete((result, err) -> {
if (err != null) {
LOG.error("consuming url worker failed!", err);
urls.cancel(true);
}
});
}
completableFutureProducer.join();
completableFutureConsumer.join();
This is my code. Can someone tell me what I am doing wrong, or help me with the correct code?
Why is only one consumer reading from the blocking queue?
Adding the code for the Response class that reads from the BlockingQueue:
@Slf4j
public final class Response implements Supplier<Check> {
private final BlockingQueue<byte[]> data;
private final AtomicBoolean producersComplete;
private final Calendar calendar = Calendar.getInstance();
public ResponseCode(
final BlockingQueue<byte[]> data
) {
this.data = data;
producersDone = new AtomicBoolean();
}
public void notifyProducersDone() {
producersComplete.set(true);
}
@Override
public Check get() {
try {
Check check = null;
try {
while (!data.isEmpty() || !producersDone.get()) {
final byte[] item = data.poll(1, TimeUnit.SECONDS);
if (item != null) {
LOG.info("{}",new String(item));
// I see only one thread printing result here .
validator = validateData(item);
}
}
} catch (InterruptedException | IOException e) {
Thread.currentThread().interrupt();
throw new WriteException("Exception occurred while data validation", e);
}
return check;
} finally {
LOG.info("Done reading data from BlockingQueue");
}
}
}
It's hard to diagnose from this alone, but it's probably not correct to check for data.isEmpty() because the queue may happen to be temporarily empty (but later get items). So your threads might exit as soon as they encounter a temporarily empty queue.
Instead, you can exit only if the producers are done AND you got an empty result from the poll. That way the threads only exit when there are truly no more items to process (see the sketch below).
It's a bit odd though that you are returning the result of the last item (alone). Are you sure this is what you want?
EDIT: I've done something very similar recently. Here is a class that reads from a file, transforms the lines in a multi-threaded way, then writes to a different file (the order of the lines is preserved).
It also uses a BlockingQueue. It's very similar to your code, but it doesn't check queue.isEmpty(), for the aforementioned reason. It works fine for me.
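A minimal sketch of that exit condition (the field and method names are borrowed loosely from the question's code, and the class wrapper is only there so the snippet compiles on its own):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class QueueDrainer {
    // Consumers exit only when the producers are done AND a timed poll finds nothing more.
    static void drain(BlockingQueue<byte[]> data, AtomicBoolean producersDone) throws InterruptedException {
        while (true) {
            byte[] item = data.poll(1, TimeUnit.SECONDS);
            if (item != null) {
                process(item);                 // stands in for validateData(item)
            } else if (producersDone.get()) {
                break;                         // queue is drained and no producer will add more
            }
        }
    }

    private static void process(byte[] item) {
        System.out.println(new String(item));
    }
}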
4+4 threads is not that many, so you'd better not use asynchronous tools like CompletableFuture. A simple multithreaded program would be simpler and run faster.
Having
BlockingQueue<byte[]> data;
don't use data.poll();
use data.take();
When you have, let's say, 1 item in the queue and 4 consumers, one of them will poll the item, leaving the queue empty. The other 3 consumers then check queue.isEmpty(), and since it is empty, they quit the loop.
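A tiny sketch of the difference (my own example, not from the question): poll() returns null immediately when the queue happens to be empty, while take() blocks until a producer delivers something, so no isEmpty() check is needed.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class TakeVsPoll {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();

        System.out.println(queue.poll());     // null: the queue is empty at this moment

        new Thread(() -> {
            try {
                Thread.sleep(500);            // simulate a slow producer
                queue.put("late item".getBytes());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();

        byte[] item = queue.take();           // blocks until the producer delivers
        System.out.println(new String(item));
    }
}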

Exception propagation in Java parallel streams

The book Akka in Action says that
Exceptions are
almost impossible to share between threads out of the box, unless you are prepared
to build a lot of infrastructure to handle this.
However, as far as I understand, if an exception occurs on one of a parallel stream's worker threads, it is propagated to the caller. If this mechanism is possible, why isn't it implemented for regular threads? Am I missing something?
Edit:
I am talking about possibility of something like this:
public static void count() {
long count = 0;
try {
count = IntStream.range(1, 10)
.parallel()
.filter(number -> f(number)).count();
} catch(RuntimeException e) {
/* handle */
}
System.out.println("Count - " + count);
}
public static boolean f(final int number) {
if(Math.random() < 0.1) {
throw new RuntimeException();
}
return true;
}
parallel() spawns multiple threads, and when a RuntimeException is thrown on any of them, that exception is still caught on the main thread, which seems to contradict the book's point.
Edit 2:
The main difference is that while the individual Stream intermediates can run in parallel, they are only evaluated when the terminal operation is encountered; that makes it a virtual join point.
I.e., the same would be possible with something like
try {
Thread concurrent = new Thread(runnable);
concurrent.start();
concurrent.join();
} catch (ExceptionThrownInThread ex) {}
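A small, concrete version of that join-point idea (my own sketch, not from the original answer), using an ExecutorService: the exception thrown on the worker thread is rethrown to the caller, wrapped in an ExecutionException, when Future.get() is called.

import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class JoinPointDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<Integer> result = pool.submit(() -> {
            if (Math.random() < 0.5) {
                throw new IllegalStateException("failed inside worker thread");
            }
            return 42;
        });
        try {
            // get() is the join point: it rethrows the worker's exception here.
            System.out.println("Result: " + result.get());
        } catch (ExecutionException e) {
            System.out.println("Caught on caller thread: " + e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}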
However, in the general case - and that's pretty much Akka's programming model - you have
yourMessenger.registerCallbacks(callbacks);
new Thread(yourMessenger).start();
Now, the callbacks will eventually be called from within the thread you created, but there is no structure to wrap around its execution as a whole; so who would catch this exception?
I don't know Akka enough, but in projectreactor's Publishers, you can register an error handler, as in
Mono<Result> mono = somethread.createResult().onError(errorHandler);
But again, in the general case it's not trivial.
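For plain threads, the closest built-in hook is an uncaught exception handler; a small sketch (mine, not from the answer) shows that the handler does see the exception, but it runs on the dying worker thread, outside any try/catch of the creating thread.

public class UncaughtHandlerDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("boom inside worker");
        });

        // No try/catch in main can see this exception; the handler is the only hook.
        worker.setUncaughtExceptionHandler((thread, throwable) ->
                System.out.println("Handled on " + thread.getName() + ": " + throwable));

        worker.start();
        worker.join();
    }
}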

.notify() isn't notifying .wait() for a Thread

I've been having trouble trying to get a waiting thread to be notified.
Here is the code where notify() is called:
public static void main(String[] args)
{
int endUsers = 0;
Terminal terminal = new Terminal("Master");
ArrayList<Thread> threads = new ArrayList<Thread>();
threads.add(new Thread(
new EndUser("EndUser 1", DEFAULT_DST_NODE, 50000, 50001),
"EndUser 1"));
endUsers++;
threads.add(new Thread(
new EndUser("EndUser 2", DEFAULT_DST_NODE, 50001, 50000),
"EndUser 2"));
endUsers++;
for (Thread t : threads)
{
t.start();
}
while (true)
{
int user = terminal.readInt("Which user is sending data? ");
if (user <= endUsers && user > 0)
{
synchronized (threads.get(user - 1))
{
threads.get(user - 1).notify();
}
}
}
}
}
And here is the code where wait() is called:
public void run()
{
while (true)
{
try
{
synchronized (this)
{
this.wait();
}
this.send();
}
catch (Exception e)
{
}
}
}
I've tried everything I can think of but I have no idea why it isn't working.
threads.get(user - 1).notify();
is invoking notify on the Thread object, whereas
this.wait();
is waiting on your Runnable, i.e. the object in which the call is made.
Using
Thread.currentThread().wait();
should fix your issue.
In addition, I would like to mention that creating a dedicated Object reference, then waiting on and notifying that object, is a fully functional way to achieve what you want as well.
You would create the Object as an (optionally static) reference in your thread class:
public final (static) Object waitObject = new Object();
Edit: ^ Making this final prevents other (possibly malicious) code from reassigning the reference,
which could otherwise leave waitObject.notify() unable to ever wake the waiting thread.
Then use
waitObject.wait(); //or
waitObject.wait(time);
And
waitObject.notify(); //or
waitObject.notifyAll();
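Putting those pieces together, a minimal self-contained sketch of the dedicated lock-object pattern (the class name and the ready flag are mine; the flag is added so that a notify is not lost and spurious wakeups are handled):

public class WaitNotifyDemo {
    private static final Object waitObject = new Object(); // shared lock, as suggested above
    private static boolean ready = false;                  // condition guarded by waitObject

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            synchronized (waitObject) {
                while (!ready) {                            // re-check the condition after every wakeup
                    try {
                        waitObject.wait();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
            System.out.println("Worker released, sending data...");
        });
        worker.start();

        Thread.sleep(200);                                  // stands in for terminal.readInt(...)
        synchronized (waitObject) {
            ready = true;
            waitObject.notify();                            // wake one waiting worker
        }
        worker.join();
    }
}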
Edit:
As was pointed out by @shmosel, it is inherently unsafe to call wait or notify on Thread instances, as outlined in the Java documentation. Despite this, the functionality is still available, though its use in this way is discouraged.
For additional references, see the Java API overview, the Thread API documentation, and searches along the lines of "safe Java threading practices".

How to wait for all threads to complete

I created a workflow to wait for all the threads I created. It works in 99% of cases, but sometimes the waitForAllDone method finishes before all threads have completed. I know this because after waitForAllDone I close a stream that a created thread is still using, which then causes the exception
Caused by: java.io.IOException: Stream closed
My thread starts with:
#Override
public void run() {
try {
process();
} finally {
Factory.close(this);
}
}
closing:
protected static void close(final Client client) {
clientCount--;
}
When I create a thread I call this:
public RobWSClient getClient() {
clientCount++;
return new Client();
}
and the clientCount variable inside the factory:
private static volatile int clientCount = 0;
wait:
public void waitForAllDone() {
try {
while (clientCount > 0) {
Thread.sleep(10);
}
} catch (InterruptedException e) {
LOG.error("Error", e);
}
}
You need to protect the modification and reading of clientCount via synchronized. The main issue is that clientCount-- and clientCount++ are NOT atomic operations, so two threads could execute them concurrently and end up with the wrong result.
Simply using volatile as you do above would ONLY work if ALL operations on the field were atomic. Since they are not, you need some locking mechanism. As Anton states, AtomicInteger is an excellent choice here. Note that the reference should be either final or volatile so that all threads are guaranteed to see the same instance.
That being said, the general rule post Java 1.5 is to use an ExecutorService instead of raw Threads. Using it in conjunction with Guava's Futures class can make waiting for all tasks to complete as simple as:
Future<List<?>> future = Futures.successfulAsList(myFutureList);
future.get();
// all processes are complete
Futures.successfulAsList
I'm not sure that the rest of your code has no issues, but you can't increment a volatile variable like this: clientCount++. Use AtomicInteger instead, as in the sketch below.
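A minimal sketch of that change, keeping the question's names where possible (the Client placeholder class is only there so the example compiles):

import java.util.concurrent.atomic.AtomicInteger;

public class Factory {
    private static final AtomicInteger clientCount = new AtomicInteger(0);

    static class Client { }                     // placeholder for the question's client type

    public static Client getClient() {
        clientCount.incrementAndGet();          // atomic replacement for clientCount++
        return new Client();
    }

    protected static void close(final Client client) {
        clientCount.decrementAndGet();          // atomic replacement for clientCount--
    }

    public static void waitForAllDone() throws InterruptedException {
        while (clientCount.get() > 0) {
            Thread.sleep(10);                   // still polling; see the ExecutorService answer below
        }
    }
}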
The best way to wait for threads to terminate is to use one of the high-level concurrency facilities.
In this case, the easiest way would be to use an ExecutorService.
You would 'offer' a new task to the executor in this way:
...
ExecutorService executor = Executors.newFixedThreadPool(POOL_SIZE);
...
Client client = getClient(); // assuming Client implements Runnable
executor.submit(client);
...
public void waitForAllDone() throws InterruptedException {
executor.shutdown(); // stop accepting new tasks so awaitTermination can complete
executor.awaitTermination(30, TimeUnit.SECONDS); // wait up to 30 seconds for all submitted tasks to finish
...
}
In this way, you don't waste valuable CPU cycles in busy waits or sleep/awake cycles.
See ExecutorService docs for details.

Producer-consumer problem with a twist

The producer is finite, as should be the consumer.
The problem is when to stop, not how to run.
Communication can happen over any type of BlockingQueue.
Can't rely on poisoning the queue (PriorityBlockingQueue)
Can't rely on locking the queue (SynchronousQueue)
Can't rely on offer/poll exclusively (SynchronousQueue)
Probably even more exotic queues in existence.
Creates a queued seq on another (presumably lazy) seq s. The queued
seq will produce a concrete seq in the background, and can get up to
n items ahead of the consumer. n-or-q can be an integer n buffer
size, or an instance of java.util.concurrent BlockingQueue. Note
that reading from a seque can block if the reader gets ahead of the
producer.
http://clojure.github.com/clojure/clojure.core-api.html#clojure.core/seque
My attempts so far + some tests: https://gist.github.com/934781
Solutions in Java or Clojure appreciated.
class Reader {
private final ExecutorService ex = Executors.newSingleThreadExecutor();
private final List<Object> completed = new ArrayList<Object>();
private final BlockingQueue<Object> doneQueue = new LinkedBlockingQueue<Object>();
private int pending = 0;
public synchronized Object take() {
removeDone();
queue();
Object rVal;
if(completed.isEmpty()) {
try {
rVal = doneQueue.take();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
pending--;
} else {
rVal = completed.remove(0);
}
queue();
return rVal;
}
private void removeDone() {
Object current = doneQueue.poll();
while(current != null) {
completed.add(current);
pending--;
current = doneQueue.poll();
}
}
private void queue() {
while(pending < 10) {
pending++;
ex.submit(new Runnable() {
@Override
public void run() {
doneQueue.add(compute());
}
private Object compute() {
//do actual computation here
return new Object();
}
});
}
}
}
Not exactly an answer I'm afraid, but a few remarks and more questions. My first answer would be: use clojure.core/seque. The producer needs to communicate end-of-seq somehow for the consumer to know when to stop, and I assume the number of produced elements is not known in advance. Why can't you use an EOS marker (if that's what you mean by queue poisoning)?
If I understand your alternative seque implementation correctly, it will break when elements are taken off the queue outside your function, since channel and q will be out of step in that case: channel will hold more #(.take q) elements than there are elements in q, causing it to block. There might be ways to ensure channel and q are always in step, but that would probably require implementing your own Queue class, and it adds so much complexity that I doubt it's worth it.
Also, your implementation doesn't distinguish between normal EOS and abnormal queue termination due to thread interruption; depending on what you're using it for, you might want to know which is which. Personally, I don't like using exceptions in this way: use exceptions for exceptional situations, not for normal flow control.
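For reference, a minimal Java sketch of the EOS-marker idea (the sentinel and the names are illustrative, not taken from the question's code): the producer puts a dedicated marker object on the queue when it is finished, and the consumer stops as soon as it takes it.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class EosMarkerDemo {
    private static final Object EOS = new Object();    // end-of-seq marker; identity comparison is enough

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    queue.put("item " + i);             // finite production
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                try {
                    queue.put(EOS);                     // always signal the end, even on failure
                } catch (InterruptedException ignored) {
                }
            }
        });
        producer.start();

        Object item;
        while ((item = queue.take()) != EOS) {          // consumer stops at the marker
            System.out.println(item);
        }
        producer.join();
    }
}

As the question notes, this doesn't transfer directly to something like a PriorityBlockingQueue, where a marker would also have to sort after every real element.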
