Reading from blocking queue with multiple threads - java

I have a producer-consumer model using a blocking queue: 4 producer threads read files from a directory and put their contents on the queue, and 4 consumer threads read from it.
My problem is that only one consumer ever reads from the BlockingQueue; the other 3 consumer threads read nothing:
final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>(QUEUE_SIZE);
CompletableFuture<Void> completableFutureProducer = produceUrls(files, queue, checker);
// Not providing the code for produceData; it works fine, with all 4 threads writing to the blocking queue. Here is the consumer code.
private CompletableFuture<Validator> consumeData(
final Response checker,
final CompletableFuture<Void> urls
) {
return CompletableFuture.supplyAsync(checker, 4)
.whenComplete((result, err) -> {
if (err != null) {
LOG.error("consuming url worker failed!", err);
urls.cancel(true);
}
});
}
completableFutureProducer.join();
completableFutureConsumer.join();
This is my code. Can someone tell me what I am doing wrong, or help with the correct code?
Why is only one consumer reading from the blocking queue?
Adding code for Response class reading from Blocking queue :
@Slf4j
public final class Response implements Supplier<Check> {
private final BlockingQueue<byte[]> data;
private final AtomicBoolean producersComplete;
private final Calendar calendar = Calendar.getInstance();
public Response(
final BlockingQueue<byte[]> data
) {
this.data = data;
producersComplete = new AtomicBoolean();
}
public void notifyProducersDone() {
producersComplete.set(true);
}
@Override
public Check get() {
try {
Check check = null;
try {
while (!data.isEmpty() || !producersComplete.get()) {
final byte[] item = data.poll(1, TimeUnit.SECONDS);
if (item != null) {
LOG.info("{}",new String(item));
// I see only one thread printing result here .
check = validateData(item);
}
}
} catch (InterruptedException | IOException e) {
Thread.currentThread().interrupt();
throw new WriteException("Exception occurred while data validation", e);
}
return check;
} finally {
LOG.info("Done reading data from BlockingQueue");
}
}
}

It's hard to diagnose from this alone, but it's probably not correct to check for data.isEmpty() because the queue may happen to be temporarily empty (but later get items). So your threads might exit as soon as they encounter a temporarily empty queue.
Instead, you can exit if producers were done AND you got an empty result from the poll. That way the threads only exit when there are truly no more items to process.
It's a bit odd though that you are returning the result of the last item (alone). Are you sure this is what you want?
EDIT: I've done something very similar recently. Here is a class that reads from a file, transforms the lines in a multi-threaded way, then writes to a different file (the order of the lines is preserved).
It also uses a BlockingQueue. It's very similar to your code, but it doesn't check queue.isEmpty(), for the aforementioned reason. It works fine for me.
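The exit condition described above (producers done AND an empty poll) could look roughly like the sketch below. The class and method names are made up for illustration, and the validation work is replaced by a counter. The sketch also assumes that each consumer is submitted as its own task, since a single supplyAsync call runs its supplier only once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class DrainUntilDone {

    // One consumer: keeps polling until the producers are done AND a poll came back empty.
    static int consume(BlockingQueue<byte[]> queue, AtomicBoolean producersDone) {
        int handled = 0;
        try {
            while (true) {
                // Read the flag before polling: if it was already set and the
                // poll still returns null, nothing more can arrive.
                boolean done = producersDone.get();
                byte[] item = queue.poll(100, TimeUnit.MILLISECONDS);
                if (item != null) {
                    handled++; // stand-in for the real validation work
                } else if (done) {
                    break;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled;
    }

    // Submits one task per consumer; returns how many items each one handled.
    static List<Integer> runConsumers(BlockingQueue<byte[]> queue,
                                      AtomicBoolean producersDone, int consumers) {
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        try {
            List<CompletableFuture<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < consumers; i++) {
                futures.add(CompletableFuture.supplyAsync(
                        () -> consume(queue, producersDone), pool));
            }
            List<Integer> counts = new ArrayList<>();
            for (CompletableFuture<Integer> f : futures) {
                counts.add(f.join());
            }
            return counts;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>(10);
        AtomicBoolean producersDone = new AtomicBoolean();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++) {
                    queue.put(("item-" + i).getBytes());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                producersDone.set(true); // set only after the last put
            }
        });
        producer.start();
        List<Integer> counts = runConsumers(queue, producersDone, 4);
        producer.join();
        System.out.println("per-consumer counts: " + counts);
    }
}
```

The flag must be set only after the last put, otherwise a consumer could see "done" plus an empty poll while an item is still on its way.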

4+4 threads is not that many, so you would be better off not using asynchronous tools like CompletableFuture. A plain multithreaded program would be simpler and would likely run faster.
Having
BlockingQueue<byte[]> data;
don't use data.poll();
use data.take();
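One caveat with take() is that it blocks forever once the producers stop, so the consumers need some signal to finish. A common way to provide one is an end-of-data marker, sketched below. The END marker and the class name are assumptions for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class TakeWithSentinel {
    // Hypothetical end-of-data marker; compared by identity, never by content.
    static final byte[] END = new byte[0];

    // Starts `consumers` threads, feeds them `items` items, returns how many were processed.
    static int runDemo(int items, int consumers) {
        BlockingQueue<byte[]> data = new LinkedBlockingQueue<>();
        AtomicInteger processed = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < consumers; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        byte[] item = data.take(); // blocks until something is available
                        if (item == END) {
                            break;                 // this consumer is finished
                        }
                        processed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            workers.add(worker);
        }
        for (int i = 0; i < items; i++) {
            data.add(new byte[] {(byte) i});
        }
        for (int i = 0; i < consumers; i++) {
            data.add(END); // one marker per consumer, queued after all real items
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(100, 4)); // prints 100
    }
}
```

Because the queue is FIFO and the markers are added last, every real item is taken before any consumer sees END.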

Say you have 1 item in the queue and 4 consumers. One of them polls the item, leaving the queue empty. The other 3 consumers then check queue.isEmpty() and, since it is empty, quit the loop.

Related

Consumer(s)-Producer issue in webserver streaming an array of data

Producer-Consumer blog post states that:
"2) Producer doesn't need to know about who is consumer or how many consumers are there. Same is true with Consumer."
My problem is that I have an array of data that I need to get from the Webserver to clients as soon as possible. The clients can appear mid-calculation. Multiple clients at different times can request the array of data. Once the calculation is complete it is cached and then it can simply be read.
Example use case: while the calculation is running, I want to serve every datum of the array as soon as it is available. I can't use a BlockingQueue, because if a second client requests the array after the first one has already called .take() on the first half of it, the second client misses half the data! I need a BlockingQueue where you don't have to take(), but can instead just read(int index).
Solution? I have a good amount of writes on my array, so I wouldn't want to use CopyOnWriteArrayList? The Vector class should work but would be inefficient?
Is it preferable to use a ThreadSafeList like this and just add a waitForElement() function? I just don't want to reinvent the wheel and I prefer crowd tested solutions for multi-threaded problems...
As far as I understand you need to broadcast data to subscribers/clients.
Here are some ways that I know for approaching it.
Pure Java solution: every client has a BlockingQueue, and every time you broadcast a message you put it into every queue.
for(BlockingQueue client: clients){
client.put(msg);
}
RxJava provides a reactive approach. Clients are subscribers; every time you emit a message the subscribers are notified, and they can choose to cancel their subscription.
Observable<String> observable = Observable.create(sub->{
String[] msgs = {"msg1","msg2","msg3"};
for (String msg : msgs) {
if(!sub.isUnsubscribed()){
sub.onNext(msg);
}
}
if (!sub.isUnsubscribed()) { // completes
sub.onCompleted();
}
});
Now multiple subscribers can choose to receive messages.
observable.subscribe(System.out::println);
observable.subscribe(System.out::println);
Observables also support functional-style operators, so each subscriber can filter and transform the stream to get just what it needs.
observable.filter(msg-> msg.equals("msg2")).map(String::length)
.subscribe(msgLength->{
System.out.println(msgLength); // or do something useful
});
Akka provides broadcast routers
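The per-client queue approach above could be sketched as follows. The Broadcaster class and its replay of earlier messages to late subscribers are assumptions added for illustration, motivated by the requirement that clients arriving mid-calculation must not miss data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class Broadcaster {
    // Everything broadcast so far, so that late subscribers can catch up.
    private final List<String> history = new ArrayList<>();
    private final List<BlockingQueue<String>> clients = new ArrayList<>();

    // Returns a queue pre-filled with all earlier messages, then fed with new ones.
    synchronized BlockingQueue<String> subscribe() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(history);
        clients.add(queue);
        return queue;
    }

    // Records the message and puts it on every client's queue.
    synchronized void broadcast(String msg) {
        history.add(msg);
        for (BlockingQueue<String> client : clients) {
            client.add(msg);
        }
    }

    public static void main(String[] args) {
        Broadcaster broadcaster = new Broadcaster();
        BlockingQueue<String> early = broadcaster.subscribe();
        broadcaster.broadcast("msg1");
        BlockingQueue<String> late = broadcaster.subscribe(); // joins mid-stream
        broadcaster.broadcast("msg2");
        System.out.println(early); // prints [msg1, msg2]
        System.out.println(late);  // prints [msg1, msg2]
    }
}
```

Both methods are synchronized so that a subscriber can never be registered between the history snapshot and a concurrent broadcast; the cost is that every message is stored once per client plus once in the history.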
This is not exactly a trivial problem; but not too hard to solve either.
Assuming your producer is an imperative program; it generates data chunk by chunk, adding each chunk to the cache; the process terminates either successfully or with an error.
The cache should have this interface for the producer to push data into it
public class Cache
public void add(byte[] bytes)
public void finish(boolean error)
Each consumer obtains a new view from the cache; the view is a blocking data source
public class Cache
public View newView()
public class View
// return null for EOF
public byte[] read() throws Exception
Here's a straightforward implementation
public class Cache
{
final Object lock = new Object();
int state = INIT;
static final int INIT=0, DONE=1, ERROR=2;
ArrayList<byte[]> list = new ArrayList<>();
public void add(byte[] bytes)
{
synchronized (lock)
{
list.add(bytes);
lock.notifyAll();
}
}
public void finish(boolean error)
{
synchronized (lock)
{
state = error? ERROR : DONE;
lock.notifyAll();
}
}
public View newView()
{
return new View();
}
public class View
{
int index;
// return null for EOF
public byte[] read() throws Exception
{
synchronized (lock)
{
while(state==INIT && index==list.size())
lock.wait();
if(state==ERROR)
throw new Exception();
if(index<list.size())
return list.get(index++);
assert state==DONE && index==list.size();
return null;
}
}
}
}
It can be optimized a little; most importantly, after state=DONE, consumers should not need synchronized; a simple volatile read is enough, which can be achieved by a volatile state
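A short usage sketch may help show how a client that arrives late still sees every chunk. It repeats a compact copy of the Cache above (with two boolean flags instead of the int state) so that it compiles on its own; the CacheDemo and readAll names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CacheDemo {
    // Compact copy of the Cache from the answer, kept here so the example is self-contained.
    static class Cache {
        private final Object lock = new Object();
        private final List<byte[]> list = new ArrayList<>();
        private boolean done;
        private boolean error;

        void add(byte[] bytes) {
            synchronized (lock) { list.add(bytes); lock.notifyAll(); }
        }

        void finish(boolean failed) {
            synchronized (lock) { done = true; error = failed; lock.notifyAll(); }
        }

        View newView() { return new View(); }

        class View {
            private int index;

            // Blocks until the next chunk exists; returns null for EOF.
            byte[] read() throws Exception {
                synchronized (lock) {
                    while (!done && index == list.size()) {
                        lock.wait();
                    }
                    if (error) {
                        throw new Exception("producer failed");
                    }
                    return index < list.size() ? list.get(index++) : null;
                }
            }
        }
    }

    // Drains one view to the end and returns what it saw, in order.
    static List<String> readAll(Cache.View view) {
        List<String> seen = new ArrayList<>();
        try {
            for (byte[] chunk = view.read(); chunk != null; chunk = view.read()) {
                seen.add(new String(chunk, StandardCharsets.UTF_8));
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    // The producer has already cached "a" and "b" when the client shows up.
    static List<String> demo() {
        Cache cache = new Cache();
        cache.add("a".getBytes(StandardCharsets.UTF_8));
        cache.add("b".getBytes(StandardCharsets.UTF_8));
        Cache.View late = cache.newView();
        new Thread(() -> {
            cache.add("c".getBytes(StandardCharsets.UTF_8));
            cache.finish(false);
        }).start();
        return readAll(late); // blocks until finish() has been called
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a, b, c]
    }
}
```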

Mix explicit and implicit parallelism with java-8 streams

In the past I have written some Java programs using two threads.
The first thread (producer) read data from an API (a C library), created a Java object and sent the object to the other thread.
The C API is delivering an event stream (infinite).
The threads are using a LinkedBlockingQueue as a pipeline to exchange the objects (put, poll).
The second thread (consumer) is dealing with the object.
(I also found that code is more readable within the threads. First thread is dealing with the C API stuff and producing
proper java objects, second thread is free from C API handling and is dealing with the data).
Now I'm interested, how I can realize this scenario above with the new stream API coming in java 8.
But assuming I want to keep the two threads (producer/consumer)!
First thread is writing into the stream. Second thread is reading from the stream.
I also hope, that I can handle with this technique a better explicit parallelism (producer/consumer)
and within the stream I can use some implicit parallelism (e.g. stream.parallel()).
I don't have much experience with the new stream API.
So I experimented with the following code below, to solve the idea above.
I use 'generate' to access the C API and feed this to the java stream.
I used in the consumer thread .parallel() to test and handle implicit parallelism. Looks fine. But see below.
Questions:
1. Is 'generate' the best way in this scenario for the producer?
2. I have trouble understanding how to terminate/close the stream in the producer
if the API has some errors AND I want to shut down the whole pipeline.
Do I use stream.close or throw an exception?
2.1 I used stream.close(). But 'generate' is still running after closing,
I found only to throw an exception to terminate the generate part.
This exception is going into the stream and consumer is receiving the exception
(This is fine for me, consumer can recognize it and terminate).
But in this case the producer has produced more than the consumer has processed by the time the exception arrives.
2.2 if consumer is using implicit parallelism stream.parallel(). The producer is processing much more items.
So I don't see any solution for this problem. (Accessing C API, check error, make decision).
2.3 Throwing the exception in producer arrives at consumer stream, but not all inserted objects are processed.
Once more: the idea is to have an explicit parallelism with the threads.
But internally I can deal with the new features and use parallel processing when possible
Thanks for thinking about this problem too.
package sandbox.test;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.LongStream;
public class MyStream {
private volatile LongStream stream = null;
private AtomicInteger producerCount = new AtomicInteger(0);
private AtomicInteger consumerCount = new AtomicInteger(0);
private AtomicInteger apiError = new AtomicInteger(0);
public static void main(String[] args) throws InterruptedException {
MyStream appl = new MyStream();
appl.create();
}
private static void sleep(long sleep) {
try {
Thread.sleep(sleep);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
private static void apiError(final String pos, final int iteration) {
RuntimeException apiException = new RuntimeException("API error pos=" + pos + " iteration=" + iteration);
System.out.println(apiException.getMessage());
throw apiException;
}
final private int simulateErrorAfter = 10;
private Thread produce() {
Thread thread = new Thread(new Runnable() {
@Override
public void run() {
System.out.println("Producer started");
stream = LongStream.generate(() -> {
int localCount;
// Detect error, while using stream.parallel() processing
int error = apiError.get();
if ( error > 0 )
apiError("1", error);
// ----- Accessing the C API here -----
localCount = producerCount.incrementAndGet(); // C API access; delegate for accessing the C API
// ----- Accessing the C API here -----
// Checking error code from C API
if ( localCount > simulateErrorAfter ) { // Simulate an API error
producerCount.decrementAndGet();
stream.close();
apiError("2", apiError.incrementAndGet());
}
System.out.println("P: " + localCount);
sleep(200L);
return localCount;
});
System.out.println("Producer terminated");
}
});
thread.start();
return thread;
}
private Thread consume() {
Thread thread = new Thread(new Runnable() {
@Override
public void run() {
try {
stream.onClose(new Runnable() {
@Override
public void run() {
System.out.println("Close detected");
}
}).parallel().forEach(l -> {
sleep(1000);
System.out.println("C: " + l);
consumerCount.incrementAndGet();
});
} catch (Exception e) {
// Capturing the stream end
System.out.println(e);
}
System.out.println("Consumer terminated");
}
});
thread.start();
return thread;
}
private void create() throws InterruptedException {
Thread producer = produce();
while ( stream == null )
sleep(10);
Thread consumer = consume();
producer.join();
consumer.join();
System.out.println("Produced: " + producerCount);
System.out.println("Consumed: " + consumerCount);
}
}
You need to understand some fundamental points about the Stream API:
All operations applied on a stream are lazy and won’t do anything before the terminal operation is applied. There is no sense in creating the stream using a “producer” thread, as this thread won’t do anything. All actions are performed within your “consumer” thread and the background threads started by the Stream implementation itself. The thread that created the Stream instance is completely irrelevant.
Closing a stream has no relevance for the Stream operation itself, i.e. does not shut down threads. It is meant to release additional resources, e.g. closing the file associated with the stream returned by Files.lines(…). You can schedule such cleanup actions using onClose and the Stream will invoke them when you call close but that’s it. For the Stream class itself it has no meaning.
Streams do not model a scenario like “one thread is writing and another one is reading”. Their model is “one thread is calling your Supplier, followed by calling your Consumer and another thread does the same, and x other threads too…”
If you want to implement a producer/consumer scheme with distinct producer and consumer threads, you are better off using Threads or an ExecutorService and a thread-safe queue.
But you can still use Java 8 features. E.g. there is no need to implement Runnables using inner classes; you can use lambda expressions for them.
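As a rough illustration of that suggestion, the sketch below runs a distinct producer and consumer on an ExecutorService, connected by a bounded queue and written with lambdas. The END marker, the class name and the summing consumer are assumptions; the counter loop only stands in for the C API access.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

public class QueuePipeline {
    // Hypothetical end marker; the simulated API only produces positive values.
    static final long END = -1L;

    // Produces 1..events on one thread, sums them on another, returns the sum.
    static long run(int events) {
        BlockingQueue<Long> pipe = new LinkedBlockingQueue<>(16);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Producer: a Callable lambda, so put() may throw InterruptedException.
            pool.submit(() -> {
                try {
                    for (long i = 1; i <= events; i++) {
                        pipe.put(i); // stands in for an event read from the C API
                    }
                } finally {
                    pipe.put(END);   // sent on success and on failure, so the consumer always stops
                }
                return null;
            });
            // Consumer: takes values until it sees the end marker.
            Future<Long> consumed = pool.submit(() -> {
                long sum = 0;
                for (long v = pipe.take(); v != END; v = pipe.take()) {
                    sum += v;
                }
                return sum;
            });
            return consumed.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(10)); // prints 55
    }
}
```

The bounded queue keeps the producer at most 16 items ahead of the consumer, which also limits how much work is in flight when an API error ends the pipeline.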

Process Large File for HTTP Calls in Java

I have a file with millions of lines in it that I need to process. Each line of the file will result in an HTTP call. I'm trying to figure out the best way to attack the problem.
I obviously could just read the file and make the calls sequentially, but it would be incredibly slow. I'd like to parallelize the calls, but I'm not sure if I should read the entire file into memory (something I'm not a huge fan of) or try to parallelize the reading of the file as well (which I'm not sure would make sense).
Just looking for some thoughts here on the best way to attack the problem. If there is an existing framework or library that does something similar I'm happy to use that as well.
Thanks.
I'd like to parallelize the calls, but I'm not sure if I should read the entire file into memory
You should use an ExecutorService with a bounded BlockingQueue. As you read in your million lines, you submit jobs to the thread pool until the BlockingQueue is full. This way you will be able to run 100 (or whatever number is optimal) HTTP requests simultaneously without having to read all of the lines of the file beforehand.
You'll need to set up a RejectedExecutionHandler that blocks if the queue is full. This is better than a caller runs handler.
BlockingQueue<Runnable> queue = new ArrayBlockingQueue<Runnable>(100);
// NOTE: you want the min and max thread numbers here to be the same value
ThreadPoolExecutor threadPool =
new ThreadPoolExecutor(nThreads, nThreads, 0L, TimeUnit.MILLISECONDS, queue);
// we need our RejectedExecutionHandler to block if the queue is full
threadPool.setRejectedExecutionHandler(new RejectedExecutionHandler() {
@Override
public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
try {
// this will block the producer until there's room in the queue
executor.getQueue().put(r);
} catch (InterruptedException e) {
Thread.currentThread().interrupt(); // restore the interrupt flag for the caller
throw new RejectedExecutionException(
"Unexpected InterruptedException", e);
}
}
});
// now read in the urls
String url;
while ((url = urlReader.readLine()) != null) {
// submit them to the thread-pool. this may block.
threadPool.submit(new DownloadUrlRunnable(url));
}
// after we submit we have to shutdown the pool
threadPool.shutdown();
// wait for them to complete
threadPool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
...
private class DownloadUrlRunnable implements Runnable {
private final String url;
public DownloadUrlRunnable(String url) {
this.url = url;
}
public void run() {
// download the URL
}
}
Gray's approach seems to be good. The other approach I would suggest is to split the files into chunks (you will have to write the logic), and process those with multiple threads.
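The chunking logic mentioned above could be as small as the helper sketched below, which reads a fixed number of lines at a time so that each batch can be handed to a worker. The class and method names are hypothetical, and a StringReader stands in for the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class LineChunks {
    // Reads up to `size` lines; an empty list means the input is exhausted.
    static List<String> nextChunk(BufferedReader reader, int size) {
        List<String> chunk = new ArrayList<>(size);
        try {
            String line;
            // The size check comes first so that no line is read and then dropped.
            while (chunk.size() < size && (line = reader.readLine()) != null) {
                chunk.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunk;
    }

    public static void main(String[] args) {
        BufferedReader reader = new BufferedReader(new StringReader("u1\nu2\nu3\nu4\nu5"));
        for (List<String> chunk = nextChunk(reader, 2); !chunk.isEmpty();
                chunk = nextChunk(reader, 2)) {
            // In the real program each chunk would be submitted to a worker thread.
            System.out.println(chunk);
        }
    }
}
```

Only one chunk is held by the reader at a time, so memory use depends on the chunk size and on how many chunks are waiting for a worker, not on the file size.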

Producer-consumer problem with a twist

The producer is finite, as should be the consumer.
The problem is when to stop, not how to run.
Communication can happen over any type of BlockingQueue.
Can't rely on poisoning the queue(PriorityBlockingQueue)
Can't rely on locking the queue(SynchronousQueue)
Can't rely on offer/poll exclusively(SynchronousQueue)
Probably even more exotic queues in existence.
Creates a queued seq on another (presumably lazy) seq s. The queued
seq will produce a concrete seq in the background, and can get up to
n items ahead of the consumer. n-or-q can be an integer n buffer
size, or an instance of java.util.concurrent BlockingQueue. Note
that reading from a seque can block if the reader gets ahead of the
producer.
http://clojure.github.com/clojure/clojure.core-api.html#clojure.core/seque
My attempts so far + some tests: https://gist.github.com/934781
Solutions in Java or Clojure appreciated.
class Reader {
private final ExecutorService ex = Executors.newSingleThreadExecutor();
private final List<Object> completed = new ArrayList<Object>();
private final BlockingQueue<Object> doneQueue = new LinkedBlockingQueue<Object>();
private int pending = 0;
public synchronized Object take() {
removeDone();
queue();
Object rVal;
if(completed.isEmpty()) {
try {
rVal = doneQueue.take();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
pending--;
} else {
rVal = completed.remove(0);
}
queue();
return rVal;
}
private void removeDone() {
Object current = doneQueue.poll();
while(current != null) {
completed.add(current);
pending--;
current = doneQueue.poll();
}
}
private void queue() {
while(pending < 10) {
pending++;
ex.submit(new Runnable() {
@Override
public void run() {
doneQueue.add(compute());
}
private Object compute() {
//do actual computation here
return new Object();
}
});
}
}
}
Not exactly an answer I'm afraid, but a few remarks and more questions. My first answer would be: use clojure.core/seque. The producer needs to communicate end-of-seq somehow for the consumer to know when to stop, and I assume the number of produced elements is not known in advance. Why can't you use an EOS marker (if that's what you mean by queue poisoning)?
If I understand your alternative seque implementation correctly, it will break when elements are taken off the queue outside your function, since channel and q will be out of step in that case: channel will hold more #(.take q) elements than there are elements in q, causing it to block. There might be ways to ensure channel and q are always in step, but that would probably require implementing your own Queue class, and it adds so much complexity that I doubt it's worth it.
Also, your implementation doesn't distinguish between normal EOS and abnormal queue termination due to thread interruption - depending on what you're using it for you might want to know which is which. Personally I don't like using exceptions in this way — use exceptions for exceptional situations, not for normal flow control.

How should I handle Multi-threading in Java?

I am working on a practical scenario in Java: a socket program. The existing system and the expected system are as follows.
Existing System - The system checks whether a certain condition is satisfied. If so, it creates a message to be sent and puts it into a queue.
The queue processor is a separate thread. It periodically checks the queue for items. If it finds any items (messages), it sends each message to a remote host (hardcoded) and removes the item from the queue.
Expected System - The message is still created when a certain condition is satisfied, but the recipient is not the same in every case. So there are several approaches.
Putting the message into the same queue but with its receiver ID. In this case the 2nd thread can identify the receiver, so the message can be sent to it.
Having multiple threads. In this case, when the condition is satisfied and the receiver is "new", the system creates a new queue and puts the message into that queue, and a new thread is started to process that queue. If later messages are directed to the same recipient they should be put into the same queue; if not, a new queue and thread should be created.
Now I want to implement the 2nd one, but I'm a bit stuck. How should I do that? A skeleton would be sufficient, and you don't need to worry about how to create the queues etc... :)
Update: I also think that approach 1 is the best way to do this. I read some articles on threading and came to that decision. But it is still worth learning how to implement approach 2 as well.
Consider using Java Message Services (JMS) rather than re-inventing the wheel?
Can I suggest that you look at BlockingQueue ? Your dispatch process can write to this queue (put), and clients can take or peek in a threadsafe manner. So you don't need to write the queue implementation at all.
If you have one queue containing different message types, then you will need to implement some peek-type mechanism for each client (i.e. they will have to check the head of the queue and only take what is theirs). To work effectively then consumers will have to extract data required for them in a timely and robust fashion.
If you have one queue/thread per message/consumer type, then that's going to be easier/more reliable.
Your client implementation will simply have to loop on:
while (!done) {
Object item = queue.take();
// process item
}
Note that the queue can make use of generics, and take() is blocking.
Of course, with multiple consumers taking messages of different types, you may want to consider a space-based architecture. This won't have queue (FIFO) characteristics, but will allow you multiple consumers in a very easy fashion.
You have to weigh up slightly whether you have lots of end machines and occasional messages to each, or a few end machines and frequent messages to each.
If you have lots of end machines, then literally having one thread per end machine sounds a bit over the top unless you're really going to be constantly streaming messages to all of those machines. I would suggest having a pool of threads which will only grow between certain bounds. To do this, you could use a ThreadPoolExecutor. When you need to post a message, you actually submit a runnable to the executor which will send the message:
Executor msgExec = new ThreadPoolExecutor(...);
public void sendMessage(final String machineId, final byte[] message) {
msgExec.execute(new Runnable() {
public void run() {
sendMessageNow(machineId, message);
}
});
}
private void sendMessageNow(String machineId, byte[] message) {
// open connection to machine and send message, thinking
// about the case of two simultaneous messages to a machine,
// and whether you want to cache connections.
}
If you just have a few end machines, then you could have a BlockingQueue per machine, and a thread per blocking queue sitting waiting for the next message. In this case, the pattern is more like this (beware untested off-top-of-head Sunday morning code):
ConcurrentHashMap<String, BlockingQueue<Message>> queuePerMachine = new ConcurrentHashMap<>();

public void sendMessage(String machineId, byte[] message) throws InterruptedException {
    BlockingQueue<Message> q = queuePerMachine.get(machineId);
    if (q == null) {
        q = new LinkedBlockingQueue<Message>();
        BlockingQueue<Message> prev = queuePerMachine.putIfAbsent(machineId, q);
        if (prev != null) {
            q = prev; // another thread registered a queue first
        } else {
            (new QueueProcessor(q)).start();
        }
    }
    q.put(new Message(message));
}

private class QueueProcessor extends Thread {
    private final BlockingQueue<Message> q;

    QueueProcessor(BlockingQueue<Message> q) {
        this.q = q;
    }

    public void run() {
        Socket s = null;
        try {
            for (;;) {
                // only use a timeout while a connection is open
                boolean needTimeOut = (s != null);
                Message m = needTimeOut ?
                        q.poll(60000, TimeUnit.MILLISECONDS) :
                        q.take();
                if (m == null) {
                    if (s != null) {
                        // close s and set it to null
                    }
                } else {
                    if (s == null) {
                        // open s
                    }
                    // send message down s
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // add appropriate error handling and finally
    }
}
In this case, we close the connection if no message for that machine arrives within 60 seconds.
Should you use JMS instead? Well, you have to weigh up whether this sounds complicated to you. My personal feeling is it isn't a complicated enough a task to warrant a special framework. But I'm sure opinions differ.
P.S. In reality, now I look at this, you'd probably put the queue inside the thread object and just map machine ID -> thread object. Anyway, you get the idea.
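The P.S. idea of keeping the queue inside the thread object and mapping machine ID to thread could be sketched roughly as below. This is only an illustration under assumptions: the class names, the STOP marker and the in-memory `sent` list (standing in for the socket) are all made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class PerMachineWorkers {
    // Hypothetical stop marker, compared by identity.
    private static final byte[] STOP = new byte[0];

    // One thread per machine; the thread owns its queue.
    static class MachineWorker extends Thread {
        private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
        // Demo stand-in for the socket; only read by others after join().
        private final List<String> sent = new ArrayList<>();

        void post(byte[] message) {
            queue.add(message);
        }

        @Override
        public void run() {
            try {
                for (byte[] m = queue.take(); m != STOP; m = queue.take()) {
                    sent.add(new String(m, StandardCharsets.UTF_8)); // "send" the message
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private final ConcurrentHashMap<String, MachineWorker> workers = new ConcurrentHashMap<>();

    // Creates and starts the worker on first use, then queues the message on it.
    public void sendMessage(String machineId, byte[] message) {
        workers.computeIfAbsent(machineId, id -> {
            MachineWorker w = new MachineWorker();
            w.start();
            return w;
        }).post(message);
    }

    // Lets every worker drain its queue, stops it, and reports what each machine was sent.
    public Map<String, List<String>> close() {
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, MachineWorker> e : workers.entrySet()) {
            e.getValue().post(STOP);
            try {
                e.getValue().join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(ex);
            }
            result.put(e.getKey(), e.getValue().sent);
        }
        return result;
    }

    public static void main(String[] args) {
        PerMachineWorkers dispatcher = new PerMachineWorkers();
        dispatcher.sendMessage("m1", "a".getBytes(StandardCharsets.UTF_8));
        dispatcher.sendMessage("m2", "b".getBytes(StandardCharsets.UTF_8));
        dispatcher.sendMessage("m1", "c".getBytes(StandardCharsets.UTF_8));
        System.out.println(dispatcher.close()); // prints {m1=[a, c], m2=[b]}
    }
}
```

computeIfAbsent on a ConcurrentHashMap runs the creation function at most once per key, which replaces the get/putIfAbsent dance from the version above.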
You might try using SomnifugiJMS, an in-vm JMS implementation using java.util.concurrent as the actual "engine" of sorts.
It will probably be somewhat overkill for your purposes, but may well enable your application to be distributed for little to no additional programming (if applicable), you just plug in a different JMS implementation like ActiveMQ and you're done.
First of all, if you are planning to have a lot of receivers, I would not use the ONE-THREAD-AND-QUEUE-PER-RECEIVER approach. You could end up with a lot of threads not doing anything most of the time, and it could hurt you performance-wise. An alternative is using a pool of worker threads that just pick tasks from a shared queue, each task carrying its own receiver ID, and perhaps a shared dictionary with socket connections to each receiver for the worker threads to use.
Having said so, if you still want to pursue your approach what you could do is:
1) Create a new class to handle your new thread execution:
public class Worker implements Runnable {
    private final Queue<String> myQueue = new LinkedList<String>();

    public void run() {
        try {
            while (true) {
                String messageToProcess;
                synchronized (myQueue) {
                    // get your data from the queue; poll() returns null when it is empty
                    messageToProcess = myQueue.poll();
                }
                if (messageToProcess != null) {
                    // do your stuff
                }
                Thread.sleep(500); // to avoid spinning
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop when interrupted
        }
    }

    public void queueMessage(String message) {
        synchronized (myQueue) {
            myQueue.add(message);
        }
    }
}
2) On your main thread, create the messages and use a dictionary (hash table) to see whether the receiver's thread has already been created. If it has, just queue the new message. If not, create a new thread, put it in the hash table and queue the new message:
while (true) {
    String msg = getNewCreatedMessage(); // you get your messages from here
    int id = getNewCreatedMessageId(); // you get your rec's id from here
    Worker w = myHash.get(id); // myHash is a Map<Integer, Worker>
    if (w == null) { // create new Worker thread
        w = new Worker();
        myHash.put(id, w); // remember it so later messages reuse this worker
        new Thread(w).start();
    }
    w.queueMessage(msg);
}
Good luck.
Edit: you can improve this solution by using the BlockingQueue that Brian mentioned instead of the synchronized queue.
