A Producer-Consumer blog post states that:
"2) Producer doesn't need to know about who is consumer or how many consumers are there. Same is true with Consumer."
My problem is that I have an array of data that I need to get from the web server to clients as soon as possible. Clients can appear mid-calculation, and multiple clients at different times can request the array of data. Once the calculation is complete it is cached and can then simply be read.
Example use case: while the calculation is occurring I want to serve each and every datum of the array as soon as possible. I can't use a BlockingQueue because, say, a second client starts to request the array after the first one has already consumed the first half with take(); then the second client misses half the data! I need something like a BlockingQueue where you don't have to take(), but can instead just read(int index).
Solution? My array gets a good number of writes, so I wouldn't want to use CopyOnWriteArrayList? The Vector class should work but would be inefficient?
Is it preferable to use a ThreadSafeList like this and just add a waitForElement() function? I just don't want to reinvent the wheel, and I prefer crowd-tested solutions for multi-threaded problems...
As far as I understand, you need to broadcast data to subscribers/clients.
Here are some approaches I know of.
Pure Java solution: every client has a BlockingQueue, and every time you broadcast a message you put it into every queue.
for(BlockingQueue client: clients){
client.put(msg);
}
RxJava provides a reactive approach. Clients will be subscribers, and every time you emit a message, subscribers are notified and can choose to cancel their subscription.
Observable<String> observable = Observable.create(sub->{
String[] msgs = {"msg1","msg2","msg3"};
for (String msg : msgs) {
if(!sub.isUnsubscribed()){
sub.onNext(msg);
}
}
if (!sub.isUnsubscribed()) { // completes
sub.onCompleted();
}
});
Now multiple subscribers can choose to receive messages.
observable.subscribe(System.out::println);
observable.subscribe(System.out::println);
Observables are a bit functional: subscribers can choose just what they need.
observable.filter(msg-> msg.equals("msg2")).map(String::length)
.subscribe(msgLength->{
System.out.println(msgLength); // or do something useful
});
Akka provides broadcast routers, which deliver each message to every routee.
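A minimal sketch of that, assuming Akka classic actors (ClientActor is a hypothetical subscriber actor, not something from the answer):
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import akka.routing.BroadcastPool;

ActorSystem system = ActorSystem.create("broadcast-demo");
// ClientActor is a hypothetical UntypedActor that handles String messages
ActorRef router = system.actorOf(
        new BroadcastPool(3).props(Props.create(ClientActor.class)), "clients");
// every routee (client) receives its own copy of each message
router.tell("msg1", ActorRef.noSender());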
This is not exactly a trivial problem, but it's not too hard to solve either.
Assume your producer is an imperative program: it generates data chunk by chunk, adding each chunk to the cache, and the process terminates either successfully or with an error.
The cache should have this interface for the producer to push data into it:
public class Cache
public void add(byte[] bytes)
public void finish(boolean error)
Each consumer obtains a new view from the cache; the view is a blocking data source
public class Cache
public View newView()
public class View
// return null for EOF
public byte[] read() throws Exception
Here's a straightforward implementation
import java.util.ArrayList;

public class Cache
{
    static final int INIT = 0, DONE = 1, ERROR = 2;

    final Object lock = new Object();
    int state = INIT;
    ArrayList<byte[]> list = new ArrayList<>();

    // producer side: append a chunk and wake up any views waiting for data
    public void add(byte[] bytes)
    {
        synchronized (lock)
        {
            list.add(bytes);
            lock.notifyAll();
        }
    }

    // producer side: signal the end of the data, either successfully or with an error
    public void finish(boolean error)
    {
        synchronized (lock)
        {
            state = error ? ERROR : DONE;
            lock.notifyAll();
        }
    }

    public View newView()
    {
        return new View();
    }

    public class View
    {
        int index; // this consumer's cursor into the shared list

        // return null for EOF
        public byte[] read() throws Exception
        {
            synchronized (lock)
            {
                // block while the producer is still running and we've caught up with it
                while (state == INIT && index == list.size())
                    lock.wait();
                if (state == ERROR)
                    throw new Exception();
                if (index < list.size())
                    return list.get(index++);
                assert state == DONE && index == list.size();
                return null;
            }
        }
    }
}
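A quick usage sketch (computeChunks and serve are hypothetical placeholders; the consumer side catches the Exception that read() declares):
Cache cache = new Cache();

// producer thread: push chunks as they are computed, then signal completion
new Thread(() -> {
    try {
        for (byte[] chunk : computeChunks())   // hypothetical data source
            cache.add(chunk);
        cache.finish(false);
    } catch (RuntimeException e) {
        cache.finish(true);
    }
}).start();

// each consumer gets its own cursor over the same cached data
Cache.View view = cache.newView();
try {
    byte[] chunk;
    while ((chunk = view.read()) != null) {    // blocks until the next chunk or EOF
        serve(chunk);                          // hypothetical per-client handler
    }
} catch (Exception e) {
    // the producer reported an error
}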
It can be optimized a little. Most importantly, once state becomes DONE, consumers should no longer need synchronized; a plain volatile read is enough, which can be achieved by making state volatile.
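One way that optimization might look, as a sketch layered on the class above (each View is still assumed to be used by a single consumer thread, as in the original):
// in Cache: a volatile state lets finished consumers skip the monitor entirely
volatile int state = INIT;

// in View.read(): fast path once the producer has finished successfully
public byte[] read() throws Exception
{
    if (state == DONE)                          // volatile read; no further writes possible
        return index < list.size() ? list.get(index++) : null;
    synchronized (lock)
    {
        while (state == INIT && index == list.size())
            lock.wait();
        if (state == ERROR)
            throw new Exception();
        if (index < list.size())
            return list.get(index++);
        return null;                            // DONE and fully consumed
    }
}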
Related
I have a producer-consumer model using a blocking queue where 4 producer threads read files from a directory and put them on the blocking queue, and 4 consumer threads read from the blocking queue.
My problem is that every time only one consumer reads from the BlockingQueue and the other 3 consumer threads are not reading:
final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>(QUEUE_SIZE);
CompletableFuture<Void> completableFutureProducer = produceUrls(files, queue, checker);
// not providing code for produceData; it is working fine, with all 4 threads writing to the blocking queue. Here is the consumer code.
private CompletableFuture<Validator> consumeData(
final Response checker,
final CompletableFuture<Void> urls
) {
return CompletableFuture.supplyAsync(checker, 4)
.whenComplete((result, err) -> {
if (err != null) {
LOG.error("consuming url worker failed!", err);
urls.cancel(true);
}
});
}
completableFutureProducer.join();
completableFutureConsumer.join();
This is my code. Can someone tell me what I am doing wrong? Or help with correct code.
Why is only one consumer reading from the blocking queue?
Adding code for the Response class reading from the BlockingQueue:
@Slf4j
public final class Response implements Supplier<Check> {
private final BlockingQueue<byte[]> data;
private final AtomicBoolean producersComplete;
private final Calendar calendar = Calendar.getInstance();
public Response(
    final BlockingQueue<byte[]> data
) {
    this.data = data;
    producersComplete = new AtomicBoolean();
}
public void notifyProducersDone() {
producersComplete.set(true);
}
@Override
public Check get() {
try {
Check check = null;
try {
while (!data.isEmpty() || !producersComplete.get()) {
final byte[] item = data.poll(1, TimeUnit.SECONDS);
if (item != null) {
LOG.info("{}",new String(item));
// I see only one thread printing result here .
check = validateData(item);
}
}
} catch (InterruptedException | IOException e) {
Thread.currentThread().interrupt();
throw new WriteException("Exception occurred while data validation", e);
}
return check;
} finally {
LOG.info("Done reading data from BlockingQueue");
}
}
}
It's hard to diagnose from this alone, but it's probably not correct to check for data.isEmpty() because the queue may happen to be temporarily empty (but later get items). So your threads might exit as soon as they encounter a temporarily empty queue.
Instead, you can exit if producers were done AND you got an empty result from the poll. That way the threads only exit when there are truly no more items to process.
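A sketch of that loop shape, reusing the names from the question (interrupt and IO handling omitted; the extra isEmpty() check closes the small window where an item lands just after the poll times out):
while (true) {
    final byte[] item = data.poll(1, TimeUnit.SECONDS);
    if (item != null) {
        validateData(item);                    // normal case: process the item
    } else if (producersComplete.get() && data.isEmpty()) {
        break;                                 // producers finished and the queue is drained
    }
    // otherwise the queue was only temporarily empty; keep polling
}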
It's a bit odd though that you are returning the result of the last item (alone). Are you sure this is what you want?
EDIT: I've done something very similar recently. Here is a class that reads from a file, transforms the lines in a multi-threaded way, then writes to a different file (the order of lines are preserved).
It also uses a BlockingQueue. It's very similar to your code, but it doesn't check queue.isEmpty(), for the aforementioned reason. It works fine for me.
4+4 threads is not that many, so you'd be better off not using asynchronous tools like CompletableFuture. A simple multithreaded program would be simpler and likely faster.
Having
BlockingQueue<byte[]> data;
don't use data.poll();
use data.take();
When you have, let's say, 1 item in the queue and 4 consumers, one of them will poll the item, leaving the queue empty. Then the other 3 consumers check queue.isEmpty(), and since it is empty, they quit the loop.
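A minimal sketch of a take()-based consumer loop, inside a method that declares throws InterruptedException. The end-of-input marker (POISON) is not from the answer; it's one common way to stop consumers, since take() never returns null:
static final byte[] POISON = new byte[0];      // hypothetical end-of-input marker the producers enqueue once, when done

// consumer loop
while (true) {
    byte[] item = data.take();                 // blocks on an empty queue instead of returning null
    if (item == POISON) {
        data.put(POISON);                      // pass the marker on so the other consumers also stop
        break;
    }
    validateData(item);
}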
Suppose I have the following RxJava code (which accesses a DB, but the exact use case is irrelevant):
public Observable<List<DbPlaceDto>> getPlaceByStringId(final List<String> stringIds) {
return Observable.create(new Observable.OnSubscribe<List<DbPlaceDto>>() {
@Override
public void call(Subscriber<? super List<DbPlaceDto>> subscriber) {
try {
Cursor c = getPlacesDb(stringIds);
List<DbPlaceDto> dbPlaceDtoList = new ArrayList<>();
while (c.moveToNext()) {
dbPlaceDtoList.add(getDbPlaceDto(c));
}
c.close();
if (!subscriber.isUnsubscribed()) {
subscriber.onNext(dbPlaceDtoList);
subscriber.onCompleted();
}
} catch (Exception e) {
if (!subscriber.isUnsubscribed()) {
subscriber.onError(e);
}
}
}
});
}
Given this code, I have the following questions:
If someone unsubscribes from the observable returned from this method (after a previous subscription), is that operation thread-safe? So are my 'isUnsubscribed()' checks correct in this sense, regardless of scheduling?
Is there a cleaner way with less boilerplate code to check for unsubscribed states than what I'm using here? I couldn't find anything in the framework. I thought SafeSubscriber solves the issue of not forwarding events when the subscriber is unsubscribed, but apparently it does not.
is that operation thread-safe?
Yes. You are receiving an rx.Subscriber which (eventually) checks against a volatile boolean that is set to true when the subscriber's subscription is unsubscribed.
cleaner way with less boilerplate code to check for unsubscribed states
The SyncOnSubscribe and the AsyncOnSubscribe (available as an @Experimental API as of release 1.0.15) were created for this use case. They function as a safe alternative to calling Observable.create. Here is a (contrived) example of the synchronous case.
public static class FooState {
public Integer next() {
return 1;
}
public void shutdown() {
}
public FooState nextState() {
return new FooState();
}
}
public static void main(String[] args) {
OnSubscribe<Integer> sos = SyncOnSubscribe.createStateful(FooState::new,
(state, o) -> {
o.onNext(state.next());
return state.nextState();
},
state -> state.shutdown() );
Observable<Integer> obs = Observable.create(sos);
}
Note that the SyncOnSubscribe next function is not allowed to call observer.onNext more than once per iteration, nor can it call into that observer concurrently. Here are a couple of links to the SyncOnSubscribe implementation and tests on the head of the 1.x branch. Its primary use is to simplify writing observables that iterate or parse over data synchronously and emit it downstream, while doing so in a framework that supports back-pressure and checks whether the subscriber has unsubscribed. Essentially, you create a next function that gets invoked every time the downstream operators need a new data element; your next function can call onNext either 0 or 1 times.
The AsyncOnSubscribe is designed to play nicely with back-pressure for observable sources that operate asynchronously (such as off-box calls). The arguments to your next function include the requested count, and you should provide an observable that fulfills data up to that requested amount. An example of this behavior would be paginated queries against an external datasource.
Previously it was a safe practice to transform your OnSubscribe to an Iterable and use Observable.from(Iterable). This implementation gets an iterator and checks subscriber.isUnsubscribed() for you.
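For illustration, a rough sketch of that older pattern with RxJava 1.x:
List<String> msgs = Arrays.asList("msg1", "msg2", "msg3");

// from(Iterable) pulls from the iterator on demand and stops once the
// subscriber unsubscribes, so you don't write the isUnsubscribed() checks yourself
Observable<String> obs = Observable.from(msgs);
obs.take(2).subscribe(System.out::println);   // prints msg1 and msg2, then unsubscribes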
There are a few people working on a project along with me that have been trying to figure out the best way to deal with this issue. It seems this should be a standard thing wanted regularly, but for some reason we can't seem to get the right answer.
If I have some work to be done and I throw a bunch of messages at a router, how can I tell when all the work is done? For example, if we're reading lines of a 1 million line file and sending the line off to actors to process this, and you need to process the next file, but must wait for the first to complete, how can you know when it is complete?
One further comment. I'm aware of, and have used, Await.result() and Await.ready() together with Patterns.ask(). One difference is that each line would have a Future, so we'd have a HUGE array of these futures to wait on, not just one. Additionally, we are populating a large domain model that takes up considerable memory, and we do not wish to spend additional memory holding an equal number of futures waiting to be composed; with actors, each one completes after doing its work and holds no memory waiting to be composed.
We're using Java and not Scala.
Pseudo code:
for(File file : files) {
...
while((String line = getNextLine(fileStream)) != null) {
router.tell(line, this.getSelf());
}
// we need to wait for this work to finish to do the next
// file because it's dependent on the previous work
}
It would seem you'd often want to do a lot of work and know when it's finished with actors.
I believe I have a solution for you and it does not involve accumulating a whole bunch of Futures. First, the high level concept. There will be two actors participating in this flow. The first we'll call FilesProcessor. This actor will be short lived and stateful. Whenever you want to process a bunch of files sequentially, you spin up an instance of this actor and pass it a message containing the names (or paths) of the files you want to process. When it has completed processing of all of the files, it stops itself. The second actor we will call LineProcessor. This actor is stateless, long lived and pooled behind a router. It processes a file line and then responds back to whoever requested the line processing telling them it has completed processing that line. Now onto the code.
First the messages:
public class Messages {
public static class ProcessFiles{
public final List<String> fileNames;
public ProcessFiles(List<String> fileNames){
this.fileNames = fileNames;
}
}
public static class ProcessLine{
public final String line;
public ProcessLine(String line){
this.line = line;
}
}
public static class LineProcessed{}
public static final LineProcessed LINE_PROCESSED = new LineProcessed();
}
And the FilesProcessor:
public class FilesProcessor extends UntypedActor{
private List<String> files;
private int awaitingCount;
private ActorRef router;
@Override
public void onReceive(Object msg) throws Exception {
if (msg instanceof ProcessFiles){
ProcessFiles pf = (ProcessFiles)msg;
router = ... //lookup router;
files = pf.fileNames;
processNextFile();
}
else if (msg instanceof LineProcessed){
awaitingCount--;
if (awaitingCount <= 0){
processNextFile();
}
}
}
private void processNextFile(){
if (files.isEmpty()) getContext().stop(getSelf());
else{
String file = files.remove(0);
BufferedReader in = openFile(file);
String input = null;
awaitingCount = 0;
try{
while((input = in.readLine()) != null){
router.tell(new Messages.ProcessLine(input), getSelf());
awaitingCount++;
}
}
catch(IOException e){
e.printStackTrace();
getContext().stop(getSelf());
}
}
}
private BufferedReader openFile(String name){
//do whetever to load file
...
}
}
And the LineProcessor:
public class LineProcessor extends UntypedActor{
@Override
public void onReceive(Object msg) throws Exception {
if (msg instanceof ProcessLine){
ProcessLine pl = (ProcessLine)msg;
//Do whatever line processing...
getSender().tell(Messages.LINE_PROCESSED, getSelf());
}
}
}
Now the line processor is sending a response back with no additional content. You could certainly change this if you needed to send something back based on the processing of the line. I'm sure this code is not bullet proof, I just wanted to show you a high level concept for how you could accomplish this flow without request/response semantics and Futures.
If you have any questions on this approach or want more detail, let me know and I'd be happy to provide it.
Use context.setReceiveTimeout on the routees to send a message back to the sender with a count of the messages processed. When the total messages processed == the amount sent, you are finished.
If your routees are going to stay busy enough that setReceiveTimeout won't fire often enough then schedule your own messages to send the counts back.
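Roughly what that could look like on a routee (a sketch; the class name and the plain Integer count message are mine, not from the answer):
import akka.actor.ActorRef;
import akka.actor.ReceiveTimeout;
import akka.actor.UntypedActor;
import scala.concurrent.duration.Duration;
import java.util.concurrent.TimeUnit;

public class CountingLineProcessor extends UntypedActor {
    private ActorRef coordinator;
    private int processed = 0;

    @Override
    public void preStart() {
        // fire a ReceiveTimeout whenever this routee has been idle for a second
        getContext().setReceiveTimeout(Duration.create(1, TimeUnit.SECONDS));
    }

    @Override
    public void onReceive(Object msg) throws Exception {
        if (msg instanceof Messages.ProcessLine) {
            coordinator = getSender();
            // ... process the line ...
            processed++;
        } else if (msg instanceof ReceiveTimeout && processed > 0) {
            coordinator.tell(processed, getSelf());  // report how many lines were handled
            processed = 0;
        }
    }
}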
The producer is finite, as should be the consumer.
The problem is when to stop, not how to run.
Communication can happen over any type of BlockingQueue.
Can't rely on poisoning the queue (PriorityBlockingQueue)
Can't rely on locking the queue (SynchronousQueue)
Can't rely on offer/poll exclusively (SynchronousQueue)
Probably even more exotic queues in existence.
Creates a queued seq on another (presumably lazy) seq s. The queued seq will produce a concrete seq in the background, and can get up to n items ahead of the consumer. n-or-q can be an integer n buffer size, or an instance of java.util.concurrent BlockingQueue. Note that reading from a seque can block if the reader gets ahead of the producer.
http://clojure.github.com/clojure/clojure.core-api.html#clojure.core/seque
My attempts so far + some tests: https://gist.github.com/934781
Solutions in Java or Clojure appreciated.
class Reader {
private final ExecutorService ex = Executors.newSingleThreadExecutor();
private final List<Object> completed = new ArrayList<Object>();
private final BlockingQueue<Object> doneQueue = new LinkedBlockingQueue<Object>();
private int pending = 0;
public synchronized Object take() {
removeDone();
queue();
Object rVal;
if(completed.isEmpty()) {
try {
rVal = doneQueue.take();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
pending--;
} else {
rVal = completed.remove(0);
}
queue();
return rVal;
}
private void removeDone() {
Object current = doneQueue.poll();
while(current != null) {
completed.add(current);
pending--;
current = doneQueue.poll();
}
}
private void queue() {
while(pending < 10) {
pending++;
ex.submit(new Runnable() {
@Override
public void run() {
doneQueue.add(compute());
}
private Object compute() {
//do actual computation here
return new Object();
}
});
}
}
}
Not exactly an answer I'm afraid, but a few remarks and more questions. My first answer would be: use clojure.core/seque. The producer needs to communicate end-of-seq somehow for the consumer to know when to stop, and I assume the number of produced elements is not known in advance. Why can't you use an EOS marker (if that's what you mean by queue poisoning)?
If I understand your alternative seque implementation correctly, it will break when elements are taken off the queue outside your function, since channel and q will be out of step in that case: channel will hold more #(.take q) elements than there are elements in q, causing it to block. There might be ways to ensure channel and q are always in step, but that would probably require implementing your own Queue class, and it adds so much complexity that I doubt it's worth it.
Also, your implementation doesn't distinguish between normal EOS and abnormal queue termination due to thread interruption; depending on what you're using it for, you might want to know which is which. Personally I don't like using exceptions in this way: use exceptions for exceptional situations, not for normal flow control.
I am working on a practical scenario in Java: a socket program. The existing system and the expected system are as follows.
Existing system: the system checks whether a certain condition is satisfied. If so, it creates a message to be sent and puts it into a queue.
The queue processor is a separate thread. It periodically checks the queue for items. If it finds any items (messages), it sends the message to a remote host (hardcoded) and removes the item from the queue.
Expected system: much the same, except the message is created when a certain condition is satisfied but the recipient is not always the same. So there are a couple of approaches.
Putting the message into the same queue but tagged with its receiver ID. In this case the second thread can identify the receiver, so the message can be sent to it.
Having multiple threads. In this case, when the condition is satisfied and the receiver is new, it creates a new queue, puts the message into that queue, and initializes a new thread to process that queue. If the next messages are directed to the same recipient they should go to the same queue; if not, a new queue and thread should be created.
Now I want to implement the second one, but I'm a bit stuck. How should I do that? A skeleton would be sufficient; you don't need to worry about how to create the queues etc. :)
Update: I also think that approach 1 is the best way to do this. I read some articles on threading and came to that decision. But it is still worth learning how to implement approach 2 as well.
Consider using the Java Message Service (JMS) rather than re-inventing the wheel?
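If you go that route, the sending side of the JMS 1.1 API looks roughly like this; the connection factory lookup and the queue name are placeholders (the factory comes from your provider or JNDI), so treat this as a sketch:
import javax.jms.*;

ConnectionFactory factory = lookupConnectionFactory();  // hypothetical helper: provider- or JNDI-specific
Connection connection = factory.createConnection();
Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
MessageProducer producer = session.createProducer(session.createQueue("outbound.messages"));
producer.send(session.createTextMessage("hello"));
connection.close();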
Can I suggest that you look at BlockingQueue? Your dispatch process can write to this queue (put), and clients can take or peek in a thread-safe manner, so you don't need to write the queue implementation at all.
If you have one queue containing different message types, then you will need to implement some peek-type mechanism for each client (i.e. they will have to check the head of the queue and only take what is theirs). For this to work effectively, consumers will have to extract the data meant for them in a timely and robust fashion.
If you have one queue/thread per message/consumer type, then that's going to be easier/more reliable.
Your client implementation will simply have to loop on:
while (!done) {
Object item = queue.take();
// process item
}
Note that the queue can make use of generics, and take() is blocking.
Of course, with multiple consumers taking messages of different types, you may want to consider a space-based architecture. This won't have queue (FIFO) characteristics, but will allow you multiple consumers in a very easy fashion.
You have to weigh up slightly whether you have lots of end machines and occasional messages to each, or a few end machines and frequent messages to each.
If you have lots of end machines, then literally having one thread per end machine sounds a bit over the top unless you're really going to be constantly streaming messages to all of those machines. I would suggest having a pool of threads which will only grow between certain bounds. To do this, you could use a ThreadPoolExecutor. When you need to post a message, you actually submit a runnable to the executor which will send the message:
Executor msgExec = new ThreadPoolExecutor(...);
public void sendMessage(final String machineId, final byte[] message) {
msgExec.execute(new Runnable() {
public void run() {
sendMessageNow(machineId, message);
}
});
}
private void sendMessageNow(String machineId, byte[] message) {
// open connection to machine and send message, thinking
// about the case of two simultaneous messages to a machine,
// and whether you want to cache connections.
}
If you just have a few end machines, then you could have a BlockingQueue per machine, and a thread per blocking queue sitting waiting for the next message. In this case, the pattern is more like this (beware untested off-top-of-head Sunday morning code):
ConcurrentHashMap<String, BlockingQueue<Message>> queuePerMachine;

public void sendMessage(String machineId, byte[] message) throws InterruptedException {
    BlockingQueue<Message> q = queuePerMachine.get(machineId);
    if (q == null) {
        q = new LinkedBlockingQueue<Message>();
        BlockingQueue<Message> prev = queuePerMachine.putIfAbsent(machineId, q);
        if (prev != null) {
            q = prev;
        } else {
            (new QueueProcessor(q)).start();
        }
    }
    q.put(new Message(message));
}

private class QueueProcessor extends Thread {
    private final BlockingQueue<Message> q;

    QueueProcessor(BlockingQueue<Message> q) {
        this.q = q;
    }

    public void run() {
        Socket s = null;
        for (;;) {
            try {
                boolean needTimeOut = (s != null);
                Message m = needTimeOut ?
                        q.poll(60000, TimeUnit.MILLISECONDS) :
                        q.take();
                if (m == null) {
                    if (s != null) {
                        // close s and set it to null
                    }
                } else {
                    if (s == null) {
                        // open s
                    }
                    // send message down s
                }
            } catch (InterruptedException e) {
                break;
            }
        }
        // add appropriate error handling and finally
    }
}
In this case, we close the connection if no message for that machine arrives within 60 seconds.
Should you use JMS instead? Well, you have to weigh up whether this sounds complicated to you. My personal feeling is that it isn't a complicated enough task to warrant a special framework. But I'm sure opinions differ.
P.S. In reality, now I look at this, you'd probably put the queue inside the thread object and just map machine ID -> thread object. Anyway, you get the idea.
You might try using SomnifugiJMS, an in-vm JMS implementation using java.util.concurrent as the actual "engine" of sorts.
It will probably be somewhat overkill for your purposes, but it may well enable your application to be distributed with little to no additional programming (if applicable): you just plug in a different JMS implementation like ActiveMQ and you're done.
First of all, if you are planning to have a lot of receivers, I would not use the ONE-THREAD-AND-QUEUE-PER-RECEIVER approach. You could end up with a lot of threads doing nothing most of the time, and it could hurt your performance. An alternative is a thread pool of worker threads, just picking tasks from a shared queue, each task carrying its own receiver ID, and perhaps a shared dictionary with socket connections to each receiver for the worker threads to use.
Having said so, if you still want to pursue your approach what you could do is:
1) Create a new class to handle your new thread execution:
import java.util.LinkedList;
import java.util.Queue;

public class Worker implements Runnable {
    private final Queue<String> myQueue = new LinkedList<String>();

    public void run()
    {
        while (true) {
            String messageToProcess = null;
            synchronized (myQueue) {
                if (!myQueue.isEmpty()) {
                    // get your data from the queue
                    messageToProcess = myQueue.poll();
                }
            }
            if (messageToProcess != null) {
                // do your stuff
            }
            try {
                Thread.sleep(500); // to avoid spinning
            } catch (InterruptedException e) {
                return; // interrupted: stop the worker
            }
        }
    }

    public void queueMessage(String message)
    {
        synchronized (myQueue) {
            myQueue.add(message);
        }
    }
}
2) On your main thread, create the messages and use a dictionary (hash table) to see if the receiver's thread has already been created. If it has, just queue the new message. If not, create a new thread, put it in the hash table, and queue the new message:
while (true) {
    String msg = getNewCreatedMessage(); // you get your messages from here
    int id = getNewCreatedMessageId();   // you get your receiver's id from here
    Worker w = myHash.get(id);
    if (w == null) { // create a new Worker thread
        w = new Worker();
        myHash.put(id, w);
        new Thread(w).start();
    }
    w.queueMessage(msg);
}
Good luck.
Edit: you can improve this solution by using the BlockingQueue Brian mentioned with this approach.
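For example, roughly how the Worker from step 1 could look on top of a BlockingQueue (a sketch; shutdown is handled here by interrupting the thread):
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class Worker implements Runnable {
    private final BlockingQueue<String> myQueue = new LinkedBlockingQueue<String>();

    public void run() {
        try {
            while (true) {
                String message = myQueue.take();   // blocks until a message arrives: no sleep/poll loop
                // do your stuff with message
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();    // interrupted: let the thread finish
        }
    }

    public void queueMessage(String message) {
        myQueue.add(message);                      // thread-safe, no external synchronization needed
    }
}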