My Java application randomly produces tasks that are consumed asynchronously by distributed background threads.
I don't have a distributed lock solution such as ZooKeeper at present.
I don't have any 3rd-party message queues.
I use a database as the task queue; the consumed results are also saved in the database, which is shared by all consumers and producers.
I have some code like this:
Consumer:
while (true) {
    // block the thread and wait for the producer's notify;
    // my producers produce MANY work items but notify each consumer only ONCE
    waitProducer();
    // consume the queue
    while (database.queueNotEmpty()) {
        // consume each work item and remove it from the database queue
        consumeAll();
    }
}
Producer:
for (...) {
    database.enqueue(work[i]);
}
// notify all consumers
notifyAllConsumer();
Apparently the code above has concurrency bugs. I have 3 questions:
1. How can I prevent distributed consumers from consuming the same task (the consumeAll() line), or at least reduce the duplicated work? Consuming one task multiple times wouldn't be a bug in my case, just less efficient.
2. How can I avoid the situation where the queue is NOT empty but no consumer is active? With one consumer and one producer, the sequence would be:
Consumer: while(database.queueNotEmpty()) // queue is empty, break the while loop
Producer: database.enqueue(work[i]); // produce a task
Producer: notifyAllConsumer(); // notify the consumer, but it is already active
Consumer: waitProducer(); // hang the thread even though there is still work to do
3. Are there any best practices for this problem, especially in pure Java? Is a third-party message queue or something like ZooKeeper a must?
Less locking or no locking is preferred; in my case efficiency matters more than strict correctness.
Thanks!
I would suggest you use a LinkedBlockingQueue in this case.
LinkedBlockingQueue tutorial
You can use the take() / put() methods, and if you want to wait with a time limit you can use offer(e, timeout, unit) and poll(timeout, unit); peek() lets you inspect the head without removing it.
I have used this in a similar kind of problem.
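For producers and consumers in the same JVM, a minimal sketch of this pattern looks like the following (this does not solve the distributed case from the question, where the database has to arbitrate between JVMs):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class QueueSketch {
    static final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    // producer: put() blocks only if the queue is bounded and full
    static void produce(Runnable work) throws InterruptedException {
        queue.put(work);
    }

    // consumer: take()/poll() block until an element is available, so
    // there is no separate notify step and no lost-wakeup race
    static void consumeLoop() throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            Runnable work = queue.poll(1, TimeUnit.SECONDS); // or queue.take()
            if (work != null) {
                work.run();
            }
        }
    }
}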
Related
I'm writing a Java command-line application that scrapes a website and downloads video files. The video files range in size from a few megs to 20 GB or more, which means downloading a file can take as little as a few seconds or as much as a few hours.

I've decided to implement a producer/consumer pattern to handle the scraping and downloading. A producer thread scrapes the site, retrieves the links to the video files, puts each link into an object, and puts that object into an unbounded blocking queue. There are N consumer threads that handle the downloads: they retrieve the objects containing the URLs from the blocking queue, and each thread downloads a file. The object that the producer puts on the queue contains the URL along with some other information that the consumer will need to save the file to the correct location in local storage.

Before a file is downloaded, the consumer thread first checks whether the file already exists in local storage. If the file exists, the download is skipped and the next object is pulled from the queue. If a consumer experiences a problem while downloading a file (connection reset, etc.), the consumer puts the object containing the URL into a separate queue for failed requests and sleeps for 15 minutes. This allows the application to deal with temporary network interruptions. While the producer is active, it checks the failed-URLs queue, removes URLs from it, and puts them back into the main queue.
After implementing this initial design, I quickly realized that I had a problem. Because I'm using a blocking queue and the worker threads are polling without a timeout, once the producer was finished, it couldn't just complete its execution because it needed to hang around to put failed URLs back into the queue. My first attempt at a solution was to remove the second "failure" queue and have workers put failed URLs back into the main queue. This meant that the application now had N consumers and N + 1 producers. This approach would allow the main producer thread to just exit when it was finished because it didn't have to worry about putting failed requests back into the queue.

Once that problem was solved, there was still another problem: notifying the worker threads that they could exit once the queue was empty. A blocking queue has no mechanism for the producer to signal that it won't be putting more data into the queue. I thought about having the consumers poll the queue with a timeout and having the primary producer set some sort of flag when it exits. When a consumer times out, it checks the flag: if the flag is set, the consumer exits; if not, it polls the queue again. While this approach will work, I don't like the design. I don't like the idea of having threads sitting around unnecessarily, and I hate even more the use of a magic flag. The only interaction between producer and consumers should be via the queue. The consumers have no knowledge of the producer, and checking a magic flag breaks that principle.
I ditched the blocking queue and decided to use a regular non-blocking queue. To prevent the worker threads from exiting as soon as they started, I used a CyclicBarrier. When a worker thread starts, it waits at the barrier before polling the queue. Meanwhile, the producer thread was coded to lower the barrier once the queue contained 10 x N URLs. Once the barrier was lowered, the worker threads would begin processing the Queue. This approach quickly failed because in some cases the consumers would consume the queue faster than the producer could replenish it. This happens in cases where a large number of files are already stored on disk so the consumers don't need to download anything. Once the queue was empty, the consumers exited, even though the producer was still scraping the site looking for URLs.
This tells me that I need to use a blocking queue. I'm continuing to try to find a clean, elegant solution that doesn't depend on timeouts and magic flags. I would love to hear your approach to solving this problem given the requirements.
UPDATE:
I finally settled on a solution based on comments made by user Martin James. Since these were comments and not an answer, there isn't an answer for me to accept. If Martin summarizes his comments into an answer, I'll accept it. Now here's the solution.
When the producer thread completes, it places N objects into the queue that contain null as the value for the URL. I updated the consumer thread to check for a null URL when they pull an object from the queue. If the URL is null, the consumer exits. This approach solves the notification to consumers that the producer is complete. However, it doesn't solve the problem of consumers putting URLs into the queue after the producer has exited. To solve that problem, I switched to a priority blocking queue. I made the object that gets put into the queue a Comparable and the compareTo logic was coded such that objects with null values for the URL will always be last in the queue. So when the producer exits and it places the terminating objects in the queue, if/when a consumer places an object back into the queue, those objects will always be ahead of the terminating objects.
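A minimal sketch of that arrangement; DownloadTask and its fields are my own illustration, not the poster's actual code:

import java.util.concurrent.PriorityBlockingQueue;

public class Downloader {
    // the shared queue orders re-queued work ahead of terminating objects
    static final PriorityBlockingQueue<DownloadTask> queue = new PriorityBlockingQueue<>();

    static class DownloadTask implements Comparable<DownloadTask> {
        final String url; // null marks a terminating object ("poison pill")

        DownloadTask(String url) { this.url = url; }

        boolean isTerminator() { return url == null; }

        @Override
        public int compareTo(DownloadTask other) {
            // null URLs always sort last, so work that a consumer
            // re-queues stays ahead of the N terminating objects
            if (isTerminator()) return other.isTerminator() ? 0 : 1;
            return other.isTerminator() ? -1 : 0;
        }
    }

    // each consumer: take tasks until a terminator is seen
    static void consume() throws InterruptedException {
        for (;;) {
            DownloadTask task = queue.take();
            if (task.isTerminator()) return; // producer is done, exit the thread
            // ... download task.url, re-queue on failure ...
        }
    }
}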
Thanks all for the comments and feedback. Very much appreciated.
My approach would be to use a framework with back-pressure support, for example Vert.x reactive streams.
Good examples of systems handling back-pressure built on Vert.x can be found in the book Vert.x in Action.
ExecutorService in Java, for example, is a producer-consumer model with a series of worker threads fetching tasks from a work queue. I might close the thread pool with ExecutorService#shutdownNow: this method sets the thread-pool state to STOP and interrupts each worker. Take a look at the shutdownNow method and the worker's run method (I removed the irrelevant code):
public List<Runnable> shutdownNow() {
    // ...
    advanceRunState(STOP);
    interruptWorkers();
    // ... drains the work queue and returns the unexecuted tasks
}
final void runWorker(Worker w) {
    // ...
    try {
        while (task != null || (task = getTask()) != null) {
            // ... run the task
        }
    } finally {
        // ... processWorkerExit(w, completedAbruptly);
    }
}
private Runnable getTask() {
    boolean timedOut = false; // Did the last poll() time out?
    for (;;) {
        // ...
        // Check if queue empty only if necessary.
        if (rs >= SHUTDOWN && (rs >= STOP || workQueue.isEmpty())) {
            decrementWorkerCount();
            return null;
        }
        try {
            Runnable r = timed ?
                workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS) :
                workQueue.take();
            if (r != null)
                return r;
            timedOut = true;
        } catch (InterruptedException retry) {
            timedOut = false; // interrupted: loop around and re-check the run state
        }
    }
}
I think this is an example of using a flag plus an interrupt to stop consumers. I don't think it's inelegant.
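For a hand-rolled consumer, the same flag-plus-interrupt pattern can look like this (a minimal sketch of my own, not code from ThreadPoolExecutor):

import java.util.concurrent.BlockingQueue;

class StoppableConsumer implements Runnable {
    private final BlockingQueue<Runnable> queue;
    private volatile boolean stopped; // the "run state" flag

    StoppableConsumer(BlockingQueue<Runnable> queue) { this.queue = queue; }

    void stopNow(Thread worker) {
        stopped = true;     // set the flag first...
        worker.interrupt(); // ...then unblock a possibly waiting take()
    }

    @Override
    public void run() {
        while (!stopped) {
            try {
                queue.take().run(); // blocks until work arrives
            } catch (InterruptedException e) {
                // interrupted: the loop re-checks the flag and exits if stopped
            }
        }
    }
}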
I have a database table called jobs and a producer service inserting data into this table. I need to create a consumer service to process this data.
I have a server with 8 cores / 16 hardware threads, and I create a thread pool with 16 threads.
ExecutorService executorService = Executors.newFixedThreadPool(16);
I will fetch 16 records from the database and distribute them to the consumer threads. After all threads complete their jobs, I will fetch another 16 records. (I really don't know whether my solution is efficient or not.)
How can I distribute these tasks to the consumer threads? Do I need to use a BlockingQueue?
The executor service has a queue that buffers your tasks when no thread is available.
You need to write another thread that periodically submits tasks to the executor service, and you also need a completion/rejection strategy, i.e. handling the case where the executor service's queue is full.
The javadocs for ExecutorService might come in handy here.
Create the work items as classes implementing Callable, put them into a collection, use executorService.invokeAll(<collection of Callables>), and check the Futures for completion.
Or just use executorService.submit(<task>).
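A minimal sketch of the invokeAll variant, where Record and process() stand in for your own types:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

void processAll(ExecutorService executorService, List<Record> records)
        throws InterruptedException, ExecutionException {
    List<Callable<Void>> jobs = new ArrayList<>();
    for (Record record : records) {
        jobs.add(() -> { process(record); return null; });
    }
    // invokeAll blocks until every task has completed
    for (Future<Void> f : executorService.invokeAll(jobs)) {
        f.get(); // rethrows any exception thrown inside the task
    }
}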
There is no need to "batch" the records. Just submit them to the executor service.
If you are concerned that you might overwhelm the JVM's heap by filling the executor service's queue, then create your executor service using (for example) an ArrayBlockingQueue as the work queue. That will cause the executor to reject requests if the work queue gets too long. Various other strategies are possible.
If you are going to do fancy things with your ExecutorService, I recommend that you read the javadocs for ThreadPoolExecutor. The API is rich and complex, and warrants thorough reading before you choose a specific implementation approach.
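For example, a bounded executor along those lines might look like this (the sizes and the saturation policy are illustrative choices, not the only option):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

ThreadPoolExecutor executor = new ThreadPoolExecutor(
        16, 16,                          // core and maximum pool size
        0L, TimeUnit.MILLISECONDS,       // keep-alive for idle threads
        new ArrayBlockingQueue<>(1000)); // bounded work queue
// the default AbortPolicy rejects submissions once the queue is full;
// CallerRunsPolicy is one alternative that throttles the submitter instead:
// executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());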
While that approach will work, it may still leave some unused computation power, assuming that 16 is the optimal number of threads.
I'd rather use a pull-based approach, in which threads "pull" entries to process:
Option 1: Retrieve all records and use a parallel stream:
List<Record> allValues = ...; // fetch from the database
allValues.parallelStream().forEach(record -> { /* ...do your processing... */ });
// You can even have a version that reads rows from the result set as needed.
// Note that Stream.generate is unbounded (you would need to limit it) and the
// checked SQLException has to be handled inside the lambda:
Stream.generate(() -> {
    try {
        resultSet.next();
        return resultSet.getObject(1); // read/create the value from the record
    } catch (SQLException e) {
        throw new RuntimeException(e);
    }
});
Option 2: Put all the data retrieved from the DB into a queue, then create Callable/Runnable implementations that process that queue in a loop (each thread remaining busy until the queue is exhausted), and launch those tasks with the executor service:
Queue<Object> records = new ConcurrentLinkedQueue<>(); // fill with the records
ExecutorService es = Executors.newFixedThreadPool(16);
// Execute the runnables that process the queue, each one ending
// only when there is nothing left on the queue.
for (int i = 0; i < 16; i++) {
    es.execute(() -> {
        // poll() and null-check in one step: with a concurrent queue this
        // avoids the race between a separate isEmpty() check and poll()
        Object next;
        while ((next = records.poll()) != null) {
            // process next
        }
    });
}
I would recommend using a dedicated ArrayBlockingQueue with a size of twice the number of available processors (2 x 16 = 32 in your case).
A separate thread reads records from the database and puts them in the queue. If the queue is full, that thread waits until space for the next record becomes available. If records are processed faster than the reading thread can retrieve them from the database, several reading threads can be used, all running the same read-put loop.
Consumer threads simply take the next record from the queue and process it, in a loop.
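A minimal sketch of that layout, where Record, fetchNextRecord() and process() are placeholders for your own code:

BlockingQueue<Record> queue = new ArrayBlockingQueue<>(32); // 2 x 16

// reader thread: put() blocks while the queue is full
Runnable reader = () -> {
    try {
        Record r;
        while ((r = fetchNextRecord()) != null) {
            queue.put(r);
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
};

// each consumer thread: take() blocks while the queue is empty
Runnable consumer = () -> {
    try {
        while (true) {
            process(queue.take());
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
};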
Addendum:
An alternative approach is to wrap each record in an object of type Runnable (or Callable) and submit it to the executor service. One smaller drawback is that an additional wrapper object has to be created. The greater drawback is that the executor's input queue may become overloaded; then, depending on what kind of queue is used, the queue either throws RejectedExecutionException or consumes all available memory. An ArrayBlockingQueue, in case of overflow, simply suspends the producer threads.
I have a thread pool of m threads. Let's say m is 10 and fixed. Then there are n queues, where n may become large (like 100,000 or more). Every queue holds tasks to be executed by those m threads. Now, very importantly, every queue must be worked off sequentially, task by task. This is a requirement to make sure that tasks are executed in the order they were added to the queue. Otherwise the data could become inconsistent (same as, say, with JMS queues).
So the question is now how to make sure that the tasks in those n queues are processed by the available m threads in a way that no task added to the same queue can be executed "at the same time" by different threads.
I tried to solve this problem myself and figured out that it is quite demanding. Java ThreadPoolExecutor is nice, but you would have to add quite a bit of functionality that is not easy to develop. So the question is whether anyone knows of some framework or system for Java that already solves this problem?
Update
Thanks to Adrian and Tanmay for their suggestions. The number of queues may be very large (like 100,000 or more), so one thread per queue is unfortunately not possible, although it would be simple and easy. I will look into the fork/join framework. It looks like an interesting path to pursue.
My current first iteration solution is to have a global queue to which all tasks are added (using a JDK8 TransferQueue, which has very little locking overhead). Tasks are wrapped into a queue stub with the lock of the queue and its size. The queue itself does not exist physically, only its stub.
An idle thread first needs to obtain a token before it can access the global queue (the token would be a single element in a blocking queue, e.g. a JDK8 TransferQueue). Then it does a blocking take on the global queue. Once a task is obtained, the thread checks whether the queue lock of the task's queue stub is free. Actually, I think just using an AtomicBoolean would be sufficient and would create less lock contention than a lock or a synchronized block.
If the queue lock is obtained, the token is returned and the task is executed. If it is not obtained, the task is added to a 2nd-level queue and another blocking take from the global queue is done. Threads also need to check whether the 2nd-level queue is empty and, if not, take a task from it to be executed.
This solution seems to work. However, the token every thread needs to acquire before being allowed to access the global queue, and the 2nd-level queue, look like a bottleneck. I believe it will create high lock contention. So I'm not so happy with this. Maybe I'll start with this solution and elaborate on it.
Update 2
All right, here is the "best" solution I have come up with so far. The following queues are defined:
Ready Queue (RQ): Contains all tasks that can be executed immediately by any thread in the thread pool
Entry Queue (EQ): Contains all tasks the user wants to be executed as well as internal admin tasks. The EQ is a priority queue. Admin tasks have highest priority.
Channel Queues (CQ): For every channel there is an internal channel queue that is used to preserve the ordering of the tasks, i.e. to make sure tasks are executed sequentially in the order they were added to the EQ.
Scheduler: Dedicated thread that takes tasks from the EQ. If the task is a user task, it is added to the CQ of the channel the task was added to. If the head of the CQ equals the just-inserted user task, it is also added to the RQ (but remains in the CQ), so that it is executed as soon as the next thread of the thread pool becomes available.
When a user task has finished execution, an internal task TaskFinished is added to the EQ. When it is executed by the scheduler, the head is taken from the associated CQ. If the CQ is not empty after the take, the next task is peeked (but not removed) from the CQ and added to the RQ. The TaskFinished tasks have higher priority than user tasks.
This approach contains, in my opinion, no logical errors. Note that the EQ and RQ need to be synchronized. I prefer using the TransferQueue from JDK8, which is very fast and where checking whether it is empty and peeking at the head item are also very fast. The CQs need not be synchronized, as they are only ever accessed by the Scheduler.
So far I'm quite happy with this solution. What makes me wonder is whether the Scheduler could turn into a bottleneck: if there are many more tasks in the EQ than it can handle, the EQ might grow, building up a backlog. Any opinions about that would be appreciated :-)
You can use Fork Join Framework if you are working in Java 7 or Java 8.
You can create a RecursiveTask using the first element popped from each queue.
Remember to provide a reference to the queues to the corresponding RecursiveTasks.
Invoke all of them at once (in a loop or a stream).
Now at the end of the compute method (after processing of a task is completed), create another RecursiveTask by popping another element from the corresponding queue and call invoke on it.
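A rough sketch of that chaining; QueueTask and the names are mine, and a RecursiveAction is used since no result is returned:

import java.util.Queue;
import java.util.concurrent.RecursiveAction;

class QueueTask extends RecursiveAction {
    private final Queue<Runnable> queue; // one of the n queues
    private final Runnable work;         // the element popped from it

    QueueTask(Queue<Runnable> queue, Runnable work) {
        this.queue = queue;
        this.work = work;
    }

    @Override
    protected void compute() {
        work.run();                   // process the current element
        Runnable next = queue.poll(); // then chain the next element, so the
        if (next != null) {           // queue is worked off sequentially
            new QueueTask(queue, next).invoke();
        }
    }
}
// kick-off, for each queue q with a first element e:
// forkJoinPool.execute(new QueueTask(q, e));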
Notes:
Each task is responsible for extracting the next element from its own queue, so all tasks from one queue are executed sequentially.
There should be a new RecursiveTask created and invoked separately for each element in the queues. This ensures that some queues do not hog the threads and starvation is avoided.
Using an ExecutorService is also a viable option, but IMO ForkJoin's API is friendlier for your use case.
Hope this helps.
One simple solution would be to create a task whenever an element is added to an empty queue. This task would be responsible for only that queue and would end when the queue has been worked off. Ensure that the Queue implementations are thread-safe and the task stops after removing the last element.
EDIT: These tasks should be added to a ThreadPoolExecutor with an internal queue, for example one created by ExecutorService.newFixedThreadPool, which will work off the tasks in parallel with a limited number of threads.
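A sketch of that idea (the class and names are my own): each queue schedules one drain task on the shared pool when it goes from empty to non-empty, which keeps execution sequential per queue:

import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;

class SerialTaskQueue {
    private final Queue<Runnable> tasks = new ArrayDeque<>();
    private final Executor pool;
    private boolean scheduled; // guarded by "this"

    SerialTaskQueue(Executor pool) { this.pool = pool; }

    synchronized void add(Runnable task) {
        tasks.add(task);
        if (!scheduled) {      // the queue was empty: create the worker task
            scheduled = true;
            pool.execute(this::drain);
        }
    }

    private void drain() {
        for (;;) {
            Runnable next;
            synchronized (this) {
                next = tasks.poll();
                if (next == null) { scheduled = false; return; } // worked off
            }
            next.run(); // tasks of one queue run strictly one after another
        }
    }
}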
Alternatively, just divide the queues among a fixed number of threads:
public class QueueWorker implements Runnable {
    static final int NUM_THREADS = 10;
    // shared, fixed list of thread-safe queues
    static List<Queue<Runnable>> queues;

    // should be unique and < NUM_THREADS:
    final int threadId;

    QueueWorker(int threadId) {
        this.threadId = threadId;
    }

    @Override
    public void run() {
        int currentQueueIndex = threadId;
        while (true) {
            Queue<Runnable> currentQueue = queues.get(currentQueueIndex);
            // execute tasks until the queue is empty
            Runnable task;
            while ((task = currentQueue.poll()) != null) {
                task.run();
            }
            currentQueueIndex += NUM_THREADS;
            if (currentQueueIndex >= queues.size()) {
                currentQueueIndex = threadId;
            }
        }
    }
}
I am using a LinkedBlockingQueue together with the producer/consumer pattern to buffer tasks. To add a task to the queue, my producers use queue.put(object); to take a task from the queue, my consumers use queue.take().
I found in the Java API that both these methods block until the queue becomes available. My problem is: I know for a fact that there are more producers of tasks in my system than consumers, and all my tasks need to be processed. So I need my consumers, when blocked, to have priority over the producers in getting access to the queue.
Is there a way to do this without changing the methods of LinkedBlockingQueue too much?
LinkedBlockingQueue uses two ReentrantLocks:
private final ReentrantLock putLock = new ReentrantLock();
private final ReentrantLock takeLock = new ReentrantLock();
Since the two locks are separate, and put and take acquire separate locks for their operations, blocking one operation does not impact the other.
Cheers!!
There is no need to prioritize consumers over producers, because they block under entirely different conditions: if the producer is blocked because the queue is full, then the consumers won't be blocked as a result of the queue being empty.
For example, producer1 has a blocked put call because the queue is full. Consumer1 then executes take, which proceeds as normal because the queue is not empty (unless your queue has a capacity of 0, which would be silly) - the consumer doesn't know or care that a producer's put call is blocked, all it cares about is that the queue is not empty.
Because there are two independent locks, blocked producers do not block consumers.
The javadoc for take() states:
Retrieves and removes the head of this queue, waiting if necessary until an element becomes available.
And put() states:
Inserts the specified element at the tail of this queue, waiting if necessary for space to become available.
If there is no space, then put will block, but take won't be blocked: by design it waits only if the queue is empty, which is obviously not the case here.
Original comment:
As far as I know, this queue, by design, won't block consumers even if producers are blocked due to the queue being full.
I am interested in a data structure identical to the Java BlockingQueue, with the exception that it must be able to batch objects in the queue. In other words, I would like the producer to be able to put objects into the queue, but have the consumer block on take() until the queue reaches a certain size (the batch size).
Then, once the queue has reached the batch size, the producer must block on put() until the consumer has consumed all of the elements in the queue (at which point the producer starts producing again and the consumer blocks until the batch size is reached again).
Does a similar data structure exist? Or should I write it (which I don't mind)? I just don't want to waste my time if there is something out there.
UPDATE
Maybe to clarify things a bit:
The situation will always be as follows. There can be multiple producers adding items to the queue, but there will never be more than one consumer taking items from the queue.
Now, the problem is that there are multiple of these setups in parallel and serial. In other words, producers produce items for multiple queues, while consumers in their own right can also be producers. This can be more easily thought of as a directed graph of producers, consumer-producers, and finally consumers.
The reason that producers should block until the queues are empty (@Peter Lawrey) is that each of these will be running in a thread. If you leave them to simply produce as space becomes available, you will end up with a situation where you have too many threads trying to process too many things at once.
Maybe coupling this with an execution service could solve the problem?
I would suggest you use BlockingQueue.drainTo(Collection, int). You can use it with take() to ensure you get a minimum number of elements.
The advantage of using this approach is that your batch size grows dynamically with the workload and the producer doesn't have to block when the consumer is busy. i.e. it self optimises for latency and throughput.
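One way to combine the two calls, as suggested (Task, MAX_BATCH and process() are placeholders):

List<Task> batch = new ArrayList<>();
batch.add(queue.take());              // block until at least one element is available
queue.drainTo(batch, MAX_BATCH - 1);  // then grab whatever else is already queued
process(batch);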
To implement exactly as asked (which I think is a bad idea) you can use a SynchronousQueue with a busy consuming thread.
i.e. the consuming thread does a
list.clear();
while(list.size() < required) list.add(queue.take());
// process list.
The producer will block when ever the consumer is busy.
Here is a quick (= simple but not fully tested) implementation that I think may be suitable for your request; you should be able to extend it to support the full queue interface if you need to.
To increase performance you can switch to a ReentrantLock instead of using the synchronized keyword.
import java.util.ArrayList;
import java.util.concurrent.Semaphore;

public class BatchBlockingQueue<T> {
    private final ArrayList<T> queue;
    private final Semaphore readerLock;
    private final Semaphore writerLock;
    private final int batchSize;

    public BatchBlockingQueue(int batchSize) {
        this.queue = new ArrayList<>(batchSize);
        this.readerLock = new Semaphore(0);
        this.writerLock = new Semaphore(batchSize);
        this.batchSize = batchSize;
    }

    public void put(T e) throws InterruptedException {
        // acquire OUTSIDE the monitor: a producer blocked here must not
        // hold the lock that the consumer needs in poll()
        writerLock.acquire();
        synchronized (this) {
            queue.add(e);
            if (queue.size() == batchSize) {
                readerLock.release(batchSize); // batch is full: wake the reader
            }
        }
    }

    public T poll() throws InterruptedException {
        readerLock.acquire();
        synchronized (this) {
            T ret = queue.remove(0);
            if (queue.isEmpty()) {
                writerLock.release(batchSize); // batch drained: let producers refill
            }
            return ret;
        }
    }
}
Hope you find it useful.
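A small usage sketch of the class above, with a batch size of 4:

public class BatchDemo {
    public static void main(String[] args) throws InterruptedException {
        BatchBlockingQueue<Integer> q = new BatchBlockingQueue<>(4);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 8; i++) {
                    q.put(i); // blocks after 4 until the batch is drained
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        for (int i = 0; i < 8; i++) {
            System.out.println(q.poll()); // blocks until a full batch is ready
        }
        producer.join();
    }
}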
I recently developed this utility, which batches BlockingQueue elements and uses a flush timeout if the queue elements don't reach the batch size. It also supports a fan-out pattern, using multiple instances to process the same set of data:
// Instantiate the registry
FQueueRegistry registry = new FQueueRegistry();

// Build FQueue consumer
registry.buildFQueue(String.class)
        .batch()
        .withChunkSize(5)
        .withFlushTimeout(1)
        .withFlushTimeUnit(TimeUnit.SECONDS)
        .done()
        .consume(() -> (broadcaster, elms) -> System.out.println("elms batched are: " + elms.size()));

// Push data into queue
for (int i = 0; i < 10; i++) {
    registry.sendBroadcast("Sample" + i);
}
More info here!
https://github.com/fulmicotone/io.fulmicotone.fqueue
Not that I am aware of. If I understand correctly, you want either the producer to work (while the consumer is blocked) until it fills the queue, or the consumer to work (while the producer blocks) until it clears the queue. If that's the case, may I suggest that you don't need a data structure but a mechanism to block one party while the other is working, in a mutex fashion. You can lock on an object for that, and internally have the full/empty logic release the lock and pass it to the other party. So, in short, you should write it yourself :)
This sounds like how the RingBuffer works in the LMAX Disruptor pattern. See http://code.google.com/p/disruptor/ for more.
A very rough explanation: your main data structure is the RingBuffer. Producers put data into the ring buffer in sequence, and consumers can pull off as much data as the producer has put into the buffer (so, essentially, batching). If the buffer is full, the producer blocks until the consumer has finished and freed up slots in the buffer.