BlockingQueue with slow producer and fast consumers

I'm writing a Java command-line application that scrapes a website and downloads video files. The video files range in size from a few megabytes to 20 GB or more, so downloading a file can take anywhere from a few seconds to a few hours. I've decided to implement a producer/consumer pattern to handle the scraping and downloading. A producer thread scrapes the site, retrieves the links to the video files, wraps each link in an object, and puts that object into an unbounded blocking queue. N consumer threads handle the downloads: each retrieves an object containing a URL from the blocking queue and downloads the file. The object that the producer puts on the queue contains the URL along with some other information that the consumer needs to save the file to the correct location in local storage.
Before a file is downloaded, the consumer thread first checks whether the file already exists in local storage. If it does, the download is skipped and the next object is pulled from the queue. If a consumer experiences a problem while downloading a file (connection reset, etc.), it puts the object containing the URL into a separate queue for failed requests and sleeps for 15 minutes. This allows the application to deal with temporary network interruptions. While the producer is active, it checks the failed-URLs queue, removes those URLs from it, and puts them back into the main queue.
After implementing this initial design, I quickly realized that I had a problem. Because I'm using a blocking queue and the worker threads are polling without a timeout, the producer couldn't simply complete its execution once it was finished, because it needed to hang around to put failed URLs back into the queue. My first attempt at a solution was to remove the second "failure" queue and have workers put failed URLs back into the main queue. This meant that the application now had N consumers and N + 1 producers. This approach allows the main producer thread to just exit when it is finished, because it doesn't have to worry about putting failed requests back into the queue. With that solved, another problem remained: notifying the worker threads that they can exit once the queue is empty. A blocking queue has no mechanism for the producer to signal that it won't be putting any more data into the queue. I thought about having the consumers poll the queue with a timeout and having the primary producer set some sort of flag when it exits. When a consumer times out, it checks the flag. If the flag is set, the consumer exits; if not, it polls the queue again. While this approach will work, I don't like the design. I don't like the idea of having threads sitting around unnecessarily, and I like the use of a magic flag even less. The only interaction between producer and consumers should be via the queue. The consumers have no knowledge of the producer, and checking a magic flag breaks that principle.
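For reference, the timeout-plus-flag variant described above could look roughly like the sketch below. All names (TimeoutFlagDemo, producerDone, the example URLs) are made up for illustration, strings stand in for the URL objects, and the re-queuing of failed downloads is left out, so this only shows the shutdown handshake.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class TimeoutFlagDemo {
    // Hypothetical sketch: returns how many queued items the consumers handled.
    static int run(int consumers, int items) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicBoolean producerDone = new AtomicBoolean(false);
        AtomicInteger processed = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();

        for (int i = 0; i < consumers; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        // Read the flag BEFORE polling: if it was already set and the
                        // poll still times out, nothing more can arrive in this sketch.
                        boolean done = producerDone.get();
                        String url = queue.poll(50, TimeUnit.MILLISECONDS);
                        if (url != null) {
                            processed.incrementAndGet(); // stand-in for the download
                        } else if (done) {
                            return;
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            workers.add(t);
        }

        // The "producer": enqueue everything, then raise the flag.
        for (int i = 0; i < items; i++) {
            queue.add("http://example.com/video-" + i);
        }
        producerDone.set(true);

        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed.get();
    }
}
```

Reading the flag before the poll rather than after it avoids a consumer exiting while items added just before the flag was set are still queued.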
I ditched the blocking queue and decided to use a regular non-blocking queue. To prevent the worker threads from exiting as soon as they started, I used a CyclicBarrier. When a worker thread starts, it waits at the barrier before polling the queue. Meanwhile, the producer thread was coded to lower the barrier once the queue contained 10 x N URLs. Once the barrier was lowered, the worker threads would begin processing the Queue. This approach quickly failed because in some cases the consumers would consume the queue faster than the producer could replenish it. This happens in cases where a large number of files are already stored on disk so the consumers don't need to download anything. Once the queue was empty, the consumers exited, even though the producer was still scraping the site looking for URLs.
This tells me that I need to use a blocking queue. I'm continuing to try to find a clean, elegant solution that doesn't depend on timeouts and magic flags. I would love to hear your approach to solving this problem given the requirements.
UPDATE:
I finally settled on a solution based on comments made by user Martin James. Since these were comments and not an answer, there isn't an answer for me to accept. If Martin summarizes his comments into an answer, I'll accept it. Now here's the solution.
When the producer thread completes, it places N objects into the queue that contain null as the value for the URL. I updated the consumer thread to check for a null URL when they pull an object from the queue. If the URL is null, the consumer exits. This approach solves the notification to consumers that the producer is complete. However, it doesn't solve the problem of consumers putting URLs into the queue after the producer has exited. To solve that problem, I switched to a priority blocking queue. I made the object that gets put into the queue a Comparable and the compareTo logic was coded such that objects with null values for the URL will always be last in the queue. So when the producer exits and it places the terminating objects in the queue, if/when a consumer places an object back into the queue, those objects will always be ahead of the terminating objects.
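A minimal sketch of that ordering rule, assuming a hypothetical DownloadTask class whose null URL marks a terminating object (the real queue element carries more fields):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;

class PillOrderingDemo {
    // Hypothetical queue element; url == null means "terminate".
    static final class DownloadTask implements Comparable<DownloadTask> {
        final String url;

        DownloadTask(String url) {
            this.url = url;
        }

        boolean isTerminator() {
            return url == null;
        }

        @Override
        public int compareTo(DownloadTask other) {
            // false sorts before true, so real tasks always precede terminators.
            return Boolean.compare(isTerminator(), other.isTerminator());
        }
    }

    // Enqueues two terminators first, then a re-queued task, and reports the
    // order in which a consumer would see them.
    static List<String> drainOrder() {
        PriorityBlockingQueue<DownloadTask> queue = new PriorityBlockingQueue<>();
        queue.put(new DownloadTask(null));
        queue.put(new DownloadTask(null));
        queue.put(new DownloadTask("http://example.com/retry.mp4"));

        List<String> order = new ArrayList<>();
        DownloadTask task;
        while ((task = queue.poll()) != null) {
            order.add(task.isTerminator() ? "STOP" : task.url);
        }
        return order;
    }
}
```

Note that a PriorityBlockingQueue makes no ordering promise among elements that compare as equal, so real tasks are not necessarily handed out in insertion order with this compareTo.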
Thanks all for the comments and feedback. Very much appreciated.

My approach would be to use a framework that supports back-pressure, for example Vert.x with reactive streams.
Good examples of back-pressure-aware systems built on Vert.x can be found in the book Vert.x in Action.

Java's ExecutorService, for example, follows a producer-consumer model: a set of worker threads fetches tasks from a work queue. You can close the thread pool with ExecutorService#shutdownNow, which sets the thread pool state to STOP and interrupts each worker. Take a look at ThreadPoolExecutor's shutdownNow method and the worker's run loop (I removed the irrelevant code):
public List<Runnable> shutdownNow() {
    // ...
    advanceRunState(STOP);
    interruptWorkers();
    // ...
}
final void runWorker(Worker w) {
    // ...
    try {
        while (task != null || (task = getTask()) != null) {
            // ...
        }
    }
    // ...
}
private Runnable getTask() {
    boolean timedOut = false; // Did the last poll() time out?
    for (;;) {
        // ...
        // Check if queue empty only if necessary.
        if (rs >= SHUTDOWN && (rs >= STOP || workQueue.isEmpty())) {
            decrementWorkerCount();
            return null;
        }
        // ...
        try {
            Runnable r = timed ?
                workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS) :
                workQueue.take();
            if (r != null)
                return r;
            timedOut = true;
        }
        // ...
    }
}
I think this is an example of using Flag & interrupt to stop consumers. I don't think it's inelegant.
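Applied to the original question, this could mean letting the pool own the queue and the workers. The sketch below is only an illustration with made-up names (ExecutorShutdownDemo, a counter standing in for downloads): the producer submits work, calls shutdown() when it is done, and the workers exit on their own once the queue is drained; shutdownNow() is the fallback that interrupts them.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ExecutorShutdownDemo {
    // Hypothetical sketch: returns how many submitted tasks actually ran.
    static int run(int workers, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger completed = new AtomicInteger();

        // The "producer": every task stands in for one download.
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> completed.incrementAndGet());
        }

        // No new tasks are accepted, but already queued tasks still run.
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                // Give up waiting: interrupt the workers.
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```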

Related

Java multithreading: a good way to notify the consumer threads that all the producer threads have finished?

I am working on a fairly simple producer/consumer scenario: I have some threads that deliver data to a monitor, and other threads that await data in the monitor, remove it and deliver it to another monitor. At a certain point, the producers will all have delivered their last data to the monitor. After the consumers have consumed the last data in the monitor, they need to be told not to await more data from it. To make this work, the consumer threads need to be notified when the last producer thread has produced its last bit of data and no more data is due. I am sure there are multiple ways to do this. As of now, the monitor counts the number of active producer threads, and when a producer thread finishes, it tells the monitor so. I am very curious, though, what a more elegant approach would be.

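A simple solution is to have each producer send a poison pill when it finishes. The consumer keeps a count of the poison pills received so far and stops once that count equals the number of producers.
class Consumer implements Runnable {
    // queue (a BlockingQueue) and isPoisonPill(...) are assumed to be defined elsewhere
    final int numOfProducers;
    int poisonPillsReceived;
    Consumer(int numOfProducers) {
        this.numOfProducers = numOfProducers;
    }
    public void run() {
        try {
            while (poisonPillsReceived < numOfProducers) {
                Object obj = queue.take(); // blocks until an element is available
                if (isPoisonPill(obj)) {
                    poisonPillsReceived++;
                } else {
                    // .... (process obj)
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt status and exit
        }
    }
}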
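For a version that can be run end to end, here is a small sketch of the same counting scheme with several producers and a single consumer. The class name, the POISON_PILL sentinel and the string payloads are invented for the example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class PoisonPillCountDemo {
    // Hypothetical sentinel object, recognised by identity.
    private static final Object POISON_PILL = new Object();

    // Starts the producers, consumes on the calling thread, and returns the
    // number of real items consumed before the last pill arrived.
    static int run(int producers, int itemsPerProducer) {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();

        for (int p = 0; p < producers; p++) {
            new Thread(() -> {
                for (int i = 0; i < itemsPerProducer; i++) {
                    queue.add("item");
                }
                // Each producer's pill goes in after all of its own items.
                queue.add(POISON_PILL);
            }).start();
        }

        int consumed = 0;
        int pills = 0;
        try {
            while (pills < producers) {
                Object obj = queue.take(); // blocks until something is available
                if (obj == POISON_PILL) {
                    pills++;
                } else {
                    consumed++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return consumed;
    }
}
```

Because the queue is FIFO and every producer enqueues its pill last, all real items have been taken by the time the final pill is counted. With more than one consumer, each consumer would need to see its own set of pills.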

Looking for solution based on ThreadPoolExecutor to ensure sequential execution of tasks

I have a thread pool of m threads. Let's say m is 10 and fixed. Then there are n queues, where n may become large (100,000 or more). Every queue holds tasks to be executed by those m threads. Now, very important: every queue must be worked off sequentially, task by task. This is a requirement to make sure that tasks are executed in the order they were added to the queue. Otherwise the data could become inconsistent (same as, say, with JMS queues).
So the question is now how to make sure that the tasks in those n queues are processed by the available m threads in a way that no task added to the same queue can be executed "at the same time" by different threads.
I tried to solve this problem myself and figured out that it is quite demanding. Java ThreadPoolExecutor is nice, but you would have to add quite a bit of functionality that is not easy to develop. So the question is whether anyone knows of some framework or system for Java that already solves this problem?
Update
Thanks to Adrian and Tanmay for their suggestions. The number of queues may be very large (100,000 or more), so one thread per queue is unfortunately not possible, although it would be simple and easy. I will look into the fork/join framework. It looks like an interesting path to pursue.
My current first iteration solution is to have a global queue to which all tasks are added (using a JDK8 TransferQueue, which has very little locking overhead). Tasks are wrapped into a queue stub with the lock of the queue and its size. The queue itself does not exist physically, only its stub.
An idle thread first needs to obtain a token before it can access the global queue (the token would be a single element in a blocking queue, e.g. JDK8 TransferQueue). Then it does a blocking take on the global queue. When a task was obtained, it checks whether the queue lock of the task's queue stub is down. Actually, I think just using an AtomicBoolean would be sufficient and create less lock contention than a lock or synchronized block.
When the queue lock is obtained, the token is returned to the global queue and the task is executed. If it is not obtained, the task is added to a 2nd level queue and another blocking take from the global queue is done. Threads need to check whether the 2nd level queue is empty and take a task from it to be executed as well.
This solution seems to work. However, the token every thread needs to acquire before being allowed to access the global queue and the 2nd level queue looks like a bottleneck. I believe it will create high lock contention. So, I'm not so happy with this. Maybe I start with this solution and elaborate on it.
Update 2
All right, here now the "best" solution I have come up with so far. The following queues are defined:
Ready Queue (RQ): Contains all tasks that can be executed immediately by any thread in the thread pool
Entry Queue (EQ): Contains all tasks the user wants to be executed as well as internal admin tasks. The EQ is a priority queue. Admin tasks have highest priority.
Channel Queues (CQ): For every channel there is an internal channel queue that is used to preserve the ordering of the tasks, i.e. to make sure tasks are executed sequentially in the order they were added to the EQ
Scheduler: Dedicated thread that takes tasks from the EQ. If the task is a user task, it is added to the CQ of the channel the task was added to. If the head of the CQ equals the just-inserted user task, it is also added to the RQ (but remains in the CQ) so that it is executed as soon as the next thread of the thread pool becomes available.
If a user task has finished execution, an internal task TaskFinished is added to the EQ. When it is executed by the scheduler, the head is taken from the associated CQ. If the CQ is not empty after the take, the next task is peeked (but not removed) from the CQ and added to the RQ. The TaskFinished tasks have higher priority than user tasks.
In my opinion this approach contains no logical errors. Note that the EQ and RQ need to be synchronized. I prefer using TransferQueue from JDK8, which is very fast and for which checking for emptiness and polling the head item are also very fast. The CQs need not be synchronized as they are only ever accessed by the Scheduler.
So far I'm quite happy with this solution. What I wonder is whether the Scheduler could turn into a bottleneck. If many more tasks arrive in the EQ than it can handle, the EQ might grow and build up a backlog. Any opinions about that would be appreciated :-)

You can use Fork Join Framework if you are working in Java 7 or Java 8.
You can create a RecursiveTask using popped first element from each queue.
Remember to provide a reference to the queues to the corresponding RecursiveTasks.
Invoke all of them at once (in a loop or stream).
Now at the end of the compute method (after processing of a task is completed), create another RecursiveTask by popping another element from the corresponding queue and call invoke on it.
Notes:
Each task will be responsible for extracting new element from the queue, so all tasks from the queue would be executed sequentially.
There should be a new RecursiveTask created and invoked separately for each element in the queues. This ensures that some queues do not hog the threads and starvation is avoided.
Using an ExecutorService is also a viable option, but IMO ForkJoin's API is friendlier for your use case.
Hope this helps.

One simple solution would be to create a task whenever an element is added to an empty queue. This task would be responsible for only that queue and would end when the queue has been worked off. Ensure that the Queue implementations are thread-safe and the task stops after removing the last element.
EDIT: These tasks should be added to a ThreadPoolExecutor with an internal queue, for example one created by Executors.newFixedThreadPool, which will work off the tasks in parallel with a limited number of threads.
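A minimal sketch of the first suggestion (one drain task per non-empty queue, run on a shared fixed pool). SerialChannel and the demo method are hypothetical names, and the "was the queue idle?" check is made atomic with a plain synchronized block:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SerialChannelDemo {
    /** Hypothetical helper: one logical queue whose tasks never overlap. */
    static final class SerialChannel {
        private final Queue<Runnable> tasks = new ArrayDeque<>();
        private final Executor pool;
        private boolean draining; // guarded by "this"

        SerialChannel(Executor pool) {
            this.pool = pool;
        }

        void submit(Runnable task) {
            boolean schedule;
            synchronized (this) {
                tasks.add(task);
                schedule = !draining; // queue was idle: start a drain task
                draining = true;
            }
            if (schedule) {
                pool.execute(this::drain);
            }
        }

        private void drain() {
            while (true) {
                Runnable next;
                synchronized (this) {
                    next = tasks.poll();
                    if (next == null) {
                        draining = false; // stop after removing the last element
                        return;
                    }
                }
                try {
                    next.run();
                } catch (RuntimeException e) {
                    // A failing task must not leave the channel stuck.
                }
            }
        }
    }

    // Submits numbered tasks to several channels and checks per-channel order.
    static boolean inOrder(int channels, int tasksPerChannel) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<List<Integer>> results = new ArrayList<>();
        List<SerialChannel> queues = new ArrayList<>();
        for (int c = 0; c < channels; c++) {
            results.add(Collections.synchronizedList(new ArrayList<Integer>()));
            queues.add(new SerialChannel(pool));
        }
        for (int i = 0; i < tasksPerChannel; i++) {
            for (int c = 0; c < channels; c++) {
                final int value = i;
                final List<Integer> out = results.get(c);
                queues.get(c).submit(() -> out.add(value));
            }
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        for (List<Integer> out : results) {
            if (out.size() != tasksPerChannel) {
                return false;
            }
            for (int i = 0; i < tasksPerChannel; i++) {
                if (out.get(i) != i) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

A drain task in this sketch keeps its pool thread until its queue is empty. If a few busy queues should not starve the others, the task could instead run a single element and resubmit itself.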
Alternatively, just divide the queues among a fixed number of threads:
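public class QueueWorker implements Runnable {
    // NUM_THREADS and queues (assumed here to be a List<Queue<Runnable>> with at
    // least NUM_THREADS entries) are expected to be defined elsewhere.
    // should be unique and < NUM_THREADS:
    private final int threadId;
    QueueWorker(int threadId) {
        this.threadId = threadId;
    }
    @Override
    public void run() {
        int currentQueueIndex = threadId;
        while (true) {
            Queue<Runnable> currentQueue = queues.get(currentQueueIndex);
            // execute tasks until empty
            Runnable task;
            while ((task = currentQueue.poll()) != null) {
                task.run();
            }
            // move on to the next queue owned by this thread, wrapping around
            currentQueueIndex += NUM_THREADS;
            if (currentQueueIndex >= queues.size()) {
                currentQueueIndex = threadId;
            }
        }
    }
}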

Concurrency in distributed Task Queue (Producer/Consumer)

My application (Java) randomly produces tasks, which are consumed asynchronously by distributed background threads.
I don't have a distributed lock solution such as ZooKeeper at present.
I don't have any third-party message queues.
I use a database as the task queue, and the consumed results are also saved in the database, which is shared by all consumers and producers.
I have some code like this:
Consumer:
    while (true) {
        // block the thread and wait for the producer's notify
        // my producers would produce MANY work items but only notify each consumer ONCE.
        waitProducer();
        // consume the queue
        while (database.queueNotEmpty()) {
            // consume each work item and remove from database queue
            consumeAll();
        }
    }
Producer:
    for (...) {
        database.enqueue(work[i]);
    }
    // notify all consumers
    notifyAllConsumer();
Apparently the code above has concurrency bugs. I have 3 questions:
1. How can I prevent distributed consumers from consuming the same task (see the line with consumeAll()), or at least reduce the duplicated computation? Consuming one task multiple times isn't a bug in my case, just less efficient.
2. How can I avoid the situation where the queue is NOT empty but no consumer is active? With one consumer and one producer, the sequence would be:
Consumer: while(database.queueNotEmpty()) // queue is empty, break the while loop
Producer: database.enqueue(work[i]); // produce a task
Producer: notifyAllConsumer(); // notify the consumer, but it is already active
Consumer: waitProducer(); // hang the thread but still has work to do
3. Is there any best practice for this problem, especially in pure Java? Is a third-party message queue or something like ZooKeeper a must?
Fewer locks or no locks are preferred; efficiency matters more than correctness in my case.
Thanks!

I would suggest using LinkedBlockingQueue in this case.
LinkedBlockingQueue tutorial
You can use the blocking take() / put() methods, and if you want to wait with a time limit you can use the timed offer(e, timeout, unit) and poll(timeout, unit) overloads; peek() only inspects the head and never waits.
I have also used this for a similar kind of problem.
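As a quick illustration of those calls, here is a small single-threaded sketch. The class name and the capacity of 1 are chosen only for the demo, and a LinkedBlockingQueue of course only coordinates threads inside one JVM.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class TimedQueueDemo {
    // Exercises put/take and the timed offer/poll on a queue of capacity 1.
    static String run() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1);
        try {
            queue.put("a"); // would block if the queue were full
            // The queue is full, so the timed offer gives up and returns false.
            boolean accepted = queue.offer("b", 10, TimeUnit.MILLISECONDS);
            String first = queue.take(); // would block if the queue were empty
            // The queue is empty again, so the timed poll returns null.
            String second = queue.poll(10, TimeUnit.MILLISECONDS);
            return accepted + "," + first + "," + second;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```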

Consuming from many queues

I have a large number of state machines. Occasionally, a state machine will need to be moved from one state to another, which may be cheap or expensive and may involve DB reads and writes and so on.
These state changes occur because of incoming commands from clients, and can occur at any time.
I want to parallelise the workload. I want a queue saying 'move this machine from this state to this state'. Obviously the commands for any one machine need to be performed in sequence, but I can be moving many machines forward in parallel if I have many threads.
I could have a thread per state machine, but the number of state machines is data-dependent and may be many hundreds or thousands; I don't want a dedicated thread per state machine, I want a pool of some sort.
How can I have a pool of workers but ensure that the commands for each state machine are processed strictly sequentially?
UPDATE: so imagine the Machine instance has a list of outstanding commands. When an executor in the thread pool has finished consuming a command, it puts the Machine back into the thread-pool's task queue if it has more outstanding commands. So the question is, how to atomically put the Machine into the thread pool when you append the first command? And ensure this is all thread safe?

I suggest this scenario:
Create a thread pool, probably of fixed size, with Executors.newFixedThreadPool.
Create some structure (probably a HashMap) that holds one Semaphore for each state machine. Those semaphores have a single permit and are fair semaphores, to keep the sequence.
In the Runnable that does the job, call semaphore.acquire() on the semaphore of its state machine at the beginning and semaphore.release() at the end of the run method.
The size of the thread pool controls the level of parallelism.
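A rough sketch of this scheme, with some liberties: the names are invented, machines are identified by plain integers, and a ConcurrentHashMap is used instead of a HashMap so the map itself needs no extra locking. The demo only measures whether two commands for the same machine ever overlap. It does not prove ordering, since a fair semaphore orders threads by when they call acquire(), which is not necessarily the order in which the commands were submitted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class MachineSemaphoreDemo {
    // Returns the largest number of commands seen running at once for any
    // single machine, or -1 if the pool did not finish in time.
    static int maxOverlap(int machines, int commandsPerMachine) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Map<Integer, Semaphore> locks = new ConcurrentHashMap<>();
        AtomicInteger[] active = new AtomicInteger[machines];
        for (int m = 0; m < machines; m++) {
            active[m] = new AtomicInteger();
        }
        AtomicInteger worst = new AtomicInteger();

        for (int i = 0; i < commandsPerMachine; i++) {
            for (int m = 0; m < machines; m++) {
                final int id = m;
                pool.execute(() -> {
                    // One fair, single-permit semaphore per state machine.
                    Semaphore lock = locks.computeIfAbsent(id, k -> new Semaphore(1, true));
                    try {
                        lock.acquire();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    try {
                        // Stand-in for the state transition.
                        int now = active[id].incrementAndGet();
                        worst.accumulateAndGet(now, Math::max);
                        active[id].decrementAndGet();
                    } finally {
                        lock.release();
                    }
                });
            }
        }

        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                return -1;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        return worst.get();
    }
}
```

One trade-off to keep in mind: a pool thread that is waiting on a busy machine's semaphore is blocked and cannot do work for other machines in the meantime.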

I suggest another approach. Instead of using a thread pool to move states in a state machine, use a thread pool for everything, including doing the work. After doing some work that results in a state change, the state-change event should be added to the queue. After the state change is processed, another do-work event should be added to the queue.
Assuming that the state transition is work-driven, and vice versa, out-of-order processing is not possible.
The idea of storing semaphores in a special map is very dangerous. The map will have to be synchronized (adding/removing objects is not thread-safe), and there is a relatively large overhead in doing the lookups (possibly synchronizing on the map) and then using the semaphore.
Besides, if you want to use a multithreaded architecture in your application, I think that you should go all the way. Mixing different architectures may prove troublesome later on.

Have a thread ID per machine. Spawn the desired number of threads. Have all the threads greedily process messages from the global queue. Each thread locks the current message's machine for its own exclusive use (until it has finished processing the current message and all messages on its queue), and the other threads put messages for that machine on its internal queue.
EDIT: Handling message pseudo-code:
void handle(message)
    targetMachine = message.targetMachine
    if (targetMachine.thread != null)
        targetMachine.thread.addToQueue(message);
    else
        targetMachine.thread = this;
        process(message);
        processAllQueueMessages();
        targetMachine.thread = null;
Handling message Java code: (I may be overcomplicating things slightly, but this should be thread-safe. mutex and mutexInc are assumed to be single-permit java.util.concurrent.Semaphore instances, and queue is assumed to be this thread's own java.util.Queue of messages)
/* class ThreadClass */
void handle(Message message) throws InterruptedException
{
    // get targetMachine from message
    targetMachine.mutexInc.acquire(); // blocking
    targetMachine.messages++;
    boolean acquired = targetMachine.mutex.tryAcquire(); // non-blocking
    if (acquired)
        targetMachine.threadID = this.ID;
    targetMachine.mutexInc.release();
    if (!acquired)
        // can put this before release, it may speed things up
        threads[targetMachine.threadID].addToQueue(message);
    else
    {
        // note: messages is decremented below without holding mutexInc,
        // so real code needs it to be updated atomically
        process(message);
        targetMachine.messages--;
        while (true)
        {
            while (!queue.isEmpty())
            {
                process(queue.poll());
                targetMachine.messages--;
            }
            targetMachine.mutexInc.acquire(); // blocking
            if (targetMachine.messages > 0)
            {
                // another thread has announced a message that is not queued yet
                targetMachine.mutexInc.release();
                Thread.sleep(1);
            }
            else
                break; // leaves the loop still holding mutexInc
        }
        targetMachine.mutex.release();
        targetMachine.mutexInc.release();
    }
}

Java BlockingQueue with batching?

I am interested in a data structure identical to the Java BlockingQueue, with the exception that it must be able to batch objects in the queue. In other words, I would like the producer to be able to put objects into the queue, but have the consumer block on take() until the queue reaches a certain size (the batch size).
Then, once the queue has reached the batch size, the producer must block on put() until the consumer has consumed all of the elements in the queue (at which point the producer starts producing again and the consumer blocks until the batch size is reached again).
Does a similar data structure exist? Or should I write it (which I don't mind), I just don't want to waste my time if there is something out there.
UPDATE
Maybe to clarify things a bit:
The situation will always be as follows. There can be multiple producers adding items to the queue, but there will never be more than one consumer taking items from the queue.
Now, the problem is that there are multiple of these setups in parallel and serial. In other words, producers produce items for multiple queues, while consumers in their own right can also be producers. This can be more easily thought of as a directed graph of producers, consumer-producers, and finally consumers.
The reason that producers should block until the queues are empty (@Peter Lawrey) is that each of these will be running in a thread. If you leave them to simply produce as space becomes available, you will end up with a situation where you have too many threads trying to process too many things at once.
Maybe coupling this with an execution service could solve the problem?

I would suggest you use BlockingQueue.drainTo(Collection, int). You can use it with take() to ensure you get a minimum number of elements.
The advantage of using this approach is that your batch size grows dynamically with the workload and the producer doesn't have to block when the consumer is busy. i.e. it self optimises for latency and throughput.
To implement exactly what was asked (which I think is a bad idea), you can use a SynchronousQueue with a busy consuming thread,
i.e. the consuming thread does:
    list.clear();
    while (list.size() < required) list.add(queue.take());
    // process list.
The producer will block whenever the consumer is busy.
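The drainTo suggestion from the start of this answer could be sketched as follows. The helper name nextBatch and the numbers in the demo are made up. The method blocks for the first element with take() and then collects whatever else is already queued, up to a maximum batch size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class DrainBatchDemo {
    /** Waits for at least one element, then drains up to maxBatch in total. */
    static <T> List<T> nextBatch(BlockingQueue<T> queue, int maxBatch)
            throws InterruptedException {
        List<T> batch = new ArrayList<>(maxBatch);
        batch.add(queue.take());            // blocks until something arrives
        queue.drainTo(batch, maxBatch - 1); // non-blocking: takes what is there
        return batch;
    }

    // Pre-fills a queue with 7 elements and reads it in batches of up to 3.
    static List<Integer> batchSizes() {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 7; i++) {
            queue.add(i);
        }
        List<Integer> sizes = new ArrayList<>();
        try {
            while (!queue.isEmpty()) {
                sizes.add(nextBatch(queue, 3).size());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sizes;
    }
}
```

Under light load the batches are small and latency stays low; under heavy load they fill up to maxBatch, which is the self-tuning behaviour described above.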

Here is a quick (i.e. simple but not fully tested) implementation that I think may suit your requirements. You should be able to extend it to support the full queue interface if you need to.
To increase performance you can switch to a ReentrantLock instead of using the "synchronized" keyword.
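import java.util.ArrayList;
import java.util.concurrent.Semaphore;
public class BatchBlockingQueue<T> {
    private final ArrayList<T> queue;
    private final Semaphore readerLock;
    private final Semaphore writerLock;
    private final int batchSize;
    public BatchBlockingQueue(int batchSize) {
        this.queue = new ArrayList<>(batchSize);
        this.readerLock = new Semaphore(0);         // nothing to read until a batch is full
        this.writerLock = new Semaphore(batchSize); // one permit per free slot
        this.batchSize = batchSize;
    }
    public void put(T e) throws InterruptedException {
        // Wait for a free slot before taking the monitor: blocking on the
        // semaphore inside a synchronized method would lock out poll() and deadlock.
        writerLock.acquire();
        synchronized (this) {
            queue.add(e);
            if (queue.size() == batchSize) {
                readerLock.release(batchSize); // batch complete: let the consumer in
            }
        }
    }
    public T poll() throws InterruptedException {
        readerLock.acquire(); // blocks until a full batch is available
        synchronized (this) {
            T ret = queue.remove(0);
            if (queue.isEmpty()) {
                writerLock.release(batchSize); // batch consumed: let producers in again
            }
            return ret;
        }
    }
}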
Hope you find it useful.

I recently developed this utility that batches BlockingQueue elements, using a flush timeout if the queue doesn't reach the batch size in time. It also supports a fan-out pattern using multiple instances to process the same set of data:
// Instantiate the registry
FQueueRegistry registry = new FQueueRegistry();
// Build FQueue consumer
registry.buildFQueue(String.class)
    .batch()
    .withChunkSize(5)
    .withFlushTimeout(1)
    .withFlushTimeUnit(TimeUnit.SECONDS)
    .done()
    .consume(() -> (broadcaster, elms) -> System.out.println("elms batched are: "+elms.size()));
// Push data into queue
for (int i = 0; i < 10; i++) {
    registry.sendBroadcast("Sample"+i);
}
More info here!
https://github.com/fulmicotone/io.fulmicotone.fqueue

Not that I am aware of. If I understand correctly, you want either the producer to work (while the consumer is blocked) until it fills the queue, or the consumer to work (while the producer blocks) until it empties the queue. If that's the case, may I suggest that you don't need a data structure but a mechanism that blocks one party while the other one is working, in a mutex fashion. You can lock on an object for that and internally have the logic that decides, based on whether the queue is full or empty, when to release the lock and pass it to the other party. So in short, you should write it yourself :)

This sounds like how the RingBuffer works in the LMAX Disruptor pattern. See http://code.google.com/p/disruptor/ for more.
A very rough explanation is your main data structure is the RingBuffer. Producers put data in to the ring buffer in sequence and consumers can pull off as much data as the producer has put in to the buffer (so essentially batching). If the buffer is full, the producer blocks until the consumer has finished and freed up slots in the buffer.
