Using Java threads to switch from concurrency to sequential access - java

I think I have a pretty simple problem, but I can't for the life of me figure it out (partly due to my inexperience with Java).
I am trying to schedule read/write operations so that all read operations run concurrently (since the data is not modified), while write operations run sequentially.
Essentially, if I have an operations queue that looks like:
[R, R, R, W, R, R, W]
The first 3 reads will be handled concurrently,
but the first write operation will be blocked until the first 3 reads are done.
Once the first write is done, the next 2 reads are handled concurrently.
Likewise the second write operation is blocked until the 2 reads before are finished.
My problem:
I have a pool of threads that handle the Operations queue concurrently (using the take() method from LinkedBlockingQueue).
However, I don't know how to block the write operation--essentially how to wait for the threads doing read to finish.
Any help would be appreciated!

Take a look at ReentrantReadWriteLock: it provides exactly what you need. Some pseudocode:
private final ReadWriteLock lock = new ReentrantReadWriteLock();
private void read()
{
lock.readLock().lock();
try {
...
} finally {
lock.readLock().unlock();
}
}
private void write()
{
lock.writeLock().lock();
try {
...
} finally {
lock.writeLock().unlock();
}
}
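A slightly fuller sketch of the same idea, in case it helps. The class name ReadWriteDemo, the int payload, and the pool size of 4 are all made up for illustration. Note that the lock by itself does not replay your queue in exact order, because pool threads pick tasks up independently; constructing the lock in fair mode only approximates arrival order.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; names and the int payload are illustrative only.
class ReadWriteDemo {
    // Fair mode grants the lock in roughly arrival order.
    private final ReadWriteLock lock = new ReentrantReadWriteLock(true);
    private int value;

    // Any number of threads may hold the read lock at the same time.
    int read() {
        lock.readLock().lock();
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    // The write lock is exclusive: it waits for active readers to leave.
    void increment() {
        lock.writeLock().lock();
        try {
            value++;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Submits a mix of reads and writes to a small pool and returns the final value.
    static int runMixed(int writes, int readsPerWrite) {
        ReadWriteDemo demo = new ReadWriteDemo();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int w = 0; w < writes; w++) {
            for (int r = 0; r < readsPerWrite; r++) {
                pool.execute(demo::read);
            }
            pool.execute(demo::increment);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return demo.read();
    }

    public static void main(String[] args) {
        System.out.println(runMixed(5, 3)); // prints 5
    }
}
```

Every write runs exclusively, so no increment is lost even though the reads overlap freely.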

This sounds like a good case for Java's CountDownLatch. It is specifically designed for blocking until multiple threads have completed their work.
You might use it with code like the following:
final CountDownLatch allWrittenSignal = new CountDownLatch(3);
List<Runnable> writeTasks = new ArrayList<>(3);
for (int i = 0; i < 3; ++i) {
Runnable task = new Runnable() {
@Override
public void run() {
// do write work
allWrittenSignal.countDown();
}
};
writeTasks.add(task);
}
ExecutorService execSvc = Executors.newFixedThreadPool(3);
for (Runnable task : writeTasks) {
execSvc.execute(task);
}
allWrittenSignal.await();
// now do read work

Related

Is this synchronized block needed?

Is the synchronized block around System.out.println(number); needed in the following code?
import java.util.concurrent.CountDownLatch;
public class Main {
private static final Object LOCK = new Object();
private static long number = 0L;
public static void main(String[] args) throws InterruptedException {
CountDownLatch doneSignal = new CountDownLatch(10);
for (int i = 0; i < 10; i++) {
Worker worker = new Worker(doneSignal);
worker.start();
}
doneSignal.await();
synchronized (LOCK) { // Is this synchronized block needed?
System.out.println(number);
}
}
private static class Worker extends Thread {
private final CountDownLatch doneSignal;
private Worker(CountDownLatch doneSignal) {
this.doneSignal = doneSignal;
}
@Override
public void run() {
synchronized (LOCK) {
number += 1;
}
doneSignal.countDown();
}
}
}
I think it is needed because the main thread might otherwise read a cached value.
But some people say:
It's unnecessary.
Because by the time the main thread reads the variable number, all of the worker threads have completed their writes to number in memory.
doneSignal.await() is a blocking call, so your main() will only proceed when all your Worker threads have called doneSignal.countDown(), making it reach 0, which is what makes the await() method return.
There is no point in adding that synchronized block before the System.out.println(); all your threads are already done at that point.
Consider using an AtomicInteger for number instead of synchronizing against a lock to call += 1.
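A minimal sketch of that suggestion, assuming the counter is only ever incremented. It uses AtomicLong rather than AtomicInteger because number is a long in the question; the class and method names are invented for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical rewrite of the question's Main using an atomic counter.
class AtomicCounterDemo {
    static long countWithWorkers(int workers) {
        AtomicLong number = new AtomicLong();
        CountDownLatch doneSignal = new CountDownLatch(workers);
        for (int i = 0; i < workers; i++) {
            new Thread(() -> {
                number.incrementAndGet(); // atomic read-modify-write, no lock needed
                doneSignal.countDown();
            }).start();
        }
        try {
            doneSignal.await(); // returns once every worker has counted down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return number.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithWorkers(10)); // prints 10
    }
}
```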
It is not necessary:
CountDownLatch doneSignal = new CountDownLatch(10);
for (int i = 0; i < 10; i++) {
Worker worker = new Worker(doneSignal);
worker.start();
}
doneSignal.await();
// here the only thread running is the main thread
Just before finishing, each thread counts down the CountDownLatch:
@Override
public void run() {
synchronized (LOCK) {
number += 1;
}
doneSignal.countDown();
}
Only when all 10 threads have finished their job will execution pass the doneSignal.await(); line.
It is not necessary because you are waiting for the "done" signal. That synchronizes memory so that all values written by the awaited threads become visible to the main thread.
However, you can test this easily: inside the run method, perform a computation that takes several million steps and cannot be optimized away by the compiler. If you see a value different from the final value you expect, then the final value was not yet visible to the main thread. The critical part is making sure the computation does not get optimized away; a simple increment is likely to be. This technique is generally useful for testing concurrent code when you are not sure you have the correct memory barriers, so it may be useful to you later.
synchronized is not needed around System.out.println(number);, but not because the PrintStream.println() implementations are internally synchronized or because by the time doneSignal.await() unblocks all the worker threads have finished.
synchronized is not needed because there's a happens-before edge between everything before each call to doneSignal.countDown and the completion of doneSignal.await(). This guarantees that you'll successfully see the correct value of number.
Needed
No.
However, as there is no (documented) guarantee that output will not be interleaved, it is possible to find log entries interleaved.
System.out.println("ABC");
System.out.println("123");
could print:
AB1
23C
Worthwhile
Almost certainly not. Most JVMs will implement println with a lock; OpenJDK does.
Edge case
As suggested by @DimitarDimitrov, there is one further possible use for that lock: ensuring that a memory barrier is crossed before accessing number. If that is the concern, then you do not need a lock; all you need to do is make number volatile.
private static volatile long number = 0L;

Starting thread in a servlet, what can be the issues?

I have a web application that, on a single request, may need to load hundreds of data items. The problem is that the data is scattered, so I have to load it from several places, apply filters to it, process it, and then respond. Performing all these operations sequentially makes the servlet slow!
So I have thought of loading all the data in separate threads, like t[i] = new Thread(loadData).start();, waiting for all threads to finish using while(i < count) t[i].join();, and, when done, joining the data and responding.
Now I am not sure whether this approach is right or whether there is a better method. I have read somewhere that spawning threads in servlets is not advisable.
My desired code will look something like this.
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException
{
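Iterable<?> requireddata = requiredData(request);
List<Thread> threads = new ArrayList<Thread>();
for (Object item : requireddata)
{
Thread t = new Thread(new loadData(item)); // loadData is assumed to implement Runnable
t.start(); // start() returns void, so keep the Thread reference itself
threads.add(t);
}
try
{
for (Thread t : threads)
t.join();
}
catch (InterruptedException e)
{
Thread.currentThread().interrupt();
throw new ServletException(e);
}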
// after getting the data process and respond!
}
The main problem is that you'll bring the server to its knees if many concurrent requests come in for your servlet, because you don't limit the number of threads that can be spawned. Another problem is that you keep creating new threads instead of reusing them, which is inefficient.
These two problems are easily solved by using a thread pool, and Java has native support for them. Read the tutorial.
Also, make sure to shut down the thread pool when the webapp is shut down, using a ServletContextListener.
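As a rough illustration of the thread-pool approach, here is a sketch using only java.util.concurrent. The class BoundedLoader, the load method, and the pool size of 8 are assumptions made for the example; in a real webapp the shutdown() call would be made from your ServletContextListener.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: one bounded pool shared by all requests.
class BoundedLoader {
    // The size 8 is an arbitrary assumption; tune it for your server.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(8);

    // Stand-in for the real data-loading step.
    static String load(String source) {
        return "data from " + source;
    }

    // Loads every source in parallel and returns the results in input order.
    static List<String> loadAll(List<String> sources) {
        List<Callable<String>> jobs = new ArrayList<>();
        for (String s : sources) {
            jobs.add(() -> load(s));
        }
        List<String> results = new ArrayList<>();
        try {
            // invokeAll blocks until every job has completed.
            for (Future<String> f : POOL.invokeAll(jobs)) {
                results.add(f.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return results;
    }

    // Call this when the web application stops.
    static void shutdown() {
        POOL.shutdown();
    }

    public static void main(String[] args) {
        System.out.println(loadAll(Arrays.asList("db", "cache")));
        shutdown();
    }
}
```

Because the pool is bounded, a burst of requests queues work instead of spawning an unbounded number of threads.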
Sounds like a problem for the CyclicBarrier.
For example:
ExecutorService executor = Executors.newFixedThreadPool(requireddata.size);
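public void executeAllAndAwaitCompletion(List<? extends Runnable> tasks)
throws InterruptedException, BrokenBarrierException {
// one party per task, plus one for the calling thread
final CyclicBarrier barrier = new CyclicBarrier(tasks.size() + 1);
for (final Runnable task : tasks) {
executor.submit(new Runnable() {
public void run() {
// it is not a mistake to call run() here
task.run();
try {
barrier.await();
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
} catch (BrokenBarrierException e) {
// the barrier was broken by another party; nothing left to wait for
}
}
});
}
barrier.await();
}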
The calling thread proceeds past the final barrier.await() only once all the others have finished.
Instead of calling Executors.newFixedThreadPool(requireddata.size);, it is better to reuse some existing thread pool.
You may consider using the Executor framework from the java.util.concurrent API. For example, you can create your computation task as a Callable and then submit that task to a ThreadPoolExecutor. Sample code from Java Concurrency in Practice:
public class Renderer {
private final ExecutorService executor;
Renderer(ExecutorService executor) { this.executor = executor; }
void renderPage(CharSequence source) {
final List<ImageInfo> info = scanForImageInfo(source);
CompletionService<ImageData> completionService =
new ExecutorCompletionService<ImageData>(executor);
for (final ImageInfo imageInfo : info)
completionService.submit(new Callable<ImageData>() {
public ImageData call() {
return imageInfo.downloadImage();
}
});
renderText(source);
try {
for (int t = 0, n = info.size(); t < n; t++) {
Future<ImageData> f = completionService.take();
ImageData imageData = f.get();
renderImage(imageData);
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
} catch (ExecutionException e) {
throw launderThrowable(e.getCause());
}
}
}
Since you wait for all the threads to complete before sending the response, IMO multiple threads won't help if the work only uses CPU cycles. It will only increase the response time by adding context-switch overhead between the threads. A single thread will be better. However, if network or other I/O is involved, you can make use of a thread pool.
But you may want to reconsider your approach. Processing a huge amount of data synchronously in an HTTP request is not advisable; it will not be a good experience for the end user. What you can do is start a thread to process the data and respond with "It is processing". You can then give the user some way to check the status whenever they want.

How to wait for all threads to complete

I created a workflow for waiting for all the threads I have created. It works in 99% of cases, but sometimes waitForAllDone finishes sooner than all the threads have completed. I know this because after waitForAllDone I close a stream that the created threads use, and then this exception occurs:
Caused by: java.io.IOException: Stream closed
My thread starts with:
@Override
public void run() {
try {
process();
} finally {
Factory.close(this);
}
}
closing:
protected static void close(final Client client) {
clientCount--;
}
When I create a thread I call this:
public RobWSClient getClient() {
clientCount++;
return new Client();
}
and clientCount variable inside factory:
private static volatile int clientCount = 0;
wait:
public void waitForAllDone() {
try {
while (clientCount > 0) {
Thread.sleep(10);
}
} catch (InterruptedException e) {
LOG.error("Error", e);
}
}
You need to protect the modification and reading of clientCount via synchronized. The main issue is that clientCount-- and clientCount++ are NOT atomic operations, and therefore two threads could execute clientCount-- / clientCount++ at the same time and end up with the wrong result.
Simply using volatile as you do above would ONLY work if ALL operations on the field were atomic. Since they are not, you need to use some locking mechanism. As Anton states, AtomicInteger is an excellent choice here. Note that the field holding it should be either final or volatile so that every thread is guaranteed to see the same instance.
That being said, the general rule post Java 1.5 is to use an ExecutorService instead of Threads. Using this in conjunction with Guava's Futures class could make waiting for all tasks to complete as simple as:
Future<List<?>> future = Futures.successfulAsList(myFutureList);
future.get();
// all processes are complete
I'm not sure that the rest of your code has no issues, but you can't increment a volatile variable like this: clientCount++;. Use AtomicInteger instead.
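A minimal sketch of the AtomicInteger version, assuming the counter only needs increment, decrement, and read. The class ClientCounter and the hammer method are hypothetical names used to exercise it from several threads.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical replacement for the static volatile int clientCount.
class ClientCounter {
    private final AtomicInteger clientCount = new AtomicInteger();

    void clientCreated() { clientCount.incrementAndGet(); }   // replaces clientCount++
    void clientClosed()  { clientCount.decrementAndGet(); }   // replaces clientCount--
    int active()         { return clientCount.get(); }

    // Stress helper: each thread opens and closes a client many times.
    static int hammer(int threadCount, int iterations) {
        ClientCounter counter = new ClientCounter();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < iterations; j++) {
                    counter.clientCreated();
                    counter.clientClosed();
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.active();
    }

    public static void main(String[] args) {
        System.out.println(hammer(8, 10000)); // prints 0
    }
}
```

With a plain volatile int the same test can end on a non-zero value, because concurrent ++ and -- may overwrite each other's updates.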
The best way to wait for threads to terminate is to use one of the high-level concurrency facilities.
In this case, the easiest way would be to use an ExecutorService.
You would 'offer' a new task to the executor in this way:
...
ExecutorService executor = Executors.newFixedThreadPool(POOL_SIZE);
...
Client client = getClient(); //assuming Client implements runnable
executor.submit(client);
...
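public void waitForAllDone() throws InterruptedException {
executor.shutdown(); // stop accepting new tasks; already submitted tasks still run
executor.awaitTermination(30, TimeUnit.SECONDS); // wait up to 30 seconds for all tasks to finish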
...
}
In this way, you don't waste valuable CPU cycles in busy waits or sleep/awake cycles.
See ExecutorService docs for details.

AtomicReference to a mutable object and visibility

Say I have an AtomicReference to a list of objects:
AtomicReference<List<Object>> batch = new AtomicReference<List<Object>>(new ArrayList<Object>());
Thread A adds elements to this list: batch.get().add(o);
Later, thread B takes the list and, for example, stores it in a DB: insertBatch(batch.get());
Do I have to do additional synchronization when writing (Thread A) and reading (Thread B) to ensure thread B sees the list the way A left it, or is this taken care of by the AtomicReference?
In other words: if I have an AtomicReference to a mutable object, and one thread changes that object, do other threads see this change immediately?
Edit:
Maybe some example code is in order:
public void process(Reader in) throws IOException {
List<Future<AtomicReference<List<Object>>>> tasks = new ArrayList<Future<AtomicReference<List<Object>>>>();
ExecutorService exec = Executors.newFixedThreadPool(4);
for (int i = 0; i < 4; ++i) {
tasks.add(exec.submit(new Callable<AtomicReference<List<Object>>>() {
@Override public AtomicReference<List<Object>> call() throws IOException {
final AtomicReference<List<Object>> batch = new AtomicReference<List<Object>>(new ArrayList<Object>(batchSize));
Processor.this.parser.parse(in, new Parser.Handler() {
@Override public void onNewObject(Object event) {
batch.get().add(event);
if (batch.get().size() >= batchSize) {
dao.insertBatch(batch.getAndSet(new ArrayList<Object>(batchSize)));
}
}
});
return batch;
}
}));
}
List<Object> remainingBatches = new ArrayList<Object>();
for (Future<AtomicReference<List<Object>>> task : tasks) {
try {
AtomicReference<List<Object>> remainingBatch = task.get();
remainingBatches.addAll(remainingBatch.get());
} catch (ExecutionException e) {
Throwable cause = e.getCause();
if (cause instanceof IOException) {
throw (IOException)cause;
}
throw (RuntimeException)cause;
}
}
// these haven't been flushed yet by the worker threads
if (!remainingBatches.isEmpty()) {
dao.insertBatch(remainingBatches);
}
}
What happens here is that I create four worker threads to parse some text (this is the Reader in parameter to the process() method). Each worker saves the lines it has parsed in a batch, and flushes the batch when it is full (dao.insertBatch(batch.getAndSet(new ArrayList<Object>(batchSize)));).
Since the number of lines in the text isn't a multiple of the batch size, the last objects end up in a batch that isn't flushed, since it's not full. These remaining batches are therefore inserted by the main thread.
I use AtomicReference.getAndSet() to replace the full batch with an empty one. Is this program correct with regard to threading?
Um... it doesn't really work like this. AtomicReference guarantees that the reference itself is visible across threads i.e. if you assign it a different reference than the original one the update will be visible. It makes no guarantees about the actual contents of the object that reference is pointing to.
Therefore, read/write operations on the list contents require separate synchronization.
Edit: So, judging from your updated code and the comment you posted, setting the local reference to volatile is sufficient to ensure visibility.
I think that, forgetting all the code here, your exact question is this:
Do I have to do additional synchronization when writing (Thread A) and
reading (Thread B) to ensure thread B sees the list the way A left it,
or is this taken care of by the AtomicReference?
So, the exact response to that is: YES, atomics take care of visibility. And that is not my opinion but what the JDK documentation says:
The memory effects for accesses and updates of atomics generally follow the rules for volatiles, as stated in The Java Language Specification, Third Edition (17.4 Memory Model).
I hope this helps.
Adding to Tudor's answer: You will have to make the ArrayList itself threadsafe or - depending on your requirements - even larger code blocks.
If you can get away with a threadsafe ArrayList you can "decorate" it like this:
batch = java.util.Collections.synchronizedList(new ArrayList<Object>());
But keep in mind that even "simple" compound operations like the following are still not threadsafe with this wrapper:
Object o = batch.get(batch.size()-1);
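The reason is that size() and get() are two separate calls, so another thread can remove an element in between. One way around it, sketched below, is to hold the list's own monitor across both calls, the same way the Collections.synchronizedList documentation asks you to do when iterating. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper showing a compound action on a synchronized list.
class LastElement {
    // Returns the last element, or null if the list is empty.
    // The argument is assumed to come from Collections.synchronizedList.
    static Object last(List<Object> syncList) {
        // Hold the wrapper's lock across size() and get() so they act as one step.
        synchronized (syncList) {
            return syncList.isEmpty() ? null : syncList.get(syncList.size() - 1);
        }
    }

    public static void main(String[] args) {
        List<Object> batch = Collections.synchronizedList(new ArrayList<Object>());
        batch.add("first");
        batch.add("second");
        System.out.println(last(batch)); // prints second
    }
}
```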
The AtomicReference will only help you with the reference to the list, it will not do anything to the list itself. More particularly, in your scenario, you will almost certainly run into problems when the system is under load where the consumer has taken the list while the producer is adding an item to it.
This sounds to me like you should be using a BlockingQueue. You can then limit the memory footprint if your producer is faster than your consumer, and let the queue handle all contention.
Something like:
ArrayBlockingQueue<Object> queue = new ArrayBlockingQueue<Object> (50);
// ... Producer
queue.put(o);
// ... Consumer
List<Object> queueContents = new ArrayList<Object> ();
// Grab everything waiting in the queue in one chunk. Should never be more than 50 items.
queue.drainTo(queueContents);
Added
Thanks to @Tudor for pointing out the architecture you are using. ... I have to admit it is rather strange. You don't really need AtomicReference at all as far as I can see. Each thread owns its own ArrayList until it is passed on to dao, at which point it is replaced, so there is no contention at all anywhere.
I am a little concerned about you creating four parsers on a single Reader. I hope you have some way of ensuring each parser does not affect the others.
I personally would use some form of producer-consumer pattern as I have described in the code above. Something like this perhaps.
static final int PROCESSES = 4;
static final int batchSize = 10;
public void process(Reader in) throws IOException, InterruptedException {
final List<Future<Void>> tasks = new ArrayList<Future<Void>>();
ExecutorService exec = Executors.newFixedThreadPool(PROCESSES);
// Queue of objects.
final ArrayBlockingQueue<Object> queue = new ArrayBlockingQueue<Object> (batchSize * 2);
// The final object to post.
final Object FINISHED = new Object();
// Start the producers.
for (int i = 0; i < PROCESSES; i++) {
tasks.add(exec.submit(new Callable<Void>() {
@Override
public Void call() throws IOException {
Processor.this.parser.parse(in, new Parser.Handler() {
@Override
public void onNewObject(Object event) {
queue.add(event);
}
});
// Post a finished down the queue.
queue.add(FINISHED);
return null;
}
}));
}
// Start the consumer.
tasks.add(exec.submit(new Callable<Void>() {
@Override
public Void call() throws IOException, InterruptedException {
List<Object> batch = new ArrayList<Object>(batchSize);
int finishedCount = 0;
// Until all threads finished.
while ( finishedCount < PROCESSES ) {
Object o = queue.take();
if ( o != FINISHED ) {
// Batch them up.
batch.add(o);
if ( batch.size() >= batchSize ) {
dao.insertBatch(batch);
// If insertBatch takes a copy we could merely clear it.
batch = new ArrayList<Object>(batchSize);
}
} else {
// Count the finishes.
finishedCount += 1;
}
}
// Finished! Post any incomplete batch.
if ( batch.size() > 0 ) {
dao.insertBatch(batch);
}
return null;
}
}));
// Wait for everything to finish.
exec.shutdown();
// Wait until all is done.
boolean finished = false;
do {
try {
// Wait up to 1 second for termination.
finished = exec.awaitTermination(1, TimeUnit.SECONDS);
} catch (InterruptedException ex) {
}
} while (!finished);
}

Java read & write lock requirement, with lock and release from different threads

I'm trying to find a less clunky solution to a Java concurrency problem.
The gist of the problem is that I need a shutdown call to block while there are still worker threads active, but the crucial aspect is that the worker tasks are each spawned and completed asynchronously so the hold and release must be done by different threads. I need them to somehow send a signal to the shutdown thread once their work has completed. Just to make things more interesting, the worker threads cannot block each other so I'm unsure about the application of a Semaphore in this particular instance.
I have a solution which I think safely does the job, but my unfamiliarity with the Java concurrency utils leads me to think that there might be a much easier or more elegant pattern. Any help in this regard would be greatly appreciated.
Here's what I have so far, fairly sparse except for the comments:
final private ReentrantReadWriteLock shutdownLock = new ReentrantReadWriteLock();
volatile private int activeWorkerThreads;
private boolean isShutdown;
private void workerTask()
{
try
{
// Point A: Worker tasks mustn't block each other.
shutdownLock.readLock().lock();
// Point B: I only want worker tasks to continue if the shutdown signal
// hasn't already been received.
if (isShutdown)
return;
activeWorkerThreads ++;
// Point C: This async method call returns immediately, soon after which
// we release our lock. The shutdown thread may then acquire the write lock
// but we want it to continue blocking until all of the asynchronous tasks
// have completed.
executeAsynchronously(new Runnable()
{
@Override
final public void run()
{
try
{
// Do stuff.
}
finally
{
// Point D: Release of shutdown thread loop, if there are no other
// active worker tasks.
activeWorkerThreads --;
}
}
});
}
finally
{
shutdownLock.readLock().unlock();
}
}
final public void shutdown()
{
try
{
// Point E: Shutdown thread must block while any worker threads
// have breached Point A.
shutdownLock.writeLock().lock();
isShutdown = true;
// Point F: Is there a better way to wait for this signal?
while (activeWorkerThreads > 0)
;
// Do shutdown operation.
}
finally
{
shutdownLock.writeLock().unlock();
}
}
Thanks in advance for any help!
Russ
Declaring activeWorkerThreads as volatile doesn't make activeWorkerThreads++ safe, as ++ is just shorthand for
activeWorkerThreads = activeWorkerThreads + 1;
which isn't atomic. Use AtomicInteger instead.
Does executeAsynchronously() send jobs to an ExecutorService? If so, you can just use the awaitTermination method, so your shutdown hook will be:
executor.shutdown();
executor.awaitTermination(1, TimeUnit.MINUTES);
You can use a semaphore in this scenario and not require a busy wait for the shutdown() call. The way to think of it is as a set of tickets that are handed out to workers to indicate that they are in flight. If the shutdown() method can acquire all of the tickets, then it knows that it has drained all workers and there is no activity. Because #acquire() is a blocking call, shutdown() won't spin. I've used this approach for a distributed master-worker library, and it's easy to extend it to handle timeouts and retries.
Executor executor = // ...
final int permits = // ...
final Semaphore semaphore = new Semaphore(permits);
void schedule(final Runnable task) throws InterruptedException {
semaphore.acquire();
try {
executor.execute(new Runnable() {
@Override public void run() {
try {
task.run();
} finally {
semaphore.release();
}
}
});
} catch (RejectedExecutionException e) {
semaphore.release();
throw e;
}
}
void shutDown() {
semaphore.acquireUninterruptibly(permits);
// do stuff
}
ExecutorService should be the preferred solution, as sbridges mentioned.
As an alternative, if the number of worker threads is fixed, then you can use CountDownLatch:
final CountDownLatch latch = new CountDownLatch(numberOfWorkers);
Pass the latch to every worker thread and call latch.countDown() when task is done.
Call latch.await() from the main thread to wait for all tasks to complete.
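Put together, that might look like the sketch below. The pool size, the five-second timeout, and the names LatchShutdown and runAndAwait are assumptions for illustration. Counting down in a finally block means a failing task still releases the waiter.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo of waiting for a fixed number of workers.
class LatchShutdown {
    // Returns true if all workers finished before the timeout.
    static boolean runAndAwait(int numberOfWorkers) {
        final CountDownLatch latch = new CountDownLatch(numberOfWorkers);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < numberOfWorkers; i++) {
            pool.execute(() -> {
                try {
                    Thread.yield(); // placeholder for the real work
                } finally {
                    latch.countDown(); // always signal, even if the work throws
                }
            });
        }
        try {
            // Blocks without spinning until the count reaches zero or time runs out.
            return latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAndAwait(10));
    }
}
```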
Whoa nelly. Never do this:
// Point F: Is there a better way to wait for this signal?
while (activeWorkerThreads > 0)
;
You're spinning and consuming CPU. Use a proper notification:
First: synchronize on an object, then check activeWorkerThreads, and wait() on the object if it's still > 0:
synchronized (mutexObject) {
while (activeWorkerThreads > 0) {
mutexObject.wait();
}
}
Second: Have the workers notify() the object after they decrement the activeWorkerThreads count. You must synchronize on the object before calling notify.
synchronized (mutexObject) {
activeWorkerThreads--;
mutexObject.notify();
}
Third: Seeing as you are (after implementing 1 & 2) synchronizing on an object whenever you touch activeWorkerThreads, use it as protection; there is no need for the variable to be volatile.
Then: the same object you use as a mutex for controlling access to activeWorkerThreads could also be used to control access to isShutdown. Example:
synchronized (mutexObject) {
if (isShutdown) {
return;
}
}
This won't cause workers to block each other except for immeasurably small amounts of time (which you likely do not avoid by using a read-write lock anyway).
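The three steps above can be combined into one small class, sketched here under a few assumptions: the names WorkerTracker, tryStart, finish, and shutdownAndWait are invented, and notifyAll() is used in place of notify() so that more than one waiting thread would also be woken.

```java
// Hypothetical tracker guarding both fields with a single mutex.
class WorkerTracker {
    private final Object mutex = new Object();
    private int activeWorkers;   // guarded by mutex, so no volatile needed
    private boolean isShutdown;  // guarded by mutex

    // Registers a worker; returns false if shutdown has already begun.
    boolean tryStart() {
        synchronized (mutex) {
            if (isShutdown) {
                return false;
            }
            activeWorkers++;
            return true;
        }
    }

    // Called by a worker when its asynchronous task completes.
    void finish() {
        synchronized (mutex) {
            activeWorkers--;
            mutex.notifyAll();
        }
    }

    // Blocks (without spinning) until every registered worker has finished.
    void shutdownAndWait() throws InterruptedException {
        synchronized (mutex) {
            isShutdown = true;
            while (activeWorkers > 0) {
                mutex.wait(); // releases the mutex while waiting
            }
        }
    }

    // Demo: start some workers, shut down, and check that new work is refused.
    static boolean demo(int workers) {
        WorkerTracker tracker = new WorkerTracker();
        for (int i = 0; i < workers; i++) {
            if (tracker.tryStart()) {
                new Thread(tracker::finish).start();
            }
        }
        try {
            tracker.shutdownAndWait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !tracker.tryStart();
    }

    public static void main(String[] args) {
        System.out.println(demo(5)); // prints true
    }
}
```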
This is more like a comment on sbridges' answer, but it was a bit too long to submit as a comment.
Anyway, just one comment.
When you shut down the executor, submitting a new task to it will result in an unchecked RejectedExecutionException if you use the default implementations (like Executors.newSingleThreadExecutor()). So in your case you probably want to use the following code:
new ThreadPoolExecutor(1,
1,
1,
TimeUnit.HOURS,
new LinkedBlockingQueue<Runnable>(),
new ThreadPoolExecutor.DiscardPolicy());
This way, tasks that are submitted to the executor after shutdown() was called are simply ignored. The parameters above (1, 1, etc.) should produce an executor that is basically a single-thread executor but doesn't throw the runtime exception.
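A small sketch of that behaviour, with a made-up class name and a counter added only so the effect can be observed. One caveat worth keeping in mind: if you use submit() instead of execute(), the Future returned for a discarded task never completes, so don't block on it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: tasks handed over after shutdown() are dropped silently.
class DiscardAfterShutdown {
    // Returns how many of the two tasks actually ran.
    static int countRuns() {
        AtomicInteger runs = new AtomicInteger();
        ExecutorService executor = new ThreadPoolExecutor(
                1, 1, 1, TimeUnit.HOURS,
                new LinkedBlockingQueue<Runnable>(),
                new ThreadPoolExecutor.DiscardPolicy());
        executor.execute(runs::incrementAndGet); // accepted before shutdown
        executor.shutdown();
        executor.execute(runs::incrementAndGet); // discarded, no exception thrown
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println(countRuns()); // prints 1
    }
}
```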
