Java Multithreading: threads should share access to a list

I'm running five threads, and I have a list of objects (which I initialize independently of the threads).
The objects in the list use a boolean as a flag, so I know whether they have already been handled by another thread. Each of my threads also has an Integer "ID" (so I know which thread is currently working).
The problem: The first thread that gets hold of the for-loop handles all objects in the list, but I want the threads to alternate. What am I doing wrong?
The run() method looks similar to this:
public void run() {
for (int i = 0; i < list.size(); i++) {
ListObject currentObject = list.get(i);
synchronized (currentObject) {
if (!currentObject.getHandled()) {
currentObject.setHandled(true);
System.out.println("Object is handled by " + this.getID());
} else {
continue;
}
}
}
}

TL;DR: Explicitly or implicitly divide the list among the threads, and add synchronization only if it is really needed.
The problem: The first thread that gets a hand on the for-loop will
handle all objects in the list, but I want the threads to alternate.
What am I doing wrong?
That is to be expected, because this entire block of code
for (int i = 0; i < list.size(); i++) {
ListObject currentObject = list.get(i);
synchronized (currentObject) {
....
}
}
is effectively executed sequentially, because each thread synchronizes on the implicit lock of currentObject in every iteration. All five threads enter the run method, but one of them enters synchronized (currentObject) first, and all the others wait in turn for that thread to release the implicit lock of currentObject. When that thread is finished, it moves on to the next iteration while the remaining threads are still in the previous one. Hence, the first thread to enter synchronized (currentObject) gets a head start, stays a few steps ahead of the other threads, and will likely process all the remaining iterations. Consequently:
The first thread that gets a hand on the for-loop will handle all
objects in the list,
As it stands, you would be better off, both performance-wise and readability-wise, executing the code sequentially.
Assumptions
I am assuming that:
1. the objects stored in the list are not being accessed elsewhere while those threads are iterating through the list;
2. the list does not contain multiple references to the same object.
I would suggest that, instead of every thread iterating over the entire list and synchronizing in every iteration (which performs very poorly and actually defeats the point of parallelism), every thread should process a different chunk of the list (e.g., by dividing the iterations of the for loop among the threads). For instance:
Approach 1: Using Parallel Stream
If you don't have to parallelize your code explicitly, consider using a parallel stream:
list.parallelStream().forEach(this::setHandled);
private void setHandled(ListObject currentObject) {
if (!currentObject.getHandled()) {
currentObject.setHandled(true);
// The work runs on ForkJoinPool threads, not on this object, so print the executing thread
System.out.println("Object is handled by " + Thread.currentThread().getName());
}
}
Approach 2: If you have to parallelize the code explicitly, use executors
Since you are running five threads, you can use a fixed thread pool of five threads (as first illustrated by ernest_k):
ExecutorService ex = Executors.newFixedThreadPool(5);
for (ListObject l : list)
ex.submit(() -> setHandled(l));
ex.shutdown(); // accept no new tasks; the pool threads exit once the submitted tasks finish
...
private void setHandled(ListObject currentObject) {
if (!currentObject.getHandled()) {
currentObject.setHandled(true);
System.out.println("Object is handled by " + Thread.currentThread().getName());
}
}
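For reference, a self-contained sketch of Approach 2 might look like the following. It is an illustration under assumptions, not the asker's code: ListObject is a hypothetical stand-in with a plain boolean flag, and ExecutorHandling and processAll are made-up names. It keeps the returned Futures and calls get() on each one, which both waits for the tasks and makes their writes visible to the calling thread, and it shuts the pool down afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorHandling {

    // Hypothetical stand-in for the question's ListObject.
    static class ListObject {
        private boolean handled;

        boolean getHandled() { return handled; }

        void setHandled(boolean handled) { this.handled = handled; }
    }

    // No locking here: each object is submitted exactly once (assumptions 1 and 2).
    static void setHandled(ListObject currentObject) {
        if (!currentObject.getHandled()) {
            currentObject.setHandled(true);
            System.out.println("Object is handled by " + Thread.currentThread().getName());
        }
    }

    static List<ListObject> processAll(int size) throws Exception {
        List<ListObject> list = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            list.add(new ListObject());
        }
        ExecutorService ex = Executors.newFixedThreadPool(5);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (ListObject l : list) {
                futures.add(ex.submit(() -> setHandled(l)));
            }
            // get() waits for each task; the task's actions happen-before get() returns.
            for (Future<?> f : futures) {
                f.get();
            }
        } finally {
            ex.shutdown(); // lets the pool threads terminate so the JVM can exit
        }
        return list;
    }

    public static void main(String[] args) throws Exception {
        List<ListObject> list = processAll(20);
        System.out.println(list.stream().allMatch(ListObject::getHandled)); // prints true
    }
}
```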
Approach 3: If you have to use Threads explicitly
public void run() {
for (int i = threadID; i < list.size(); i += total_threads) {
ListObject currentObject = list.get(i);
if (!currentObject.getHandled()) {
currentObject.setHandled(true);
System.out.println("Object is handled by " + this.getID());
}
}
}
In this approach, I am splitting the iterations of the for loop among the threads in a round-robin fashion, assuming that total_threads is the number of threads that will execute the run method, and that each thread has a unique threadID ranging from 0 to total_threads - 1. Other ways of distributing the iterations among threads would also be feasible, for instance distributing them dynamically:
public void run() {
for (int i = task.getAndIncrement(); i < list.size(); i = task.getAndIncrement()) {
ListObject currentObject = list.get(i);
if (!currentObject.getHandled()) {
currentObject.setHandled(true);
System.out.println("Object is handled by " + this.getID());
}
}
}
where task is an atomic integer shared by all threads (i.e., AtomicInteger task = new AtomicInteger();).
In all approaches the idea is the same: assign different chunks of the list to the threads so that those threads can process their chunks independently of each other.
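Both distribution strategies of Approach 3 can be tried out with the following sketch. Instead of printing, each worker records its own id in a hypothetical owner array so the resulting partition can be inspected; Partitioning, roundRobin, and dynamic are illustrative names only. Thread.join() is used so that the main thread sees the workers' writes.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class Partitioning {

    // Static round-robin: worker threadId takes indices threadId, threadId + totalThreads, ...
    static int[] roundRobin(int size, int totalThreads) throws InterruptedException {
        int[] owner = new int[size];
        Arrays.fill(owner, -1); // -1 = not processed
        Thread[] workers = new Thread[totalThreads];
        for (int t = 0; t < totalThreads; t++) {
            final int threadId = t;
            workers[t] = new Thread(() -> {
                for (int i = threadId; i < size; i += totalThreads) {
                    owner[i] = threadId; // no other worker ever writes index i
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join(); // join() makes the worker's writes visible to this thread
        }
        return owner;
    }

    // Dynamic: each worker claims the next unprocessed index from a shared AtomicInteger.
    static int[] dynamic(int size, int totalThreads) throws InterruptedException {
        int[] owner = new int[size];
        Arrays.fill(owner, -1);
        AtomicInteger task = new AtomicInteger();
        Thread[] workers = new Thread[totalThreads];
        for (int t = 0; t < totalThreads; t++) {
            final int threadId = t;
            workers[t] = new Thread(() -> {
                for (int i = task.getAndIncrement(); i < size; i = task.getAndIncrement()) {
                    owner[i] = threadId; // getAndIncrement hands out each index exactly once
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return owner;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(Arrays.toString(roundRobin(10, 3))); // prints [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
        System.out.println(Arrays.toString(dynamic(10, 3)));    // assignment depends on timing
    }
}
```

With the round-robin version the assignment is fixed in advance (index i always goes to worker i % totalThreads); with the AtomicInteger version it depends on timing, so faster threads simply end up processing more indices.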
If assumptions 1 and 2 cannot be made, you can still apply the aforementioned logic of splitting the iterations among threads, but you will need to add synchronization; in my examples, to the following block of code:
private void setHandled(ListObject currentObject) {
if (!currentObject.getHandled()) {
currentObject.setHandled(true);
System.out.println("Object is handled by " + Thread.currentThread().getName());
}
}
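As it stands, you can simply turn the handled field of ListObject into an AtomicBoolean (with getHandled() returning it), as follows:
private void setHandled(ListObject currentObject) {
// compareAndSet(false, true) succeeds for exactly one thread per object
if (currentObject.getHandled().compareAndSet(false, true)) {
System.out.println("Object is handled by " + Thread.currentThread().getName());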
}
}
Otherwise, use a synchronized block:
private void setHandled(ListObject currentObject) {
synchronized (currentObject) {
if (!currentObject.getHandled()) {
currentObject.setHandled(true);
System.out.println("Object is handled by " + Thread.currentThread().getName());
}
}
}
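The following sketch checks the AtomicBoolean variant in the situation where assumption 2 does not hold: the same object appears several times in the list, and every thread scans the whole list. ListObject here is a hypothetical version of the question's class with an AtomicBoolean field, and the demo class and method names are made up. Because compareAndSet(false, true) can succeed only once per object, the number of successful claims equals the number of distinct objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicFlagDemo {

    // Hypothetical ListObject whose flag is an AtomicBoolean instead of a plain boolean.
    static class ListObject {
        private final AtomicBoolean handled = new AtomicBoolean(false);

        AtomicBoolean getHandled() { return handled; }
    }

    // Returns how many times some thread won the right to handle an object.
    static int handleConcurrently(int distinct, int copies, int threads) throws InterruptedException {
        List<ListObject> list = new ArrayList<>();
        for (int i = 0; i < distinct; i++) {
            ListObject o = new ListObject();
            for (int c = 0; c < copies; c++) {
                list.add(o); // breaks assumption 2: the same object appears several times
            }
        }
        AtomicInteger wins = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (ListObject o : list) { // every thread scans the whole list on purpose
                    if (o.getHandled().compareAndSet(false, true)) {
                        wins.incrementAndGet(); // only the single successful caller gets here
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return wins.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(handleConcurrently(100, 3, 5)); // prints 100
    }
}
```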

Related

Java Selling Tickets with Multithreading

I have two threads to sell tickets.
public class MyThread {
public static void main(String[] args) {
Ticket ticket = new Ticket();
Thread thread1 = new Thread(()->{
for (int i = 0; i < 30; i++) {
ticket.sell();
} }, "A");
thread1.start();
Thread thread2 = new Thread(()->{
for (int i = 0; i < 30; i++) {
ticket.sell();
} }, "B");
thread2.start();
}
}
class Ticket {
private Integer num = 20 ;
private Object obj = new Object();
public void sell() {
// why shouldn't I use "num" as a monitor object ?
// I thought "num" is unique among two threads.
synchronized ( num ) {
if (this.num >= 0) {
System.out.println(Thread.currentThread().getName() + " sells " + this.num + "th ticket");
this.num--;
}
}
}
}
The output will be wrong if I use num as a monitor object.
But if I use obj as a monitor object, the output will be correct.
What's the difference between using num and using obj ?
===============================================
And why does it still not work if I use (Object)num as a monitor object ?
class Ticket {
private int num = 20 ;
private Object obj = new Object();
public void sell() {
// Can I use (Object)num as a monitor object ?
synchronized ( (Object)num ) {
if (this.num >= 0) {
System.out.println(Thread.currentThread().getName() + " sells " + this.num + "th ticket");
this.num--;
}
}
}
}
Integer is a boxed value. It contains a primitive int, and the compiler deals with autoboxing/autounboxing that int. Because of this, the statement this.num-- is actually:
num=Integer.valueOf(num.intValue()-1)
That is, the num instance containing the lock is lost once you perform that update.
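One way to observe this is to keep a reference to the old Integer and compare identities after the decrement. BoxedDecrement is a hypothetical class that only isolates the num-- step from the question; note that != on two Integer references compares object identity, not value.

```java
public class BoxedDecrement {
    private Integer num = 20;

    // Returns true if the decrement replaced the Integer object stored in num.
    boolean decrementReplacesObject() {
        Integer before = num; // the object a thread would have locked on
        num--;                // compiles to num = Integer.valueOf(num.intValue() - 1)
        return before != num; // reference comparison: is it still the same object?
    }

    public static void main(String[] args) {
        System.out.println(new BoxedDecrement().decrementReplacesObject()); // prints true
    }
}
```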
The fundamental problem here is synchronizing on a non-final value.
The most important thing to understand about the Java Memory Model - that is, what values a thread sees whilst executing a Java program - is the happens-before relationship.
In the specific case of a synchronized block, actions done in one thread before exiting the synchronized block happen before actions done inside the synchronized block in another thread - so, if the first thread increments a variable inside that synchronized block, the second thread sees that updated value.
This goes over and above the well-known fact that a synchronized block can only be entered by one thread at a time: only one thread at a time and you get to see what the previous thread did.
// Thread 1
synchronized (monitor) {
num = 1;
} // exiting the monitor in Thread 1 *happens before* entering it in Thread 2
// Thread 2
synchronized (monitor) {
int n = num; // Guaranteed to see n = 1 (provided no other thread has entered a block synchronized on monitor and changed it first).
}
There is a very important caveat to this guarantee: it only holds if the two executions of the synchronized block use the same monitor. And that's not the same variable, it's the same actual concrete object on the heap (variables don't have monitors, they're just pointers to a value in the heap).
So, if you reassign the monitor inside the synchronized block:
synchronized (num) {
if (num > 0) {
num--; // This is the same as `num = Integer.valueOf(num.intValue() - 1);`
}
}
then you are destroying the happens-before guarantee, because the next thread to arrive at that synchronized block is entering the monitor of a different object (*).
Once you do, the behavior of your program is ill-defined: if you're lucky, it fails in an obvious way; if you're very unlucky, it can seem to work, and then start failing mysteriously at a later date.
Your code is just broken.
This isn't something that's specific to Integers either: this code would have the same problem.
// Assume `Object someObject = new Object();` is defined as a field.
synchronized (someObject) {
someObject = new Object();
}
(*) Actually, you still get a happens-before relationship for the new object: it's just not for the things inside this synchronized block, it's for things that happened in some other synchronized block that used the object as the monitor. Essentially, it's impossible to reason about what this means, so you may as well just consider it "broken".
The correct way to do it is to synchronize on a field that you can't (not just don't) reassign. You could simply synchronize on this (which can't be reassigned):
synchronized (this) {
if (num > 0) {
num--; // This is the same as `num = Integer.valueOf(num.intValue() - 1);`
}
}
Now it doesn't matter that you're reassigning num inside the block, because you're not synchronizing on it any more. You get the happens-before guarantee from the fact that you're always synchronizing on the same thing.
Note, however, that you must always access num from inside a synchronized block - for example, if you have a getter to get the number of tickets remaining, that must also synchronize on this, in order to get the happens-before guarantee that the value changed in the sell() method is visible in that getter.
This works, but it may not be entirely desirable: anybody who has access to a reference to your Ticket instance can also synchronize on it. This means they can potentially deadlock your code.
Instead, it is a common practice to introduce a private field which is used purely for locking: this is what the obj field gives you. The only modification from your code should be to make it final (and give it a better name than obj):
private final Object obj = new Object();
This can't be accessed outside your class, so nefarious clients cannot cause a deadlock for you directly.
Again, this can't be reassigned inside your synchronized block (or anywhere else), so there is no risk of you breaking the happens-before guarantee by reassigning it.
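Putting this together, a corrected version of the question's program might look like the sketch below. It follows the answer's suggestions (a private final lock object, renamed from obj to lock, and a getter that also synchronizes), uses num > 0 as in the snippets above rather than the question's num >= 0, and adds a hypothetical sold counter and join() calls only so the result can be checked.

```java
public class TicketDemo {

    static class Ticket {
        private final Object lock = new Object(); // final: every thread locks the same object
        private int num = 20;                     // guarded by lock
        private int sold = 0;                     // guarded by lock (added only for checking)

        void sell() {
            synchronized (lock) {
                if (num > 0) {
                    System.out.println(Thread.currentThread().getName() + " sells ticket " + num);
                    num--; // reassigning num is harmless now: we never lock on it
                    sold++;
                }
            }
        }

        int remaining() {
            synchronized (lock) { // readers must lock too, to get the happens-before edge
                return num;
            }
        }

        int sold() {
            synchronized (lock) {
                return sold;
            }
        }
    }

    static Ticket runTwoSellers() throws InterruptedException {
        Ticket ticket = new Ticket();
        Runnable seller = () -> {
            for (int i = 0; i < 30; i++) {
                ticket.sell();
            }
        };
        Thread a = new Thread(seller, "A");
        Thread b = new Thread(seller, "B");
        a.start();
        b.start();
        a.join();
        b.join();
        return ticket;
    }

    public static void main(String[] args) throws InterruptedException {
        Ticket t = runTwoSellers();
        System.out.println(t.sold() + " sold, " + t.remaining() + " left"); // last line: "20 sold, 0 left"
    }
}
```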

Terribly slow synchronization

I'm trying to write the Game of Life using many threads, one thread per cell. This requires synchronization between the threads, so that no thread starts calculating its new state before all other threads have finished reading the previous state. Here is my code:
public class Cell extends Processor{
private static int count = 0;
private static Semaphore waitForAll = new Semaphore(0);
private static Semaphore waiter = new Semaphore(0);
private IntField isDead;
public Cell(int n)
{
super(n);
count ++;
}
public void initialize()
{
this.algorithmName = Cell.class.getSimpleName();
isDead = new IntField(0);
this.addField(isDead, "state");
}
public synchronized void step()
{
int size = neighbours.size();
IntField[] states = new IntField[size];
int readElementValue = 0;
IntField readElement;
sendAll(new IntField(isDead.getDist()));
Cell.waitForAll.release();
//here wait until all other threads finish reading
while (Cell.waitForAll.availablePermits() != Cell.count) {
}
//here release the semaphore that is needed further below
Cell.waiter.release();
for (int i = 0; i < neighbours.size(); i++) {
readElement = (IntField) reciveMessage(neighbours.get(i));
states[i] = (IntField) reciveMessage(neighbours.get(i));
}
int alive = 0;
int dead = 0;
for(IntField ii: states)
{
if(ii.getDist() == 1)
alive++;
else
dead++;
}
if(isDead.getDist() == 0)
{
if(alive == 3)
isDead.setValue(1);
else
;
}
else
{
if(alive == 3 || alive == 2)
;
else
isDead.setValue(0);
}
try {
while(Cell.waiter.availablePermits() != Cell.count)
{
;
//if every thread finished reading we can acquire this semaphore
}
Cell.waitForAll.acquire();
while(Cell.waitForAll.availablePermits() != 0)
;
//here we make sure every thread ends step in same moment
Cell.waiter.acquire();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
The Processor class extends Thread, and in its run method, if I turn the switch on, it calls the step() method. It works nicely for a small number of cells, but when I run about 36 cells it starts to be very slow. How can I repair my synchronization so that it is faster?
Using large numbers of threads tends not to be very efficient, but 36 is not so many that I would expect that in itself to produce a difference that you would characterize as "very slow". I think more likely the problem is inherent in your strategy. In particular, I suspect this busy-wait is problematic:
Cell.waitForAll.release();
//here wait until all other threads finish reading
while (Cell.waitForAll.availablePermits() != Cell.count) {
}
Busy-waiting is always a performance problem because you are tying up the CPU with testing the condition over and over again. This busy-wait is worse than most, because it involves testing the state of a synchronization object, and this not only has extra overhead, but also introduces extra interference among threads.
Instead of busy-waiting, you want to use one of the various methods for making threads suspend execution until a condition is satisfied. It looks like what you've actually done is created a poor-man's version of a CyclicBarrier, so you might consider instead using CyclicBarrier itself. Alternatively, since this is a learning exercise you might benefit from learning how to use Object.wait(), Object.notify(), and Object.notifyAll() -- Java's built-in condition variable implementation.
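As a rough illustration of the CyclicBarrier approach, the sketch below runs one thread per cell through a number of generations with two barriers per step: one after reading the neighbours' previous state and one after writing the new state. It does not use the question's Processor, IntField, or messaging classes; a shared counter stands in for the reading phase, and the names are hypothetical. Threads waiting in await() are blocked rather than spinning.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class BarrierGenerations {

    // Returns false if any thread ever passed the read barrier before all cells had read.
    static boolean run(int cells, int generations) throws InterruptedException {
        AtomicInteger readsThisGeneration = new AtomicInteger();
        AtomicBoolean ok = new AtomicBoolean(true);
        CyclicBarrier afterRead = new CyclicBarrier(cells);
        // The barrier action runs once per trip, after all parties arrive and before any is released.
        CyclicBarrier afterWrite = new CyclicBarrier(cells, () -> readsThisGeneration.set(0));
        Thread[] workers = new Thread[cells];
        for (int c = 0; c < cells; c++) {
            workers[c] = new Thread(() -> {
                try {
                    for (int g = 0; g < generations; g++) {
                        readsThisGeneration.incrementAndGet(); // stands in for reading neighbours
                        afterRead.await();                     // blocks until every cell has read
                        if (readsThisGeneration.get() != cells) {
                            ok.set(false);                     // would mean someone moved on too early
                        }
                        // ... compute and publish the new state here ...
                        afterWrite.await();                    // all cells finish the step together
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    ok.set(false);
                }
            });
            workers[c].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return ok.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(36, 100)); // prints true
    }
}
```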
If you insist on using semaphores, then I think you could do it without the busy-wait. The key to using semaphores is that it is the ability to acquire the semaphore at all that indicates that the thread can proceed, not the number of available permits. If you maintain a separate variable to track how many threads are waiting on a given semaphore at a given point, then each thread reaching that point can determine whether to release all the other threads (and proceed itself) or to block by attempting to acquire the semaphore.
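A minimal sketch of such a semaphore-based barrier, assuming a fixed number of participating threads, could look like this. SemaphoreBarrier is a hypothetical name, not a JDK class. A counter guarded by a lock tracks how many threads have arrived; the last one releases the others, and everyone else blocks in acquire(). A fresh Semaphore is used for each round so that fast threads arriving for the next step cannot consume permits meant for the current one.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class SemaphoreBarrierDemo {

    // Reusable barrier built from a counter plus a semaphore, with no busy-waiting.
    static class SemaphoreBarrier {
        private final int parties;
        private int waiting = 0;                   // guarded by this
        private Semaphore gate = new Semaphore(0); // guarded by this; one fresh gate per round

        SemaphoreBarrier(int parties) {
            this.parties = parties;
        }

        void await() throws InterruptedException {
            Semaphore myGate;
            synchronized (this) {
                waiting++;
                if (waiting == parties) {
                    // Last thread to arrive: open this round's gate and start a new round.
                    waiting = 0;
                    Semaphore current = gate;
                    gate = new Semaphore(0);
                    current.release(parties - 1);
                    return;
                }
                myGate = gate;
            }
            myGate.acquire(); // blocks without spinning until the last thread releases it
        }
    }

    static boolean run(int threads, int rounds) throws InterruptedException {
        SemaphoreBarrier barrier = new SemaphoreBarrier(threads);
        AtomicInteger[] arrivals = new AtomicInteger[rounds];
        for (int r = 0; r < rounds; r++) {
            arrivals[r] = new AtomicInteger();
        }
        AtomicBoolean ok = new AtomicBoolean(true);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    for (int r = 0; r < rounds; r++) {
                        arrivals[r].incrementAndGet();
                        barrier.await();
                        // Past the barrier, every thread must have arrived for this round.
                        if (arrivals[r].get() != threads) ok.set(false);
                    }
                } catch (InterruptedException e) {
                    ok.set(false);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return ok.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(36, 100)); // prints true
    }
}
```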

Why is a semaphore used when using synchronization?

I was reading about semaphores, and the code example confused me: why is a semaphore used when the code already uses synchronization around the method that is ultimately called? Isn't that doing the same thing, i.e. restricting the mutation to one thread at a time?
class Pool {
private static final int MAX_AVAILABLE = 100;
private final Semaphore available = new Semaphore(MAX_AVAILABLE, true);
public Object getItem() throws InterruptedException {
available.acquire();
return getNextAvailableItem();
}
public void putItem(Object x) {
if (markAsUnused(x))
available.release();
}
// Not a particularly efficient data structure; just for demo
protected Object[] items = ... whatever kinds of items being managed
protected boolean[] used = new boolean[MAX_AVAILABLE];
protected synchronized Object getNextAvailableItem() {
for (int i = 0; i < MAX_AVAILABLE; ++i) {
if (!used[i]) {
used[i] = true;
return items[i];
}
}
return null; // not reached
}
protected synchronized boolean markAsUnused(Object item) {
for (int i = 0; i < MAX_AVAILABLE; ++i) {
if (item == items[i]) {
if (used[i]) {
used[i] = false;
return true;
} else
return false;
}
}
return false;
}
}
I'm referring to the call to getItem() which calls acquire(), and then calls getNextAvailableItem, but that is synchronized anyhow.
What am I missing?
Reference: http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Semaphore.html
The semaphore and the synchronized block are doing two different jobs.
The synchronized keyword protects getNextAvailableItem() while it accesses and mutates the array of items, an operation that would corrupt the data if it were not restricted to one thread at a time.
The semaphore will allow up to 100 threads through, significantly more than 1. Its purpose in this code sample is to block requests for an object from the pool when the pool is empty, and to then unblock one thread when an object is returned to the pool. Without the semaphore, things would look like they were working until the pool was empty. At that time requesting threads would not block and wait for an object to be returned, but would instead receive null.
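The blocking behaviour can be seen with a cut-down version of the Javadoc example. TinyPool is hypothetical: it manages two String items and uses Semaphore.tryAcquire with a timeout instead of acquire(), so that an empty pool shows up as a null result after the timeout rather than blocking the demo forever.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class TinyPool {
    private final String[] items = {"a", "b"};
    private final boolean[] used = new boolean[items.length];
    private final Semaphore available = new Semaphore(items.length, true);

    // Like getItem(), but gives up after a timeout instead of blocking forever.
    String tryGetItem(long millis) throws InterruptedException {
        if (!available.tryAcquire(millis, TimeUnit.MILLISECONDS)) {
            return null; // pool empty: the semaphore refused, so the array was never scanned
        }
        return getNextAvailableItem();
    }

    void putItem(String x) {
        if (markAsUnused(x)) {
            available.release(); // one waiting (or future) caller may now proceed
        }
    }

    // synchronized protects the used[] array; the semaphore only counts free items.
    private synchronized String getNextAvailableItem() {
        for (int i = 0; i < items.length; i++) {
            if (!used[i]) {
                used[i] = true;
                return items[i];
            }
        }
        return null; // not reached: holding a permit guarantees a free item
    }

    private synchronized boolean markAsUnused(String item) {
        for (int i = 0; i < items.length; i++) {
            if (items[i].equals(item) && used[i]) {
                used[i] = false;
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        TinyPool pool = new TinyPool();
        String first = pool.tryGetItem(100);
        String second = pool.tryGetItem(100);
        System.out.println(pool.tryGetItem(100)); // prints null: both items are taken
        pool.putItem(first);
        System.out.println(pool.tryGetItem(100)); // prints a
    }
}
```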
A Semaphore gives you a thread-safe counter that blocks when acquire has been called beyond the initial limit. release can be used to undo an acquire.
It guarantees that if a call to acquire succeeds, there is a free item available in the pool.
In the sample there are loops that look for a free item. Using a Semaphore ensures that none of those loops are begun until there is a free item.
synchronized only guarantees that only one thread can execute this section of code at a time.

Java concurrency counter not properly clean up

This is a Java concurrency question. 10 jobs need to be done, and each of them has 32 worker threads. Each worker thread increments a counter. Once the counter reaches 32, the job is done and its entry in the counter map is cleaned up. I expect the console to print "done" 10 times, with pool size 0 and countThreadMap size 0.
The issues are:
1. Most of the time, "pool size: 0 and countThreadMap size:3" is printed. Even though all threads are gone, 3 jobs are not finished yet.
2. Sometimes I see a NullPointerException in line 27. I have used ConcurrentHashMap and AtomicLong, so why do I still get a concurrency exception?
Thanks
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.atomic.AtomicLong;
public class Test {
final ConcurrentHashMap<Long, AtomicLong[]> countThreadMap = new ConcurrentHashMap<Long, AtomicLong[]>();
final ExecutorService cachedThreadPool = Executors.newCachedThreadPool();
final ThreadPoolExecutor tPoolExecutor = ((ThreadPoolExecutor) cachedThreadPool);
public void doJob(final Long batchIterationTime) {
for (int i = 0; i < 32; i++) {
Thread workerThread = new Thread(new Runnable() {
@Override
public void run() {
if (countThreadMap.get(batchIterationTime) == null) {
AtomicLong[] atomicThreadCountArr = new AtomicLong[2];
atomicThreadCountArr[0] = new AtomicLong(1);
atomicThreadCountArr[1] = new AtomicLong(System.currentTimeMillis()); //start up time
countThreadMap.put(batchIterationTime, atomicThreadCountArr);
} else {
AtomicLong[] atomicThreadCountArr = countThreadMap.get(batchIterationTime);
atomicThreadCountArr[0].getAndAdd(1);
countThreadMap.put(batchIterationTime, atomicThreadCountArr);
}
if (countThreadMap.get(batchIterationTime)[0].get() == 32) {
System.out.println("done");
countThreadMap.remove(batchIterationTime);
}
}
});
tPoolExecutor.execute(workerThread);
}
}
public void report(){
while(tPoolExecutor.getActiveCount() != 0){
//
}
System.out.println("pool size: "+ tPoolExecutor.getActiveCount() + " and countThreadMap size:"+countThreadMap.size());
}
public static void main(String[] args) throws Exception {
Test test = new Test();
for (int i = 0; i < 10; i++) {
Long batchIterationTime = System.currentTimeMillis();
test.doJob(batchIterationTime);
}
test.report();
System.out.println("All Jobs are done");
}
}
Let’s dig through all the mistakes in thread-related programming that one person can make:
Thread workerThread = new Thread(new Runnable() {
…
tPoolExecutor.execute(workerThread);
You create a Thread, but instead of starting it, you submit it to an executor. It’s a historical mistake of the Java API to let Thread implement Runnable for no good reason. Every developer should be aware that there is no reason to treat a Thread as a Runnable. If you don’t want to start a thread manually, don’t create a Thread. Just create the Runnable and pass it to execute or submit.
I want to emphasize the latter as it returns a Future which gives you for free what you are attempting to implement: the information when a task has been finished. It’s even easier when using invokeAll which will submit a bunch of Callables and return when all are done. Since you didn’t tell us anything about your actual task, it’s not clear whether you can let your tasks simply implement Callable (may return null) instead of Runnable.
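As a rough sketch of the invokeAll route, assuming the tasks can be written as Callables, the following submits each job's 32 tasks as one batch; invokeAll returns only when all of them have completed. The class and method names are hypothetical, and the increment stands in for the real work. Note that this processes the jobs one after another; if the jobs should overlap, keep the Futures and wait on them later instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class InvokeAllDemo {

    // Runs `jobs` jobs of 32 tasks each and returns how many tasks completed.
    static int runJobs(int jobs) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        AtomicInteger completedTasks = new AtomicInteger();
        try {
            for (int j = 0; j < jobs; j++) {
                List<Callable<Void>> batch = new ArrayList<>();
                for (int i = 0; i < 32; i++) {
                    batch.add(() -> {
                        completedTasks.incrementAndGet(); // stand-in for the real work
                        return null;
                    });
                }
                // invokeAll blocks until every task of this job has completed.
                for (Future<Void> f : pool.invokeAll(batch)) {
                    f.get(); // rethrows a task's exception, wrapped in ExecutionException
                }
                System.out.println("done");
            }
        } finally {
            pool.shutdown();
        }
        return completedTasks.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runJobs(10)); // prints "done" ten times, then 320
    }
}
```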
If you can’t use Callables or don’t want to wait immediately on submission, you have to remember the returned Futures and query them at a later time:
static final ExecutorService cachedThreadPool = Executors.newCachedThreadPool();
public static List<Future<?>> doJob(final Long batchIterationTime) {
final Random r=new Random();
List<Future<?>> list=new ArrayList<>(32);
for (int i = 0; i < 32; i++) {
Runnable job=new Runnable() {
public void run() {
// pretend to do something
LockSupport.parkNanos(TimeUnit.SECONDS.toNanos(r.nextInt(10)));
}
};
list.add(cachedThreadPool.submit(job));
}
return list;
}
public static void main(String[] args) throws Exception {
Test test = new Test();
Map<Long,List<Future<?>>> map=new HashMap<>();
for (int i = 0; i < 10; i++) {
Long batchIterationTime = System.currentTimeMillis();
while(map.containsKey(batchIterationTime))
batchIterationTime++;
map.put(batchIterationTime,doJob(batchIterationTime));
}
// print some statistics, if you really need
int overAllDone=0, overallPending=0;
for(Map.Entry<Long,List<Future<?>>> e: map.entrySet()) {
int done=0, pending=0;
for(Future<?> f: e.getValue()) {
if(f.isDone()) done++;
else pending++;
}
System.out.println(e.getKey()+"\t"+done+" done, "+pending+" pending");
overAllDone+=done;
overallPending+=pending;
}
System.out.println("Total\t"+overAllDone+" done, "+overallPending+" pending");
// wait for the completion of all jobs
for(List<Future<?>> l: map.values())
for(Future<?> f: l)
f.get();
System.out.println("All Jobs are done");
}
But note that if you don’t need the ExecutorService for subsequent tasks, it’s much easier to wait for all jobs to complete:
cachedThreadPool.shutdown();
cachedThreadPool.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
System.out.println("All Jobs are done");
But regardless of how unnecessary the manual tracking of the job status is, let’s delve into your attempt, so you may avoid the mistakes in the future:
if (countThreadMap.get(batchIterationTime) == null) {
The ConcurrentMap is thread-safe, but this does not turn your concurrent code into sequential code (that would render multi-threading useless). The above line might be executed by up to all 32 threads at the same time, all finding that the key does not exist yet, so more than one thread may then put an initial value into the map.
AtomicLong[] atomicThreadCountArr = new AtomicLong[2];
atomicThreadCountArr[0] = new AtomicLong(1);
atomicThreadCountArr[1] = new AtomicLong(System.currentTimeMillis());
countThreadMap.put(batchIterationTime, atomicThreadCountArr);
This is why it is called the “check-then-act” anti-pattern. If more than one thread runs that code, they all put their new value, confident that this is the right thing because they checked the condition before acting; but for all threads except one, the condition has changed by the time they act, and they overwrite the value of a previous put operation.
} else {
AtomicLong[] atomicThreadCountArr = countThreadMap.get(batchIterationTime);
atomicThreadCountArr[0].getAndAdd(1);
countThreadMap.put(batchIterationTime, atomicThreadCountArr);
Since you are modifying the AtomicLong that is already stored in the map, the put operation is useless: it puts back the very array it retrieved before. If it weren’t for the possibility of multiple initial values described above, the put operation would have no effect at all.
}
if (countThreadMap.get(batchIterationTime)[0].get() == 32) {
Again, the use of a ConcurrentMap doesn’t turn the multi-threaded code into sequential code. While it is clear that only the last thread will update the atomic counter to 32 (if the initial race condition doesn’t materialize), it is not guaranteed that all other threads have already passed this if statement. Therefore more than one thread, up to all of them, can still be at this point of execution and see the value 32. Or…
System.out.println("done");
countThreadMap.remove(batchIterationTime);
One of the threads that have seen the value 32 might execute this remove operation. At that point, there might still be threads that have not yet executed the above if statement; they no longer see the value 32 but produce a NullPointerException, because the array supposed to contain the AtomicLong is not in the map anymore. This is what happens, occasionally…
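If you do want to keep a manual counter, a sketch that avoids both problems could use ConcurrentHashMap.computeIfAbsent, which creates the counter atomically, and act only on the return value of incrementAndGet() rather than re-reading the map. JobCounter and its methods are hypothetical names. The job ids are plain distinct longs here because System.currentTimeMillis() can return the same value for several jobs started in the same millisecond, which would merge their counters.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class JobCounter {
    static final int WORKERS_PER_JOB = 32;

    final ConcurrentHashMap<Long, AtomicInteger> counters = new ConcurrentHashMap<>();
    final AtomicInteger doneJobs = new AtomicInteger();

    // Called once by each worker of a job.
    void workerFinished(long jobId) {
        // computeIfAbsent is atomic: exactly one counter is created per job (no check-then-act).
        AtomicInteger counter = counters.computeIfAbsent(jobId, id -> new AtomicInteger());
        // Only the thread whose own increment returned 32 acts, so "done" and remove() happen once.
        if (counter.incrementAndGet() == WORKERS_PER_JOB) {
            counters.remove(jobId);
            doneJobs.incrementAndGet();
            System.out.println("done");
        }
    }

    static JobCounter runJobs(int jobs) throws InterruptedException {
        JobCounter jc = new JobCounter();
        ExecutorService pool = Executors.newCachedThreadPool();
        for (long job = 0; job < jobs; job++) { // distinct ids instead of currentTimeMillis()
            final long jobId = job;
            for (int i = 0; i < WORKERS_PER_JOB; i++) {
                pool.execute(() -> jc.workerFinished(jobId)); // a Runnable, not a Thread
            }
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES); // waits for all submitted tasks
        return jc;
    }

    public static void main(String[] args) throws InterruptedException {
        JobCounter jc = runJobs(10);
        // prints "done" ten times, then "jobs done: 10, map size: 0"
        System.out.println("jobs done: " + jc.doneJobs.get() + ", map size: " + jc.counters.size());
    }
}
```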
After creating your 10 jobs, your main thread keeps running; it doesn't wait for your jobs to complete before it calls report on the test. You try to overcome this with the while loop, but tPoolExecutor.getActiveCount() can be 0 before the worker tasks have even started executing, so countThreadMap.size() may be read after some jobs have added their entries to the map but before those entries have been removed.
There are a number of ways to fix this - but I will let another answer-er do that because I have to leave at the moment.

AtomicReference to a mutable object and visibility

Say I have an AtomicReference to a list of objects:
AtomicReference<List<Object>> batch = new AtomicReference<List<Object>>(new ArrayList<Object>());
Thread A adds elements to this list: batch.get().add(o);
Later, thread B takes the list and, for example, stores it in a DB: insertBatch(batch.get());
Do I have to do additional synchronization when writing (Thread A) and reading (Thread B) to ensure thread B sees the list the way A left it, or is this taken care of by the AtomicReference?
In other words: if I have an AtomicReference to a mutable object, and one thread changes that object, do other threads see this change immediately?
Edit:
Maybe some example code is in order:
public void process(Reader in) throws IOException {
List<Future<AtomicReference<List<Object>>>> tasks = new ArrayList<Future<AtomicReference<List<Object>>>>();
ExecutorService exec = Executors.newFixedThreadPool(4);
for (int i = 0; i < 4; ++i) {
tasks.add(exec.submit(new Callable<AtomicReference<List<Object>>>() {
@Override public AtomicReference<List<Object>> call() throws IOException {
final AtomicReference<List<Object>> batch = new AtomicReference<List<Object>>(new ArrayList<Object>(batchSize));
Processor.this.parser.parse(in, new Parser.Handler() {
@Override public void onNewObject(Object event) {
batch.get().add(event);
if (batch.get().size() >= batchSize) {
dao.insertBatch(batch.getAndSet(new ArrayList<Object>(batchSize)));
}
}
});
return batch;
}
}));
}
List<Object> remainingBatches = new ArrayList<Object>();
for (Future<AtomicReference<List<Object>>> task : tasks) {
try {
AtomicReference<List<Object>> remainingBatch = task.get();
remainingBatches.addAll(remainingBatch.get());
} catch (ExecutionException e) {
Throwable cause = e.getCause();
if (cause instanceof IOException) {
throw (IOException)cause;
}
throw (RuntimeException)cause;
}
}
// these haven't been flushed yet by the worker threads
if (!remainingBatches.isEmpty()) {
dao.insertBatch(remainingBatches);
}
}
What happens here is that I create four worker threads to parse some text (this is the Reader in parameter to the process() method). Each worker saves the lines it has parsed in a batch, and flushes the batch when it is full (dao.insertBatch(batch.getAndSet(new ArrayList<Object>(batchSize)));).
Since the number of lines in the text isn't a multiple of the batch size, the last objects end up in a batch that isn't flushed, since it's not full. These remaining batches are therefore inserted by the main thread.
I use AtomicReference.getAndSet() to replace the full batch with an empty one. Is this program correct with regard to threading?
Um... it doesn't really work like this. AtomicReference guarantees that the reference itself is visible across threads i.e. if you assign it a different reference than the original one the update will be visible. It makes no guarantees about the actual contents of the object that reference is pointing to.
Therefore, read/write operations on the list contents require separate synchronization.
Edit: So, judging from your updated code and the comment you posted, setting the local reference to volatile is sufficient to ensure visibility.
I think that, forgetting all the code here, your exact question is this:
Do I have to do additional synchronization when writing (Thread A) and
reading (Thread B) to ensure thread B sees the list the way A left it,
or is this taken care of by the AtomicReference?
So, the exact response to that is: yes, atomics take care of visibility. That is not just my opinion; it is what the JDK documentation says:
The memory effects for accesses and updates of atomics generally follow the rules for volatiles, as stated in The Java Language Specification, Third Edition (17.4 Memory Model).
I hope this helps.
Adding to Tudor's answer: You will have to make the ArrayList itself threadsafe or - depending on your requirements - even larger code blocks.
If you can get away with a threadsafe ArrayList you can "decorate" it like this:
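List<Object> list = java.util.Collections.synchronizedList(new ArrayList<Object>());
batch.set(list);
But keep in mind that even "simple" compound operations like the following are not thread-safe with such a list, because another thread may remove an element between the size() and get() calls:
Object o = list.get(list.size() - 1);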
The AtomicReference will only help you with the reference to the list, it will not do anything to the list itself. More particularly, in your scenario, you will almost certainly run into problems when the system is under load where the consumer has taken the list while the producer is adding an item to it.
This sounds to me like you should be using a BlockingQueue. You can then limit the memory footprint if your producer is faster than your consumer, and let the queue handle all contention.
Something like:
ArrayBlockingQueue<Object> queue = new ArrayBlockingQueue<Object> (50);
// ... Producer
queue.put(o);
// ... Consumer
List<Object> queueContents = new ArrayList<Object> ();
// Grab everything waiting in the queue in one chunk. Should never be more than 50 items.
queue.drainTo(queueContents);
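A runnable sketch of this producer/consumer hand-off, with hypothetical names, is shown below. drainTo does not wait for elements, so the consumer first blocks in take() for one item and then drains whatever else is already queued; put() makes the producer block while the queue holds 50 items.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

public class DrainDemo {

    // One producer thread puts `items` Integers; the calling thread consumes them in chunks.
    static List<Object> produceAndConsume(int items) throws InterruptedException {
        ArrayBlockingQueue<Object> queue = new ArrayBlockingQueue<>(50);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < items; i++) {
                    queue.put(i); // blocks while the queue already holds 50 items
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<Object> consumed = new ArrayList<>();
        while (consumed.size() < items) {
            List<Object> chunk = new ArrayList<>();
            chunk.add(queue.take()); // wait until at least one item is available
            queue.drainTo(chunk);    // then grab everything else already waiting, without blocking
            consumed.addAll(chunk);  // e.g. hand `chunk` to dao.insertBatch here
        }
        producer.join();
        return consumed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(produceAndConsume(1000).size()); // prints 1000
    }
}
```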
Added
Thanks to @Tudor for pointing out the architecture you are using. ... I have to admit it is rather strange. You don't really need AtomicReference at all as far as I can see. Each thread owns its own ArrayList until it is passed on to dao, at which point it is replaced, so there is no contention anywhere.
I am a little concerned about you creating four parsers on a single Reader. I hope you have some way of ensuring that each parser does not affect the others.
I personally would use some form of producer-consumer pattern as I have described in the code above. Something like this perhaps.
static final int PROCESSES = 4;
static final int batchSize = 10;
public void process(Reader in) throws IOException, InterruptedException {
final List<Future<Void>> tasks = new ArrayList<Future<Void>>();
ExecutorService exec = Executors.newFixedThreadPool(PROCESSES);
// Queue of objects.
final ArrayBlockingQueue<Object> queue = new ArrayBlockingQueue<Object> (batchSize * 2);
// The final object to post.
final Object FINISHED = new Object();
// Start the producers.
for (int i = 0; i < PROCESSES; i++) {
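tasks.add(exec.submit(new Callable<Void>() {
@Override
public Void call() throws IOException, InterruptedException {
Processor.this.parser.parse(in, new Parser.Handler() {
@Override
public void onNewObject(Object event) {
try {
// put() waits while the queue is full; add() would throw IllegalStateException instead.
queue.put(event);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
});
// Post a finished down the queue.
queue.put(FINISHED);
return null;
}
}));
}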
// Start the consumer.
tasks.add(exec.submit(new Callable<Void>() {
@Override
public Void call() throws IOException, InterruptedException {
List<Object> batch = new ArrayList<Object>(batchSize);
int finishedCount = 0;
// Until all threads finished.
while ( finishedCount < PROCESSES ) {
Object o = queue.take();
if ( o != FINISHED ) {
// Batch them up.
batch.add(o);
if ( batch.size() >= batchSize ) {
dao.insertBatch(batch);
// If insertBatch takes a copy we could merely clear it.
batch = new ArrayList<Object>(batchSize);
}
} else {
// Count the finishes.
finishedCount += 1;
}
}
// Finished! Post any incomplete batch.
if ( batch.size() > 0 ) {
dao.insertBatch(batch);
}
return null;
}
}));
// Wait for everything to finish.
exec.shutdown();
// Wait until all is done.
boolean finished = false;
do {
try {
// Wait up to 1 second for termination.
finished = exec.awaitTermination(1, TimeUnit.SECONDS);
} catch (InterruptedException ex) {
}
} while (!finished);
}
