public void fooAndBar() {
    HashMap<Foo, Bar> fooBarMap = new HashMap<>();
    CompletionService<Void> completionService = new ExecutorCompletionService<>(exec);
    for (int i = 0; i < 10; i++) {
        final int n = i; // the loop variable itself cannot be captured by the anonymous class
        completionService.submit(new Callable<Void>() {
            @Override
            public Void call() throws Exception {
                fooBarMap.put(new Foo(n), new Bar(n));
                return null;
            }
        });
    }
}
Is it safe to modify the HashMap inside the Callable?
Should the hashmap be final (or maybe volatile) and if so, why?
Should I use a structure other than HashMap, something like ConcurrentHashMap or SynchronizedMap and why?
I'm trying to grasp java concepts so please bear with me
Is it safe to modify the HashMap inside the Callable?
No. If you are using a thread pool, I assume you plan to have several of those callables running in parallel. Any time an object with mutable state is accessed from more than one thread without synchronization, that access is thread-unsafe. If you write to a plain HashMap from two threads simultaneously, its internal structure can be corrupted. If you read from it while another thread is writing to it, your reading thread may read garbage. This well-known and extensively studied situation is called a race condition, and a full description of it is beyond the scope of this answer. For more information, read about race conditions on Wikipedia or in another question answered back in 2008: Stack Overflow - What is a Race Condition.
Should the hashmap be final (or maybe volatile) and if so, why?
For your purposes it does not need to be final, but it is always a good practice to make final anything that can be made final.
It does not need to be volatile because:
if you were to make it volatile, you would be making the reference to it volatile, but the reference never changes, it is its contents that change, and volatile has nothing to do with those.
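the thread pool makes sure that call() will be executed after fooBarMap = new HashMap<>(), because everything a thread does before submitting a task to an executor happens-before the execution of that task. (If you are wondering why such a thing could ever be a concern, search for "memory barrier".)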
Should I use a structure other than HashMap, something like ConcurrentHashMap or SynchronizedMap and why?
Definitely. As I wrote earlier, any time an object with mutable state is accessed from more than one thread without synchronization, that's thread-unsafe. ConcurrentHashMap, Collections.synchronizedMap, the synchronized keyword, etc. exist precisely to take care of such situations.
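A minimal sketch of the question's loop with a ConcurrentHashMap in place of the HashMap. Foo and Bar are not shown in the question, so Integer keys and String values stand in for them; the class name ConcurrentFill, the method fill and the pool size of 4 are assumptions made only for illustration.

```java
import java.util.Map;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ConcurrentFill {
    // Fills a thread-safe map from several pooled tasks and returns it.
    static Map<Integer, String> fill(int tasks) {
        Map<Integer, String> map = new ConcurrentHashMap<>();
        ExecutorService exec = Executors.newFixedThreadPool(4); // assumed pool size
        try {
            CompletionService<Void> cs = new ExecutorCompletionService<>(exec);
            for (int i = 0; i < tasks; i++) {
                final int n = i; // copy: the loop variable cannot be captured
                cs.submit(() -> {
                    map.put(n, "bar-" + n); // safe: put() on ConcurrentHashMap is thread-safe
                    return null;
                });
            }
            // Wait for every task, so the caller sees the finished map.
            for (int i = 0; i < tasks; i++) {
                cs.take().get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            exec.shutdown();
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(fill(10).size()); // prints 10
    }
}
```

Waiting on take().get() for each task matters as much as the map type: without it the method could return before all entries have been added.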
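The HashMap can be final even though you are modifying it multiple times (from within a for loop): final only prevents the variable from being reassigned, not calls to put() on the map it refers to.
If you make it final, you will not get an error for modifying its contents.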
I tried to search but couldn't find exact answer I was looking for hence putting up a new question.
If you wish to share any mutable object(s) between multiple threads, are there any best practices/principles/guidelines to do it ?
Or will it simply vary case by case ?
Sharing mutable objects between threads is risky.
The safest way is to make the objects immutable, you can then share them freely.
If they must be mutable, then each of the objects needs to ensure its own thread safety using the usual methods (synchronized, AtomicX classes, etc.).
The ways to protect the individual objects will vary a lot though depending on how you are using them and what you are using them for.
In Java, the easiest way is to synchronize every method that changes or reads the state of the shared object.
Other strategies are:
make use of thread-safe classes (ConcurrentHashMap, for example)
use of locks
use of the volatile keyword, to avoid stale reads (it can sometimes be used as a lightweight synchronizer)
The key is to synchronize your updates and reads to guarantee a consistent state; the way you do it can vary a lot.
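As a rough illustration of the first suggestion (synchronize every method that reads or changes the shared state), here is a small sketch. The class SharedLog and its methods are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

class SharedLog {
    // Not thread-safe on its own; every access goes through a synchronized method.
    private final List<String> entries = new ArrayList<>();

    synchronized void add(String entry) {
        entries.add(entry);
    }

    synchronized int size() {
        return entries.size();
    }

    // Two threads append 1000 entries each; returns the final size.
    static int demo() {
        SharedLog log = new SharedLog();
        Runnable writer = () -> {
            for (int i = 0; i < 1000; i++) {
                log.add("entry");
            }
        };
        Thread a = new Thread(writer);
        Thread b = new Thread(writer);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return log.size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 2000
    }
}
```

Note that the reader (size) is synchronized as well as the writer; synchronizing only the writes would not guarantee that a reading thread sees them.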
The problems with sharing objects between threads are caused by having the two threads access the same data structure at the same time, with one mutating the structure while the other depends on the structure to be complete, correct or stable. Which of these cause the problem is important and should be considered when choosing the strategy.
These are the strategies I use.
Use immutable objects as much as possible.
This removes the issue of changing the data structure altogether. There are, however, a lot of useful patterns that cannot be written using this approach. Also, unless you are using a language or API that promotes immutability, it can be inefficient: adding an entry to a Scala list is much faster than making a copy of a Java list and adding an entry to the copy.
Use the synchronized keyword.
This ensures that only one thread at a time is allowed to change the object. It is important to choose which object to synchronize on. Changing a part of a structure might put the whole structure in an illegal state until another change is made. Also, synchronized removes many of the benefits of going multithreaded in the first place.
The Actor model.
The actor model organizes the world into actors sending immutable messages to each other. Each actor is run by only one thread at a time, so the actor can contain the mutable state.
There are platforms, like Akka, which provide the fundamentals for this approach.
Use the atomic classes. (java.util.concurrent.atomic)
These gems have methods like incrementAndGet. They can be used to achieve many of the effects of synchronized without the overhead.
Use concurrent data structures.
The Java api contains concurrent data structures created for this purpose.
Risk doing stuff twice.
When designing a cache it is often a good idea to risk doing the work twice instead of using synchronized. Say you have a cache of compiled expressions from a DSL. If an expression is compiled twice, that is OK as long as it eventually ends up in the cache. By allowing some extra work during initialization, you may not need the synchronized keyword during cache access.
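The last strategy, combined with the atomic classes and concurrent data structures mentioned above, might look roughly like the sketch below. ExpressionCache, compile and the upper-casing "compiler" are placeholders invented for this example, not part of any real DSL.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class ExpressionCache {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    private final AtomicInteger compilations = new AtomicInteger();

    // Stand-in for an expensive compile step without side effects.
    private String compile(String source) {
        compilations.incrementAndGet(); // atomic counter, no lock needed
        return source.trim().toUpperCase();
    }

    String get(String source) {
        String cached = cache.get(source);
        if (cached != null) {
            return cached;
        }
        // Two threads may both reach this point and compile the same source.
        String compiled = compile(source);
        // Only the first result is stored; a later duplicate is discarded.
        String winner = cache.putIfAbsent(source, compiled);
        return winner != null ? winner : compiled;
    }

    int compilations() {
        return compilations.get();
    }

    public static void main(String[] args) {
        ExpressionCache c = new ExpressionCache();
        System.out.println(c.get(" a+b "));
        System.out.println(c.get(" a+b "));
        System.out.println(c.compilations());
    }
}
```

This only works when the duplicated work is harmless. If compiling twice had side effects, computeIfAbsent or explicit locking would be the safer choice.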
Here is an example. StringBuilder is not thread-safe, so without the synchronized (builder) blocks the result will be broken. Try it and see.
Some objects are thread-safe (for example StringBuffer), so there is no need to use synchronized blocks with them.
public static void main(String[] args) throws InterruptedException {
    StringBuilder builder = new StringBuilder("");
    Thread one = new Thread() {
        public void run() {
            for (int i = 0; i < 1000; i++) {
                //synchronized (builder) {
                builder.append("thread one\n");
                //}
            }
        }
    };
    Thread two = new Thread() {
        public void run() {
            for (int i = 0; i < 1000; i++) {
                //synchronized (builder) {
                builder.append("thread two\n");
                //}
            }
        }
    };
    one.start();
    two.start();
    one.join();
    two.join();
    System.out.println(builder);
}
There are some good answers already posted, but here is what I found while reading Java Concurrency in Practice, Chapter 3 - Sharing Objects.
Quote from the book.
The publication requirements for an object depend on its mutability:
Immutable objects can be published through any mechanism;
Effectively immutable objects (whose state will not be modified after publication) must be safely published;
Mutable objects must be safely published, and must be either threadsafe or guarded by a lock.
The book then lists the ways to publish an object safely:
To publish an object safely, both the reference to the object and the object's state must be made visible to other threads at the same time. A properly constructed object can be safely published by:
Initializing an object reference from a static initializer;
Storing a reference to it into a volatile field or AtomicReference;
Storing a reference to it into a final field of a properly constructed object; or
Storing a reference to it into a field that is properly guarded by a lock.
The last point refers to mechanisms such as concurrent data structures and/or the synchronized keyword.
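To make one of the quoted options concrete, here is a small sketch of publication through a volatile field. The class SafePublication, the field settings and the two map entries are made up for this example; the map is treated as effectively immutable once it has been published.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SafePublication {
    // The volatile write below publishes the reference and the map's contents together.
    private static volatile Map<String, Integer> settings;

    static int publishAndRead() {
        settings = null;
        Thread publisher = new Thread(() -> {
            Map<String, Integer> m = new HashMap<>();
            m.put("timeout", 30);
            m.put("retries", 3);
            // Fully build the map first, then publish it; nobody modifies it afterwards.
            settings = Collections.unmodifiableMap(m);
        });
        publisher.start();

        // The reader polls the volatile field until the publisher has stored the map.
        Map<String, Integer> seen;
        while ((seen = settings) == null) {
            Thread.yield();
        }
        return seen.get("timeout") + seen.get("retries");
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead()); // prints 33
    }
}
```

Without volatile on the field, the reader would have no guarantee of ever seeing the new reference, or of seeing both entries once it did.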
I found the following code snippet in luaj and started to wonder whether changes made to the Map after it has been constructed might not be visible to other threads, since there is no synchronization in place.
I know that since the Map is declared final, the values it holds when construction completes are visible to other threads, but what about changes that happen after that?
Some might also point out that this class is so far from thread-safe that calling coerce in a multi-threaded environment might even cause an infinite loop in the HashMap, but my question is not about that.
public class CoerceJavaToLua {
    static final Map COERCIONS = new HashMap(); // this map is visible to all threads after construction, since it's final
    public static LuaValue coerce(Object paramObject) {
        ...;
        if (localCoercion == null) {
            localCoercion = ...;
            COERCIONS.put(localClass, localCoercion); // visible?
        }
        return ...;
    }
    ...
}
You're correct that changes to the Map may not be visible to other threads. Every method that accesses COERCIONS (both reading and writing) should be synchronized on the same object. Alternatively, if you never need sequences of accesses to be atomic, you could use a synchronized collection.
(BTW, why are you using raw types?)
This code is actually bad and may cause many problems (probably not an infinite loop, which is more common with TreeMap; with HashMap you are more likely to get silent data loss due to an overwrite, or some random exception). And you're right: it's not guaranteed that changes made in one thread will be visible to another.
Here the problem may not look very big, as this Map is used for caching, so silent overwrites or a visibility lag don't lead to real problems (just two distinct instances of a coercion will be used for the same class, which is probably OK in this case). However, it's still possible that such code will break your program. If you like, you can submit a patch to the LuaJ team.
Two options:
// Synchronized (since Java 1.2)
static final Map COERCIONS = Collections.synchronizedMap(new HashMap());
// Concurrent (since Java 5)
static final Map COERCIONS = new ConcurrentHashMap();
They each have their pros and cons.
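The advantage of ConcurrentHashMap is that reads do not block and there is no single map-wide lock. The drawback is that bulk operations are not atomic: for example, with an Iterator in one thread and a call to putAll in another, the iterator may see some of the added values but not others. A synchronized map locks the whole map for every call, which is slower under contention, but it lets you make a sequence of calls atomic by synchronizing on the map yourself.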
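For a check-then-put cache like the one in the question, ConcurrentHashMap also offers computeIfAbsent, which performs the lookup and the insertion as one atomic step. The sketch below is not luaj's code: CoercionCache, coercionFor and the String values are stand-ins, since the real coercion types are elided in the question.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class CoercionCache {
    private static final Map<Class<?>, String> COERCIONS = new ConcurrentHashMap<>();
    // Counts how many coercions were actually created (for the demo only).
    static final AtomicInteger CREATED = new AtomicInteger();

    static String coercionFor(Object value) {
        // Lookup and insert happen atomically per key, and the update is visible to all threads.
        return COERCIONS.computeIfAbsent(value.getClass(), c -> {
            CREATED.incrementAndGet();
            return "coercion for " + c.getSimpleName();
        });
    }

    public static void main(String[] args) {
        System.out.println(coercionFor("a"));
        System.out.println(coercionFor("b")); // reuses the String entry
        System.out.println(coercionFor(1));
        System.out.println(CREATED.get());
    }
}
```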
There was already a question about whether threads can safely read/iterate a LinkedList simultaneously. It seems the answer is yes, as long as no one structurally changes the linked list (add/delete).
However, one answer warned about an "unflushed cache" and advised getting to know the "Java memory model". So I'm asking you to elaborate on those "evil" caches. I'm a newbie and so far I still naively believe that the following code is OK (at least from my tests):
public static class workerThread implements Runnable {
    LinkedList<Integer> ll_only_for_read;
    PrintWriter writer;
    public workerThread(LinkedList<Integer> ll, int id2) throws Exception {
        ll_only_for_read = ll;
        writer = new PrintWriter("file."+id2, "UTF-8");
    }
    @Override
    public void run() {
        for (Integer i : ll_only_for_read) writer.println(" ll:"+i);
        writer.close();
    }
}
public static void main(String args[]) throws Exception {
    LinkedList<Integer> ll = new LinkedList<Integer>();
    for (int i = 0; i < 1e3; i++) ll.add(i);
    // do I need to call something special here? (in order to say:
    // "hey LinkedList, flush all your data from local cache,
    // you will now be a good boy and share those data among
    // a whole lot of interesting threads. Don't worry though, they will only read
    // you, no thread would dare to change you")
    new Thread(new workerThread(ll,1)).start();
    new Thread(new workerThread(ll,2)).start();
}
Yes, in your specific example code it's okay, since the act of starting the new thread should define a happens-before relationship between populating the list and reading it from another thread. There are plenty of ways that a seemingly similar setup could be unsafe, however.
I highly recommend reading "Java Concurrency in Practice" by Brian Goetz et al for more details.
If your code creates and populates the list in a single thread, and only afterwards creates the other threads that concurrently access the list, there is no problem.
Problems can happen only when a thread can modify a value while other threads try to read the same value.
It can also be a problem if you change the objects you retrieve (even if you don't change the list itself).
Although one answer was warning about "unflushed cache" and advising to know "java memory model".
I think you are referring to my Answer to this Question: Can Java LinkedList be read in multiple-threads safely?.
So I'm asking to elaborate those "evil" caches.
They are not evil. They are just a fact of life ... and they affect the correctness (thread-safety) reasoning for multi-threaded applications.
The Java Memory Model is Java's answer to this fact of life. The memory model specifies with mathematical precision a bunch of rules that need to be obeyed to ensure that all possible executions of your application are "well-formed". (In simple terms: that your application is thread-safe.)
The Java Memory Model is ... difficult.
Someone recommended "Java Concurrency in Practice" by Brian Goetz et al. I concur. It is the best textbook on the topic of writing "classic" Java multi-threaded applications, and it has a good explanation of the Java Memory Model.
More importantly, Goetz et al give you a simpler set of rules that are sufficient to give you thread-safety. These rules are still too detailed to condense into a Stack Overflow answer ... but
one of the concepts is "safe publication", and
one of the principles is to use / re-use existing concurrency constructs rather than to roll your own concurrency mechanisms based on the Memory Model.
I'm a newbie and so far I still naively believe that following code is ok.
It >>is<< correct. However ...
(at least from my tests)
... testing is NOT a guarantee of anything. The problem with non-thread-safe programs is that the faults are frequently not revealed by testing because they manifest randomly, with low probability, and often differently on different platforms.
You cannot rely on testing to tell you that your code is thread-safe. You need to reason [1] about the behaviour ... or follow a set of well-founded rules.
[1] And I mean real, well-founded reasoning ... not seat-of-the-pants intuitive stuff.
The way you're using it is fine, but only by coincidence.
Programs are rarely that trivial:
If the List contains references to other (mutable) data, then you'll get race conditions.
If someone modifies your 'reader' threads later in the code's lifecycle, then you'll get races.
Immutable data (and data structures) are by definition thread-safe. However, this is a mutable List, even though you're making the agreement with yourself that you won't modify it.
I'd recommend wrapping the List<> instance like this so the code fails immediately if someone tries to use any mutators on the List:
List<Integer> immutableList = Collections.unmodifiableList(ll);
//...pass 'immutableList' to threads.
You need to guarantee a happens-before relationship between the writes to and the reads from your LinkedList, because they are done in separate threads.
The result of ll.add(i) will be visible to the new workerThread because Thread.start forms a happens-before relationship. So your example is thread-safe. See more about happens-before conditions.
However, be aware of the more complex situation where the LinkedList is being iterated in worker threads while it is modified by the main thread, like this:
for (int i = 0; i < 1e3; i++) {
    ll.add(i);
    new Thread(new workerThread(ll,1)).start();
    new Thread(new workerThread(ll,2)).start();
}
In this case a ConcurrentModificationException is possible.
There are several options:
Clone your LinkedList inside of workerThread and iterate over the copy instead.
Use synchronization both for list modification and for list iteration (but it will lead to poor concurrency).
Instead of LinkedList, use CopyOnWriteArrayList.
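A small sketch of the third option. It runs in a single thread to keep the result deterministic, but it relies on the same property that protects reader threads: an iterator of CopyOnWriteArrayList works on a snapshot taken when iteration starts. The class and method names are invented for this example.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SnapshotIteration {
    // Returns { number of elements the loop saw, final list size }.
    static int[] iterateWhileAppending() {
        List<Integer> list = new CopyOnWriteArrayList<>();
        for (int i = 0; i < 3; i++) {
            list.add(i);
        }
        int seen = 0;
        for (Integer value : list) {
            // With a LinkedList this add would make the next iteration step
            // throw ConcurrentModificationException.
            list.add(value + 100);
            seen++;
        }
        return new int[] { seen, list.size() };
    }

    public static void main(String[] args) {
        int[] result = iterateWhileAppending();
        System.out.println(result[0] + " seen, " + result[1] + " stored"); // prints "3 seen, 6 stored"
    }
}
```

The price is that every write copies the underlying array, so this fits lists that are read far more often than they are modified.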
Sorry for answering my own question, but I was thinking about your reassuring answers and found that it may not be as safe as it seems. I found and tested a case where it does not work: if the object used one of its fields to store data during a read (which I wouldn't know about), then it would fail. The only question then is whether LinkedList (and other Java classes) might do this in some implementation. See the failing example:
public class DummyLinkedList {
    public LinkedList<Integer> ll;
    public DummyLinkedList() {
        ll = new LinkedList<Integer>();
    }
    int lastGetIndex;
    int myDummyGet(int idx) {
        lastGetIndex = idx;
        //return ll.get(idx); // this would work fine, as the parameter is on the stack and so unique for each call
        return ll.get(lastGetIndex); // this causes a problem even when only reading the object - the question is how many such issues java.* contains
    }
}
It depends on how the object was created and made available to your thread. In general, no, it's not safe, even if the object isn't modified.
Following are some ways to make it safe.
First, create the object and perform any modification that is necessary; you can consider the object to be effectively immutable if no more modifications occur. Then, share the effectively immutable object with other threads by one of the following means:
Have other threads read the object from a field that is volatile.
Write a reference to the object inside a synchronized block, then have other threads read that reference while synchronized on the same lock.
Start the reading threads after the object is initialized, passing the object as a parameter. (This is what you are doing in your example, so you are safe.)
Pass the object between threads using a concurrent mechanism like a BlockingQueue implementation, or publish it in a concurrent collection, like a ConcurrentMap implementation.
There might be others. Alternatively, you can make all of the fields of the shared object final (including all the fields of its Object members, and so on). Then it will be safe to share this object by any means across threads. That's one of the under-appreciated virtues of immutable types.
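As an illustration of the BlockingQueue option, here is a sketch in which the reader thread is started before the list exists, so Thread.start alone would not make the list's contents visible; the queue hand-off does. QueueHandoff and handOffAndSum are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class QueueHandoff {
    static int handOffAndSum() {
        BlockingQueue<List<Integer>> queue = new ArrayBlockingQueue<>(1);
        int[] result = new int[1];

        Thread reader = new Thread(() -> {
            try {
                // take() blocks until the list arrives; everything done to the
                // list before put() is visible here.
                List<Integer> received = queue.take();
                int sum = 0;
                for (int v : received) {
                    sum += v;
                }
                result[0] = sum;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start(); // started before the list has been created

        List<Integer> list = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            list.add(i);
        }
        try {
            queue.put(list); // publish; the list is not modified after this point
            reader.join();   // join makes the reader's write to result visible
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(handOffAndSum()); // prints 5050
    }
}
```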
If your only access to the list is through 'read' methods (including iteration), then you are fine, as in your code.
Can anyone explain to me how the parameter map will be affected in the following code if two threads access it at the same time? Is the map exposed to thread-safety issues because it is not inside the synchronized block?
public void fun(String type, String name, Map<String, Object> parameters) {
    parameters.put(Constants.PARM_TYPE, type);
    parameters.put(Constants.PARM_NAME, name);
    try {
        synchronized (launcher) {
            launcher.launch(type, bool, parameters);
        }
    } catch (Exception e) {
        logger.error("AHHHHH, the world has ended!",e);
    }
}
I have looked at the following but I'm still questioning it: Synchronized and the scope of visibility
If your parameters instances are separate (as you mentioned in your last comment), then there is no problem with this code.
The method parameters - besides Map parameters - are just 2 Strings, so there are no synchronisation issues regarding them.
As for putting synchronized at method level or on launcher: these lock different objects. If you put it on the method, it will synchronize on this; otherwise on launcher. Since you want to protect the launcher, you have to "build the fence" as close as you can, so synchronizing on launcher is OK.
Another technique is to create an Object lockObject = new Object() and synchronize on that object. For this purpose I think it's overkill, but you can do that.
Imagine if you had a shared Map.
private Map<String, Object> map = new HashMap<String,Object>();
that is being updated by many threads as displayed in your example.
new Thread(new Runnable(){
public void run(){
fun("a","b", map);
}
}).start();
new Thread(new Runnable(){
public void run(){
fun("a","b", map);
}
}).start();
Each thread may update the map at the same time, which could lead to "A Beautiful Race Condition".
If multiple threads have a handle to the same parameters instance and they call this method (which modifies the map) with a non-thread-safe map implementation, all kinds of bad things can/will happen (e.g. map corruption which may/may not manifest itself as exceptions like NullPointerException).
Assuming multiple threads are accessing the method fun(), the way a map works is that if you insert the same key multiple times, the value of that key is overwritten each time. But this might not be the only problem. There could be race conditions and corruption issues too. If you want an implicitly thread-safe data structure, I assume a Hashtable will get your job done.
If more than one thread executes that code concurrently, passing the same object as the parameter map, then you will have a race condition.
This will definitely cause thread safety issues unless you:
use the right Map implementation, based on your requirements and the concurrent behavior of that implementation (ConcurrentHashMap, for instance, but this depends a lot on the actual requirements of your app)
or write thread-safe code yourself (probably using synchronization primitives like 'synchronized').
IMPORTANT: Please note that just moving the lines of code that modify the map into the synchronized block won't necessarily remove the race condition, as you'll have to consider which other threads in your app may try to modify the map and which object they will use to synchronize their access to it. The code in the function uses a reference to 'launcher' to synchronize. Any other thread modifying the map without synchronization, or with synchronization on an object other than 'launcher', will cause a race condition.
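One way to make sure every thread uses the same lock is to hide both the map and its lock inside a single class, so that no caller can touch the map without going through that lock. A minimal sketch, with GuardedParameters and its methods being names made up for this example:

```java
import java.util.HashMap;
import java.util.Map;

class GuardedParameters {
    private final Object lock = new Object();
    // Plain HashMap; it is only ever touched while holding 'lock'.
    private final Map<String, Object> parameters = new HashMap<>();

    void put(String key, Object value) {
        synchronized (lock) {
            parameters.put(key, value);
        }
    }

    Object get(String key) {
        synchronized (lock) {
            return parameters.get(key);
        }
    }

    int size() {
        synchronized (lock) {
            return parameters.size();
        }
    }

    // Two threads write 1000 distinct keys each; returns the final size.
    static int demo() {
        GuardedParameters shared = new GuardedParameters();
        Thread a = new Thread(() -> {
            for (int i = 0; i < 1000; i++) shared.put("a" + i, i);
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < 1000; i++) shared.put("b" + i, i);
        });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 2000
    }
}
```

Because the map never escapes the class, a thread cannot accidentally synchronize on a different object, which is the failure mode described above.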
I recently saw a piece of code which used a ThreadLocal object and kept a ConcurrentHashMap within it.
Is there any logic/benefit in this, or is it redundant?
If the only reference to the concurrent hashmap resides in the ThreadLocal, the hashmap is obviously only referenced from a single thread. In such case I would say it is completely redundant.
However, it's not hard to imagine someone "sharing" the thread-locally stored hashmap with other threads:
ThreadLocal<ConcurrentHashMap<String, String>> tl = ...
// ...
final ConcurrentHashMap<String, String> props = tl.get();
EventQueue.invokeLater(new Runnable() {
    public void run() {
        props.put(key.getText(), val.getText());
    }
});
Either he used ThreadLocal wrongly, or ConcurrentHashMap wrongly. The likelihood that the combination makes sense is close to 0.
In addition to what @aioobe said, consider the case of InheritableThreadLocal, in which the value of the local is passed from a thread to each child thread that it creates.
And as @pst says, there is nothing to prevent the same value being used in different (non-inheritable) ThreadLocals.
In short, you have to do a thorough analysis of the thread locals, the way that they are initialized and the way that they are used before you can safely conclude that they don't need to be threadsafe.
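For the simple case where the map really never leaves its thread, a plain HashMap inside the ThreadLocal is enough, because each thread gets its own instance. The sketch below shows that; PerThreadMaps and its method are hypothetical names, and the map reference is handed to the calling thread only to compare identities, which is exactly the kind of sharing the answers above warn about.

```java
import java.util.HashMap;
import java.util.Map;

class PerThreadMaps {
    // Each thread that calls LOCAL.get() lazily receives its own HashMap.
    private static final ThreadLocal<Map<String, String>> LOCAL =
            ThreadLocal.withInitial(HashMap::new);

    static boolean threadsGetDistinctMaps() {
        Map<String, String> mine = LOCAL.get();
        mine.put("owner", "caller");

        Object[] other = new Object[2];
        Thread t = new Thread(() -> {
            Map<String, String> theirs = LOCAL.get();
            other[0] = theirs;              // leaked only so identities can be compared
            other[1] = theirs.get("owner"); // this thread's map has no such entry
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return other[0] != mine && other[1] == null;
    }

    public static void main(String[] args) {
        System.out.println(threadsGetDistinctMaps()); // prints true
    }
}
```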