I have a class which reads an XML file and stores its contents in a private static data structure (say, a HashMap). This initial population happens in a static block. I also have a method to get the value for a given key, which in turn refers to that static HashMap. Consider the case where multiple threads try to get the value for a given key: will there be any performance hit; like, when one thread is reading that static object, other threads have to wait?
import java.util.HashMap;
import java.util.Map;

public class Parser
{
    private static final Map<String, Object> resource = new HashMap<>();
    static
    {
        parseResource();
    }
    private Parser()
    {
    }
    private static void parseResource()
    {
        // parses the resource and populates the resource object
    }
    public static Object getValue(String key)
    {
        // some checks may be done here, but no
        // update/modification actions
        return resource.get(key);
    }
}
Firstly, it's worth being aware that this really has very little to do with static. There's no such thing as a "static object" - there are just objects, and there are fields and methods which may or may not be static. For example, there could be an instance field and a static field which both refer to the same object.
In terms of thread safety, you need to consider the safety of the operations you're interested in on a single object - it doesn't matter how the multiple threads have "reached" that object.
like, when one thread is reading that static object, other threads have to wait.
No, it doesn't.
If you are just reading from the HashMap after constructing it in a way that prevented it from being visible to other threads until it had been finished, that's fine. (Having reread your comment, it looks like that's the case in getValue.)
If you need to perform any mutations on the map while other threads are reading from it, consider using ConcurrentHashMap or use synchronization.
From the docs for HashMap:
Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
You have no locking happening in your example code, so there's no way that multiple threads would need to wait.
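A minimal sketch of the read-only case, assuming the map is completely filled in the static initializer and never modified afterwards. The class name ReadOnlyLookup, the sample entries and the concurrentHits helper are hypothetical and only stand in for the Parser class from the question:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the Parser class in the question.
class ReadOnlyLookup {
    // Filled once during class initialization, never modified afterwards.
    private static final Map<String, String> RESOURCE = new HashMap<>();

    static {
        // Assumption: these two entries stand in for the parsed XML data.
        RESOURCE.put("host", "localhost");
        RESOURCE.put("port", "8080");
    }

    private ReadOnlyLookup() {
    }

    // No lock is taken here, so readers never wait for each other.
    static String getValue(String key) {
        return RESOURCE.get(key);
    }

    // Starts the given number of reader threads and counts successful lookups.
    static int concurrentHits(int threads) {
        AtomicInteger hits = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                if ("8080".equals(getValue("port"))) {
                    hits.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return hits.get();
    }
}
```

Class initialization completes before any thread can call getValue, which is what makes the unsynchronized reads safe in this sketch.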
Just adding to Jon Skeet's answer, for this kind of use you might want to consider Guava's ImmutableMap, which enforces immutability.
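To avoid an extra dependency, the sketch below swaps Guava's ImmutableMap for the JDK's Collections.unmodifiableMap, which gives a similar "no writes after construction" guarantee as long as the underlying mutable map never escapes. The class name FrozenResource and the sample entry are hypothetical; with Guava on the classpath, ImmutableMap.copyOf(parsed) would take the place of the wrapper call:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class; not part of the original question.
class FrozenResource {
    private static final Map<String, String> RESOURCE;

    static {
        Map<String, String> parsed = new HashMap<>();
        parsed.put("host", "localhost"); // assumption: stands in for parsed XML data
        // The wrapper rejects every mutation; the mutable map never leaves this block.
        RESOURCE = Collections.unmodifiableMap(parsed);
    }

    static Map<String, String> view() {
        return RESOURCE;
    }

    // Returns true if an attempted write is rejected by the wrapper.
    static boolean rejectsWrites() {
        try {
            RESOURCE.put("host", "example.org");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }
}
```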
Just use the synchronized keyword and everything should work fine.
Related
I have a static HashMap which will cache objects identified by unique integers; it will be accessed from multiple threads. I will have multiple instances of the type HashmapUser running in different threads, each of which will want to use the same HashMap (which is why it's static).
Generally, the HashmapUsers will be retrieving from the HashMap. If it is empty, though, it needs to be populated from a database. Also, in some cases the HashMap will be cleared because the data has changed and it needs to be repopulated.
So, I just made all interactions with the map synchronized. But I'm not positive that this is safe, smart, or that it works for a static variable.
Is the below implementation of this thread safe? Any suggestions to simplify or otherwise improve it?
import java.util.HashMap;

public class HashmapUser {
    private static HashMap<Integer, AType> theMap = new HashMap<>();

    public HashmapUser() {
        //....
    }

    public void performTask(boolean needsRefresh, Integer id) {
        //....
        AType x = getAtype(needsRefresh, id);
        //....
    }

    private synchronized AType getAtype(boolean needsRefresh, Integer id) {
        if (needsRefresh) {
            theMap.clear();
        }
        if (theMap.size() == 0) {
            // populate the map
        }
        return theMap.get(id);
    }
}
As it is, it is definitely not thread-safe. Each instance of HashmapUser will use a different lock (this), which does nothing useful. You have to synchronize on the same object, such as the HashMap itself.
Change getAtype to:
private AType getAtype(boolean needsRefresh, Integer id) {
    synchronized (theMap) {
        if (needsRefresh) {
            theMap.clear();
        }
        if (theMap.size() == 0) {
            // populate the map
        }
        return theMap.get(id);
    }
}
Edit:
Note that you can synchronize on any object, provided that all instances use the same object for synchronization. You could synchronize on HashmapUser.class, which also allows other objects to lock access to the map (though it is typically best practice to use a private lock).
Because of this, simply making your getAtype method static would work, since the implied lock would then be HashmapUser.class instead of this. However, this exposes your lock, which may or may not be what you want.
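The private-lock variant mentioned above could look roughly like this. It is a sketch under assumptions: the class name SharedLockUser, the String values (in place of AType) and the populate helper standing in for the database load are all hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical variant of HashmapUser that uses one private, shared lock.
class SharedLockUser {
    // A single lock object shared by every instance and hidden from other classes.
    private static final Object LOCK = new Object();
    private static final Map<Integer, String> THE_MAP = new HashMap<>();

    String getValue(boolean needsRefresh, Integer id) {
        synchronized (LOCK) {
            if (needsRefresh) {
                THE_MAP.clear();
            }
            if (THE_MAP.isEmpty()) {
                populate();
            }
            return THE_MAP.get(id);
        }
    }

    // Assumption: stands in for loading the data from the database.
    // Only called while LOCK is held.
    private static void populate() {
        THE_MAP.put(1, "one");
        THE_MAP.put(2, "two");
    }
}
```

Because LOCK is static, two different SharedLockUser instances contend for the same monitor, and because it is private, no outside code can hold it.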
No, this won't work at all.
If you don't specify a lock object, e.g. by declaring the method synchronized, the implicit lock is the instance; if the method is static, the lock is the class. Since there are multiple instances, there are also multiple locks, which I doubt is desired.
What you should do is create another class which is the only class with the access to HashMap.
Clients of HashMap, such as the HashMapUser must not even be aware that there is synchronization in place. Instead, thread safety should be assured by the proper class wrapping the HashMap hiding the synchronization from the clients.
This lets you easily add additional clients to the HashMap since synchronization is hidden from them, otherwise you would have to add some kind of synchronization between the different client types too.
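One way to read this advice is a small wrapper that owns both the map and its lock. This is only a sketch: the class name SyncCache, the Supplier-based loader (standing in for the database query) and the refresh method are assumptions, not part of the original code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical wrapper: the only class that ever touches the map.
final class SyncCache<K, V> {
    private final Object lock = new Object();
    private final Map<K, V> map = new HashMap<>();
    // Assumption: the supplier stands in for the database query.
    private final Supplier<Map<K, V>> loader;

    SyncCache(Supplier<Map<K, V>> loader) {
        this.loader = loader;
    }

    // Loads the data on first use (or after refresh), then answers from the map.
    V get(K key) {
        synchronized (lock) {
            if (map.isEmpty()) {
                map.putAll(loader.get());
            }
            return map.get(key);
        }
    }

    // Drops all entries so that the next get reloads them.
    void refresh() {
        synchronized (lock) {
            map.clear();
        }
    }
}
```

Clients would share a single SyncCache instance and call get and refresh without knowing that any locking takes place.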
I would suggest you go with either ConcurrentHashMap or Collections.synchronizedMap.
More info here: http://crunchify.com/hashmap-vs-concurrenthashmap-vs-synchronizedmap-how-a-hashmap-can-be-synchronized-in-java/
ConcurrentHashMap is more suitable for high-concurrency scenarios. This implementation doesn't synchronize on the whole object, but rather does so in an optimised way, so different threads accessing different keys can do that simultaneously.
A map returned by Collections.synchronizedMap is simpler and synchronizes at the object level: access to the instance is serialized.
I think you need performance, so I think you should probably go with ConcurrentHashMap.
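A rough sketch of the ConcurrentHashMap route, assuming entries can be loaded one key at a time rather than in bulk (which differs from the clear-and-repopulate logic in the question). The class name ConcurrentCache and the loader function are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical per-key cache; not the original HashmapUser design.
class ConcurrentCache {
    private static final Map<Integer, String> CACHE = new ConcurrentHashMap<>();

    // computeIfAbsent runs the loader at most once per missing key and
    // does not take a lock on the whole map.
    static String get(Integer id, Function<Integer, String> loader) {
        return CACHE.computeIfAbsent(id, loader);
    }

    // Removes all entries; later lookups call the loader again.
    static void invalidate() {
        CACHE.clear();
    }
}
```

Note that clear() on a ConcurrentHashMap is not a single atomic snapshot with respect to concurrent writers, so this sketch fits best when slightly stale reads during a refresh are acceptable.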
I have several threads trying to increment a counter for a certain key in a custom data structure that is not thread-safe (which you can imagine to be similar to a HashMap). I was wondering what the right way to increment the counter in this case would be.
Is it sufficient to synchronize the increment function or do I also need to synchronize the get operation?
public class Example {
    private MyDataStructure<Key, Integer> datastructure = new CustomDataStructure<Key, Integer>();

    private class MyThread implements Runnable {
        private synchronized void incrementCnt(Key key) {
            // from the datastructure documentation: if a value already exists for the given key, the
            // previous value will be replaced by this value
            datastructure.put(key, getCnt(key) + 1);
            // or can I do it without using the getCnt() function? like this:
            // datastructure.put(key, datastructure.get(key) + 1);
        }

        private synchronized int getCnt(Key key) {
            return datastructure.get(key);
        }

        // run method...
    }
}
If I have two threads t1 and t2, for example, I would do something like:
t1.incrementCnt();
t2.incrementCnt();
Can this lead to any kind of deadlock? Is there a better way to solve this?
The main issue with this code is that it is likely to fail to provide synchronized access to datastructure, because the accessing code synchronizes on this of an inner class. That is a different object for each instance of MyThread, so no mutual exclusion will happen.
A more correct way is to make datastructure a final field, and then to synchronize on it:
private final MyDataStructure<Key, Integer> datastructure = new CustomDataStructure<Key, Integer>();

private class MyThread implements Runnable {
    private void incrementCnt(Key key) {
        synchronized (datastructure) {
            // the get and the put now happen under the same lock
            datastructure.put(key, datastructure.get(key) + 1);
        }
    }
    // run method...
}
As long as all data access is done using synchronized (datastructure), code is thread-safe and it's safe to just use datastructure.get(...). There should be no dead-locks, since deadlocks can occur only when there's more than one lock to compete for.
As the other answer told you, you should synchronize on your data structure, rather than on the thread/runnable object. It is a common mistake to try to use synchronized methods in the thread or runnable object. Synchronization locks are instance-based, not class-based (unless the method is static), and when you are running multiple threads, this means that there are actually multiple thread instances.
It's less clear-cut about Runnables: you could be using a single instance of your Runnable class with several threads. So in principle you could synchronize on it. But I still think it's bad form because in the future you may want to create more than one instance of it, and get a really nasty bug.
So the general best practice is to synchronize on the actual item that you are accessing.
Furthermore, the design conundrum of whether or not to use two methods should be solved by moving the whole thing into the data structure itself, if you can do so (if the class source is under your control). This is an operation that is confined to the data structure and applies only to it, and doing the increment outside of it is not good encapsulation. If your data structure exposes a synchronized incrementCnt method, then:
It synchronizes on itself, which is what you wanted.
It can use its own private fields directly, which means you don't actually need to call a getter and a setter.
It is free to have the implementation changed to one of the atomic structures in the future if it becomes possible, or add other implementation details (such as logging increment operations separately from setter access operations).
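A sketch of what such a self-synchronizing structure might look like. The class name CounterTable, the HashMap backing store and the demo helper are assumptions made for illustration, since the original CustomDataStructure is not shown:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical counter structure that does its own locking.
class CounterTable<K> {
    private final Map<K, Integer> counts = new HashMap<>();

    // Read-modify-write happens under one lock, so no increment is lost.
    synchronized void increment(K key) {
        counts.merge(key, 1, Integer::sum);
    }

    synchronized int get(K key) {
        return counts.getOrDefault(key, 0);
    }

    // Runs several threads against one table and returns the final count.
    static int demo(int threads, int perThread) {
        CounterTable<String> table = new CounterTable<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    table.increment("hits");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return table.get("hits");
    }
}
```

With this shape the calling threads need no synchronized blocks of their own; they only call increment.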
Assume we have two threads, thread A and thread B.
Thread A is the main method and contains large data structures.
Is it possible to create a second thread and pass the address (a pointer) of the data structure (local to thread A) to thread B, so both threads can read from the data structure?
The point of this is to avoid having to duplicate the entire data structure on thread B, or spending a lot of time pulling relevant information from the data structure for thread B to use.
Keep in mind that neither thread is modifying the data.
In Java, the term used is reference, not pointer.
It is possible to pass it, as any other object, to another thread.
As any (non-final) class in Java, you can extend it, add members, add constructors etc.
(If you need to modify the data) You need to make sure that there are no concurrency issues.
It's known as a reference in java, as you don't have access directly to a pointer in a conventional sense. (For most cases it's "safe" to think of it as every reference is a pointer that is always passed by value and the only legal operation is to dereference it. It is NOT the same as a C++ 'reference.')
You can certainly share references among threads. Anything that's on the heap can be seen and used by any thread that can get a reference to it. You can either put it in a static location, or set the value of a reference on your Runnable to point to the data.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedDataTest {
    private static class SomeWork implements Runnable {
        private final Map<String, String> dataTable;

        public SomeWork(Map<String, String> dataTable) {
            this.dataTable = dataTable;
        }

        @Override
        public void run() {
            // do some stuff with dataTable
        }
    }

    public static void main(String[] args) {
        // both Runnables receive a reference to the same map; nothing is copied
        Map<String, String> dataTable = new ConcurrentHashMap<String, String>();
        Runnable work1 = new SomeWork(dataTable);
        Runnable work2 = new SomeWork(dataTable);
        new Thread(work1).start();
        new Thread(work2).start();
    }
}
Yes it is possible and is a usual thing to do but you need to make sure that you use proper synchronization to ensure that both threads see an up to date version of the data.
It is safe to share a reference to an immutable object. Roughly speaking, an immutable object is an object that doesn't change its state after construction. Semantically, an immutable object should contain only final fields, which in turn reference immutable objects.
If you want to share a reference to a mutable object, you need to use proper synchronization, for example by using the synchronized or volatile keywords.
An easy way to share data safely is to use utilities from the java.util.concurrent package such as AtomicReference or ConcurrentHashMap; however, you still have to be very careful if the objects you share are mutable.
If you are not doing any modification in the shared data you can have a shared reference and there will be no significant overhead.
Be careful however when you start modifying the shared object concurrently, in this case you can use the data structures provided in java (see for instance factory methods in Collections), or use a custom synchronisation scheme, for instance with java.util.concurrent.locks.ReentrantLock.
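For the custom-scheme option, a ReentrantLock guarding a plain ArrayList might look like the sketch below. The class name LockedLog and the fill helper are hypothetical and exist only to show the lock/try/finally pattern:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical shared structure protected by an explicit lock.
class LockedLog {
    private final ReentrantLock lock = new ReentrantLock();
    private final List<String> entries = new ArrayList<>();

    void add(String entry) {
        lock.lock();
        try {
            entries.add(entry);
        } finally {
            lock.unlock(); // always release, even if add throws
        }
    }

    int size() {
        lock.lock();
        try {
            return entries.size();
        } finally {
            lock.unlock();
        }
    }

    // Lets several threads append to one log and returns the final size.
    static int fill(int threads, int perThread) {
        LockedLog log = new LockedLog();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    log.add("entry");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return log.size();
    }
}
```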
I'm writing an analogue of the DatabaseConfiguration class, which reads configuration from a database, and I need some advice regarding synchronization.
For example,
public class MyDBConfiguration{
private Connection cn;
private String table_name;
private Map<String, String> key_values = new HashMap<String,String>();
public MyDBConfiguration (Connection cn, String table_name) {
this.cn = cn;
this.table_name = table_name;
reloadConfig();
}
public String getProperty(String key){
return this.key_values.get(key);
}
public void reloadConfig() {
Map<String, String> tmp_map = new HashMap<String,String> ();
// read data from database
synchronized(this.key_values)
{
this.key_values = tmp_map;
}
}
}
So I have a couple questions.
1. Assuming properties are read-only, do I have to use synchronized in getProperty?
2. Does it make sense to do this.key_values = Collections.synchronizedMap(tmp_map) in reloadConfig?
Thank you.
If multiple threads are going to share an instance, you must use some kind of synchronization.
Synchronization is needed mainly for two reasons:
It can guarantee that some operations are atomic, so the system will keep consistent
It guarantees that every thread sees the same values in memory
First of all, since you made reloadConfig() public, your object does not really look immutable. An object is really immutable only if its values cannot change after initialization (which is a desirable property for objects that are shared).
For the above reason, you must synchronize all access to the map: suppose a thread is trying to read from it while another thread is calling reloadConfig(). Bad things will happen.
If this is really the case (mutable settings), you must synchronize both reads and writes (for obvious reasons). Threads must synchronize on a single object (otherwise there's no synchronization). The only way to guarantee that all the threads will synchronize on the same object is to synchronize on the object itself or on a properly published, shared lock, like this:
// synchronizes on the instance itself:
class MyDBConfig1 {
    // ...
    public synchronized String getProperty(...) { ... }
    public synchronized void reloadConfig() { ... }
}
// synchronizes on a properly published, shared lock:
class MyDBConfig2 {
    private final Object lock = new Object();
    public String getProperty(...) { synchronized(lock) { ... } }
    public void reloadConfig() { synchronized(lock) { ... } }
}
Proper publication here is guaranteed by the final keyword. It is subtle: it guarantees that the value of this field is visible to every thread after initialization (without it, a thread might see lock == null, and bad things will happen).
You could improve the code above by using a (properly published) ReentrantReadWriteLock. It might improve concurrency a bit if that's a concern for you.
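A sketch of the read/write-lock variant using java.util.concurrent.locks.ReentrantReadWriteLock. The class name RwConfig and the reload(Map) signature are assumptions; in the original class the new values would come from the database rather than from a parameter:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical configuration holder guarded by a read/write lock.
class RwConfig {
    // final guarantees that every thread sees the lock after construction.
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private Map<String, String> values = new HashMap<>();

    // Many readers may hold the read lock at the same time.
    String getProperty(String key) {
        lock.readLock().lock();
        try {
            return values.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Assumption: 'fresh' stands in for the data read from the database.
    void reload(Map<String, String> fresh) {
        Map<String, String> copy = new HashMap<>(fresh); // built outside the lock
        lock.writeLock().lock(); // excludes all readers and other writers
        try {
            values = copy;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Building the copy before taking the write lock keeps the exclusive section down to a single assignment.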
Supposing your intention was to make MyDBConfig immutable, you do not need to serialize access to the hash map (that is, you don't necessarily need to add the synchronized keyword). You might improve concurrency.
First of all, make reloadConfig() private (this will indicate that, for consumers of this object, it is indeed immutable: the only method they see is getProperty(...), which, by its name, should not modify the instance).
Then, you only need to guarantee that every thread will see the correct values in the hash map. To do so, you could use the same techniques presented above, or you could use a volatile field, like this:
class MyDBConfig {
private volatile boolean initialized = false;
public String getProperty(...) { if (initialized) { ... } else { throw ... } }
private void reloadConfig() { ...; initialized = true; }
public MyDBConfig(...) { ...; reloadConfig(); }
}
The volatile keyword is very subtle. Volatile writes and volatile reads have a happens-before relationship: a volatile write happens-before every subsequent volatile read of the same field. What this means is that all the memory locations that were modified before (in program order) a volatile write are visible to every other thread after it has executed a subsequent volatile read of the same field.
In the code above, you write true to the volatile field after all the values have been set. Then, the method reading values (getProperty(...)) begins by executing a volatile read of the same field. Then this method is guaranteed to see the correct values.
In the example above, if you don't publish the instance before the constructor finishes, it is guaranteed that the exception won't get thrown in the method getProperty(...) (because before the constructor finishes, you have written true to initialized).
Assuming that nothing will be put into key_values after reloadConfig, you will still need to synchronize both reads and writes of the map. You are violating this by synchronizing only on the assignment. You can solve this by removing the synchronized block and declaring key_values as volatile.
Since the HashMap is effectively read-only, I wouldn't assign Collections.synchronizedMap but rather Collections.unmodifiableMap (this wouldn't affect the Map itself, just prohibit accidental puts by someone else who might be using this class).
Note: Also, you should never synchronize on a field that will change. The results are very unpredictable.
Edit: In regards to the other answers. It is highly suggested that all shared mutable data must be synchronized as the effects are non-deterministic. The key_values field is a shared mutable field and assignments to it must be synchronized.
Edit: And to clear up any confusion with Bruno Reis: the volatile field would be legal if you still fill tmp_map and, after it has finished being filled, assign it to this.key_values. It would look like:
private volatile Map<String, String> key_values = new HashMap<String,String>();
// ... rest of class
public void reloadConfig() {
    Map<String, String> tmp_map = new HashMap<String,String>();
    // read data from database
    this.key_values = tmp_map;
}
You still need the same style or else as Bruno Reis noted it would not be thread-safe.
I would say that if you guarantee that no code will structurally modify your map, then there is no need to synchronize it.
If multiple threads access a hash map concurrently, and at least one
of the threads modifies the map structurally, it must be synchronized
externally.
http://download.oracle.com/javase/6/docs/api/java/util/HashMap.html
The code you have shown provides only read access to the map. Client code cannot make a structural modification.
Since your reload method alters a temporary map and then changes key_values to point to the new map, again I'd say no synchronization is required. The worst that can happen is someone reads from old copy of the map.
I'm going to keep my head down and wait for the downvotes now ;)
EDIT
As suggested by Bruno, the fly in the ointment is inheritance. If you cannot guarantee that your class will not be sub-classed, then you should be more defensive.
EDIT
Just to refer back to the specific questions posed by the OP...
Assuming properties are read-only, do I have to use synchronized in getProperty?
Does it make sense to do this.key_values = Collections.synchronizedMap(tmp_map) in reloadConfig?
... I am genuinely interested to know if my answers are wrong. So I won't give up and delete my answer for a while ;)
I have a singleton class, that has a map which can be accessed by multiple threads at the same time. Could somebody please check the code below and tell me if its thread safe?
(Note: I don't plan to use ConcurrentHashMap, and the printMap method is called only seldom.)
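import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Map.Entry;

public class MySingleton {
    private Map<String, String> cache = Collections.synchronizedMap(
            new LinkedHashMap<String, String>());

    public String getValue(String key) {
        return cache.get(key);
    }

    public void setValue(String key, String value) {
        cache.put(key, value);
    }

    public void printMap() {
        // iteration over a synchronizedMap must be synchronized manually
        synchronized (cache) {
            for (Entry<String, String> entry : cache.entrySet()) {
                System.out.println("key: " + entry.getKey() + ", value: " + entry.getValue());
            }
        }
    }
}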
My test is working... but I doubt whether this code is good enough to be called 'thread safe'.
Points that I considered:
The getValue and setValue methods don't need a 'synchronized' block, since I am using a synchronizedMap.
printMap should have the synchronized block, since the Javadoc for synchronizedMap says that we should synchronize on the Map instance when iterating over it.
http://download.oracle.com/javase/1.5.0/docs/api/java/util/Collections.html#synchronizedMap%28java.util.Map%29
Any help is appreciated.
Yes, that's okay. The key thing is that while you're iterating, nothing will be able to modify the map because cache.put will end up synchronizing on cache anyway.
Personally I'd rather make that explicit, by using a "normal" hashmap and synchronizing on the same object (whether the map or something else) from all three methods - but what you've got should be fine.
(Alternatively, you could use ConcurrentHashMap to start with. It's worth at least looking at that.)
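For comparison, a sketch of the ConcurrentHashMap alternative. The class name ConcurrentSingletonCache and the describe method (which returns the text that printMap would print) are hypothetical. Two differences from the original are worth keeping in mind: ConcurrentHashMap does not preserve insertion order the way LinkedHashMap does, and it rejects null keys and values:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical ConcurrentHashMap-based version of MySingleton.
class ConcurrentSingletonCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    String getValue(String key) {
        return cache.get(key);
    }

    void setValue(String key, String value) {
        cache.put(key, value);
    }

    // Iteration is weakly consistent: it needs no external lock and never
    // throws ConcurrentModificationException, but may miss concurrent updates.
    String describe() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> entry : cache.entrySet()) {
            out.append("key: ").append(entry.getKey())
               .append(", value: ").append(entry.getValue())
               .append('\n');
        }
        return out.toString();
    }
}
```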
Yes it is thread safe. Each access to the cache is synchronized (by the synchronizedMap for get and set and by an explicit sync block for the printMap)
Yes, this class is thread-safe.
Though note that even a thread-safe class requires safe publication to be used really safely (without safe publication, nothing guarantees that other threads can't see cache in a non-initialized state, i.e. null).
But in this case you can eliminate the need for safe publication by making your class immutable (the final keyword guarantees that other threads can't see null in cache):
private final Map<String,String> cache = Collections.synchronizedMap( new LinkedHashMap<String,String>());