I have a project that displays department documentation. I store all the docs (fetched from the database) in a static ArrayList. Every X hours, that ArrayList is rebuilt from the new docs (if any) in the database. There is also a static variable that controls whether to rebuild the array; it is set and unset in the method that does the rebuild. Every browser request that hits the server creates an instance of this class, but the doc ArrayList and the control variable are shared among all instances.
The FindBugs tool complains: "Write to static field someArrayName and someVariableName from instance method someClassMethod". It seems this is not a good thing to do (letting an instance method write to a static field). Does anyone have a good recommendation for how to get around this problem? Thanks.
Per the FindBugs bug descriptions:
ST: Write to static field from instance method (ST_WRITE_TO_STATIC_FROM_INSTANCE_METHOD)
This instance method writes to a static field. This is tricky to get correct if multiple instances are being manipulated, and generally bad practice.
Aside from the concurrency issues, it means that all of the instances in the JVM are accessing the same data, and would not allow two separate groups of instances. It would be better if you had a singleton "manager" object and passed it to each of the instances as a constructor parameter or at least as a setManager() method argument.
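A minimal sketch of the "manager passed in through the constructor" idea, assuming the docs are plain strings. DocManager and DocPage are hypothetical names; how the single DocManager is created at startup depends on your framework. No static field is written, so the shared state lives in one ordinary object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical manager: one instance owns the shared document list.
final class DocManager {
    // volatile so a replaced list is visible to every request thread
    private volatile List<String> docs = Collections.emptyList();

    List<String> getDocs() {
        return docs;
    }

    // Publishes a new unmodifiable snapshot; readers see either the old or the new list.
    void replaceDocs(List<String> newDocs) {
        docs = Collections.unmodifiableList(new ArrayList<>(newDocs));
    }
}

// Hypothetical per-request object that receives the manager instead of using statics.
final class DocPage {
    private final DocManager manager;

    DocPage(DocManager manager) {
        this.manager = manager;
    }

    int documentCount() {
        return manager.getDocs().size();
    }
}

class ManagerDemo {
    public static void main(String[] args) {
        DocManager manager = new DocManager();  // created once, e.g. at application startup
        manager.replaceDocs(Arrays.asList("intro.html", "faq.html"));
        DocPage page = new DocPage(manager);
        System.out.println(page.documentCount()); // prints 2
    }
}
```

Because the manager is an ordinary object, a test or a second group of pages can simply be given a different DocManager.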
As for the concurrency issues: if you must use a static field, your static field should be final; explicit synchronization is difficult. (There are also some tricky aspects if you are initializing non-final static fields, beyond my knowledge of Java but I think I've seen them in the Java Puzzlers book.) There are at least three ways of dealing with this (warning, untested code follows, check first before using):
Use a thread-safe collection, e.g. Collections.synchronizedList wrapped around a list that is not accessed in any other way.
static final List<Item> items = createThreadSafeCollection();
static List<Item> createThreadSafeCollection()
{
return Collections.synchronizedList(new ArrayList<Item>());
}
and then later when you are replacing this collection, from an instance:
List<Item> newItems = getNewListFromSomewhere();
items.clear();
items.addAll(newItems);
The problem with this is that if two instances are doing this sequence at the same time, you could get:
Instance1: items.clear();
Instance2: items.clear();
Instance1: items.addAll(newItems);
Instance2: items.addAll(newItems);
and end up with a list that violates the desired class invariant: it now contains two copies of newItems. So this approach doesn't work if you clear the entire list as one step and add the items as a second step. (If your instances just need to add an item, though, items.add(newItem) would be safe to call from each instance.)
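One way to keep the synchronizedList approach and still close this race is to hold the wrapper's own lock for the whole compound operation. The wrapper returned by Collections.synchronizedList locks on itself, which is why its Javadoc tells you to synchronize on it when iterating. A minimal sketch, using Integer items in place of the hypothetical Item type:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SwapDemo {
    // Shared list; the synchronized wrapper uses itself as the lock for every call.
    static final List<Integer> items = Collections.synchronizedList(new ArrayList<>());

    // Holding the wrapper's lock turns clear() + addAll() into one atomic step.
    static void replaceAll(List<Integer> newItems) {
        synchronized (items) {
            items.clear();
            items.addAll(newItems);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> batch = Arrays.asList(1, 2, 3);
        Thread t1 = new Thread(() -> replaceAll(batch));
        Thread t2 = new Thread(() -> replaceAll(batch));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(items.size()); // prints 3: never two copies of the batch
    }
}
```

Readers that iterate over items must also wrap the loop in synchronized (items), otherwise they can still observe the list half-way through a replacement.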
Synchronize access to the collection.
You'll need an explicit mechanism for synchronizing here. Synchronized methods won't work because they synchronize on "this", which is not common between the instances. You could use:
private static final Object lock = new Object();
private static volatile List<Item> list;
// technically "list" doesn't need to be volatile if every
// read of it is also done while holding "lock".
static void setList(List<Item> newList)
{
synchronized(lock)
{
list = newList;
}
}
Use an AtomicReference:
private static final AtomicReference<List<Item>> list = new AtomicReference<List<Item>>();
static void setList(List<Item> newList)
{
list.set(newList);
}
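A fuller sketch of the AtomicReference option, assuming String docs. It also shows how an AtomicBoolean could replace the question's static "rebuild or not" control variable: only the thread that wins compareAndSet performs the rebuild. DocCache, tryRebuild and the loadFromDatabase supplier are hypothetical names, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

class DocCache {
    // Always holds an unmodifiable snapshot, so readers never see a half-built list.
    private static final AtomicReference<List<String>> docs =
            new AtomicReference<>(Collections.<String>emptyList());

    // Plays the role of the question's "rebuild in progress" control variable.
    private static final AtomicBoolean rebuilding = new AtomicBoolean(false);

    static void setList(List<String> newList) {
        docs.set(Collections.unmodifiableList(new ArrayList<>(newList)));
    }

    // Request threads read the current snapshot without locking.
    static List<String> getList() {
        return docs.get();
    }

    // Only the thread that flips the flag from false to true rebuilds;
    // the others return immediately and keep serving the current snapshot.
    static boolean tryRebuild(Supplier<List<String>> loadFromDatabase) {
        if (!rebuilding.compareAndSet(false, true)) {
            return false;
        }
        try {
            setList(loadFromDatabase.get());
            return true;
        } finally {
            rebuilding.set(false);
        }
    }

    public static void main(String[] args) {
        tryRebuild(() -> Arrays.asList("a.html", "b.html"));
        System.out.println(getList()); // prints [a.html, b.html]
    }
}
```

The new list is built privately and only then swapped in, so there is never a moment where a reader sees it cleared or partially filled.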
If I understand the message you posted from Find Bugs correctly, this is just a warning.
If you want to hide the warning, do the modifications from a static method. Find Bugs is warning you because this is typically an error. The programmer thinks they are changing some instance state but really they are changing some state which impacts every instance.
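A minimal sketch of moving the write into static methods, using the hypothetical class name DocStore. Declaring the methods static synchronized means every caller locks on DocStore.class, a lock shared by all instances, which also handles visibility. Whether your FindBugs version stays silent for a particular variant is worth checking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class DocStore {
    // Shared by all instances; only assigned inside a static synchronized method.
    private static List<String> docs = Collections.emptyList();

    // static synchronized methods lock on DocStore.class, one lock for every instance.
    static synchronized void rebuild(List<String> fromDatabase) {
        docs = Collections.unmodifiableList(new ArrayList<>(fromDatabase));
    }

    static synchronized List<String> current() {
        return docs;
    }

    // Per-request instance code delegates the write to the static method.
    void handleRequest(List<String> freshDocs, boolean refreshDue) {
        if (refreshDue) {
            rebuild(freshDocs);
        }
        System.out.println(current().size() + " documents");
    }

    public static void main(String[] args) {
        new DocStore().handleRequest(Arrays.asList("a.html", "b.html"), true); // prints "2 documents"
    }
}
```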
Using the Singleton design pattern is one way. You can have only one instance of an object that holds the value you want, and access that instance through a global property. The advantage is that, if you want to have more instances later on, there's less modification of preexisting code (since you're not changing static fields to instance fields).
You don't need to replace the list each time. As noted above, you will have to deal with multiple threads, but you can create the ArrayList once and then use clear() and addAll() to wipe and repopulate it. FindBugs should be quite happy with that, because you are no longer assigning to a static field.
guys - feel free to chip in if there is any problem with this technique :-)
A second thought is to drive things from the database via Hibernate. So don't maintain a list: Hibernate has built-in caching, so it's almost as quick. If you update the data at the database level (which means Hibernate doesn't know about it), you can then tell Hibernate to clear its cache and refresh from the database when it's next queried.
You do not want to do this. Every request runs in its own thread. If the code that gets executed on a browser action modifies the list, then two requests can possibly modify the list at the same time, and corrupt the data. That is why it is not a good idea to access static resources from a non-static context, and probably why your tool is warning you.
Look at this
http://download.oracle.com/javase/6/docs/api/index.html?java/util/concurrent/package-summary.html
specifically the part about how the ArrayList is not synchronized. Also note that the paragraph I mention has a solution, specifically
List list = Collections.synchronizedList(new ArrayList(...));
That's one way to do it. But it's still not a good idea, mainly because it can be slow. If it's not a commercial-grade application and you are not dealing with high volume, you can probably get by without improving it. If this is the type of app that only gets hit a few times per day, you can ignore the warning, with the understanding that it is possible that something bad will happen if two requests munge each other.
A better solution: Since you have database, I would just get the information from the db as you need it, i.e. as the requests come in. You can use some caching technologies for performance.
The reason I don't like the Singleton Pattern idea is that even if it makes the warning go away, it doesn't, by itself, address the fundamental synchronization problem. There are thread-safe ways to implement a singleton, however (see http://en.wikipedia.org/wiki/Singleton_pattern#Traditional_simple_way_using_synchronization), which might work in this case.
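For illustration, here is one well-known thread-safe singleton variant, the initialization-on-demand holder idiom (a different technique from the synchronized getInstance() in the linked article). DocRegistry is a hypothetical name. As the paragraph above points out, a thread-safe singleton only makes getInstance() safe; the singleton's own mutable state still needs its own handling, shown here as a volatile reference swap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class DocRegistry {
    // The singleton's mutable state: replaced wholesale, never edited in place.
    private volatile List<String> docs = Collections.emptyList();

    private DocRegistry() {}

    // The JVM initializes Holder lazily and exactly once; class initialization
    // makes INSTANCE safely visible to every thread without explicit locking.
    private static final class Holder {
        static final DocRegistry INSTANCE = new DocRegistry();
    }

    static DocRegistry getInstance() {
        return Holder.INSTANCE;
    }

    void replace(List<String> newDocs) {
        docs = Collections.unmodifiableList(new ArrayList<>(newDocs));
    }

    List<String> docs() {
        return docs;
    }

    public static void main(String[] args) {
        getInstance().replace(Arrays.asList("a.html"));
        System.out.println(getInstance().docs()); // prints [a.html]
    }
}
```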
Related
I found the following code snippet in luaj and started to doubt whether changes made to the Map after it has been constructed are visible to other threads, since there is no synchronization in place.
I know that since the Map is declared final, its initialized values are visible to other threads after construction, but what about changes that happen after that?
Some might also realize that this class is so far from thread-safe that calling coerce in a multi-threaded environment might even cause an infinite loop in the HashMap, but my question is not about that.
public class CoerceJavaToLua {
static final Map COERCIONS = new HashMap(); // this map is visible to all threads after construction, since it's final
public static LuaValue coerce(Object paramObject) {
...;
if (localCoercion == null) {
localCoercion = ...;
COERCIONS.put(localClass, localCoercion); // visible?
}
return ...;
}
...
}
You're correct that changes to the Map may not be visible to other threads. Every method that accesses COERCIONS (both reading and writing) should be synchronized on the same object. Alternatively, if you never need sequences of accesses to be atomic, you could use a synchronized collection.
(BTW, why are you using raw types?)
This code is actually bad and may cause many problems (probably not an infinite loop; that's more common with TreeMap. With HashMap you are more likely to get silent data loss due to overwrites, or some random exception). And you're right: it's not guaranteed that changes made in one thread will be visible to another.
Here the problem may not look very big, as this Map is used for caching, so silent overwrites or visibility lag don't lead to real problems (two distinct coercion instances would just be used for the same class, which is probably OK in this case). However, it's still possible that such code will break your program. If you like, you can submit a patch to the LuaJ team.
Two options:
// Synchronized (since Java 1.2)
static final Map COERCIONS = Collections.synchronizedMap(new HashMap());
// Concurrent (since Java 5)
static final Map COERCIONS = new ConcurrentHashMap();
They each have their pros and cons.
The pro of ConcurrentHashMap is that it doesn't lock the whole map, and reads generally don't block. The con is that compound operations are not atomic: e.g. an Iterator in one thread and a call to putAll in another may let the iterator see only some of the values being added.
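A minimal sketch of the ConcurrentHashMap option applied to the coerce() pattern, using Java 8's computeIfAbsent so the check-then-put becomes one atomic step. The String "coercion" values and the createCoercion factory are hypothetical stand-ins for LuaJ's types and the elided code; this is not LuaJ's actual API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class CoercionCache {
    // Stand-in for LuaJ's coercion objects: here a coercion is just a descriptive String.
    static final ConcurrentMap<Class<?>, String> COERCIONS = new ConcurrentHashMap<>();
    static final AtomicInteger created = new AtomicInteger();

    // Hypothetical factory standing in for the elided "localCoercion = ..." logic.
    static String createCoercion(Class<?> type) {
        created.incrementAndGet();
        return "coercion for " + type.getSimpleName();
    }

    static String coercionFor(Object value) {
        // On ConcurrentHashMap, computeIfAbsent is atomic: the factory runs at most once
        // per key, and the stored value is safely published to all reading threads.
        return COERCIONS.computeIfAbsent(value.getClass(), CoercionCache::createCoercion);
    }

    public static void main(String[] args) {
        System.out.println(coercionFor("text"));
        System.out.println(coercionFor(42));
        System.out.println(coercionFor("more text"));
        System.out.println(created.get()); // prints 2: one coercion per distinct class
    }
}
```

Keep the factory short and never let it modify the same map, since other threads asking for that key wait while it runs.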
There was already a question about whether threads can simultaneously and safely read/iterate a LinkedList. It seems the answer is yes, as long as no one structurally modifies (adds to or deletes from) the linked list.
One answer, though, warned about an "unflushed cache" and advised learning the "Java memory model". So I'm asking for an elaboration of those "evil" caches. I'm a newbie, and so far I still naively believe that the following code is OK (at least from my tests):
public static class workerThread implements Runnable {
LinkedList<Integer> ll_only_for_read;
PrintWriter writer;
public workerThread(LinkedList<Integer> ll,int id2) throws Exception {
ll_only_for_read = ll;
writer = new PrintWriter("file."+id2, "UTF-8");
}
@Override
public void run() {
for(Integer i : ll_only_for_read) writer.println(" ll:"+i);
writer.close();
}
}
public static void main(String args[]) throws Exception{
LinkedList<Integer> ll = new LinkedList<Integer>();
for(int i=0;i<1e3;i++) ll.add(i);
// do I need to call something special here? (in order to say:
// "hey LinkedList flush all your data from local cache
// you will be now a good boy and share those data among
// whole lot of interesting threads. Don't worry though they will only read
// you, no thread would dare to change you"
new Thread(new workerThread(ll,1)).start();
new Thread(new workerThread(ll,2)).start();
}
Yes, in your specific example code it's okay, since the act of starting the new thread defines a happens-before relationship between populating the list and reading it from another thread. There are plenty of ways that a seemingly similar setup could be unsafe, however.
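A minimal sketch of the two happens-before edges this answer relies on: Thread.start() (everything done before start() is visible in the new thread) and Thread.join() (everything the thread did is visible after join() returns). SumTask and sumInTwoThreads are hypothetical names; the workers only read the list, as in the question.

```java
import java.util.LinkedList;
import java.util.List;

class StartPublishDemo {
    // Read-only worker: sums the shared list.
    static final class SumTask implements Runnable {
        private final List<Integer> list;
        long sum;  // written by the worker, read by main only after join()

        SumTask(List<Integer> list) {
            this.list = list;
        }

        @Override
        public void run() {
            for (int value : list) sum += value;
        }
    }

    static long[] sumInTwoThreads(int n) throws InterruptedException {
        List<Integer> ll = new LinkedList<>();
        for (int i = 0; i < n; i++) ll.add(i);  // all writes happen before start()

        SumTask a = new SumTask(ll), b = new SumTask(ll);
        Thread t1 = new Thread(a), t2 = new Thread(b);
        t1.start();  // start() happens-before every action of the started thread
        t2.start();
        t1.join();   // every action of t1 happens-before join() returns
        t2.join();
        return new long[] {a.sum, b.sum};
    }

    public static void main(String[] args) throws InterruptedException {
        long[] sums = sumInTwoThreads(1000);
        System.out.println(sums[0] + " " + sums[1]); // prints 499500 499500
    }
}
```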
I highly recommend reading "Java Concurrency in Practice" by Brian Goetz et al for more details.
If your code creates and populates the list in a single thread, and only afterwards creates the other threads that concurrently access it, there is no problem.
Problems can happen only when a thread can modify a value while other threads try to read the same value.
It can also be a problem if you modify the objects you retrieve from the list, even if you don't change the list itself.
Although one answer was warning about "unflushed cache" and advising to know "java memory model".
I think you are referring to my Answer to this Question: Can Java LinkedList be read in multiple-threads safely?.
So I'm asking to elaborate those "evil" caches.
They are not evil. They are just a fact of life ... and they affect the correctness (thread-safety) reasoning for multi-threaded applications.
The Java Memory Model is Java's answer to this fact of life. The memory model specifies with mathematical precision a bunch of rules that need to be obeyed to ensure that all possible executions of your application are "well-formed". (In simple terms: that your application is thread-safe.)
The Java Memory Model is ... difficult.
Someone recommended "Java Concurrency in Practice" by Brian Goetz et al. I concur. It is the best textbook on the topic of writing "classic" Java multi-threaded applications, and it has a good explanation of the Java Memory Model.
More importantly, Goetz et al gives you a simpler set of rules that are sufficient to give you thread-safety. These rules are still too detailed to condense into StackOverflow answer ... but
one of the concepts is "safe publication", and
one of the principles is to use / re-use existing concurrency constructs rather than to roll your own concurrency mechanisms based on the Memory Model.
I'm a newbie and so far I still naively believe that following code is ok.
It >>is<< correct. However ...
(at least from my tests)
... testing is NOT a guarantee of anything. The problem with non-thread-safe programs is that the faults are frequently not revealed by testing because they manifest randomly, with low probability, and often differently on different platforms.
You cannot rely on testing to tell you that your code is thread-safe. You need to reason1 about the behaviour ... or follow a set of well-founded rules.
1 - And I mean real, well-founded reasoning ... not seat-of-the-pants intuitive stuff.
The way you're using it is fine, but only by coincidence.
Programs are rarely that trivial:
If the List contains references to other (mutable) data, then you'll get race conditions.
If someone modifies your 'reader' threads later in the code's lifecycle, then you'll get races.
Immutable data (and data structures) are by definition thread-safe. However, this is a mutable List, even though you're making the agreement with yourself that you won't modify it.
I'd recommend wrapping the List<> instance like this so the code fails immediately if someone tries to use any mutators on the List:
List<Integer> immutableList = Collections.unmodifiableList(ll);
//...pass 'immutableList' to threads.
See the Javadoc for Collections.unmodifiableList.
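A minimal sketch of the fail-fast behaviour described above: every mutator on the unmodifiable view throws UnsupportedOperationException. The view is still backed by the original list, so the builder method below lets no reference to the original escape. ReadOnlyViewDemo and its methods are hypothetical names.

```java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

class ReadOnlyViewDemo {
    // Builds the list, then hands out only a read-only view of it.
    static List<Integer> buildReadOnly(int n) {
        LinkedList<Integer> ll = new LinkedList<>();
        for (int i = 0; i < n; i++) ll.add(i);
        return Collections.unmodifiableList(ll);  // no reference to 'll' escapes
    }

    // Returns true if a mutator on the view was rejected.
    static boolean rejectsWrites(List<Integer> view) {
        try {
            view.add(-1);
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> immutableList = buildReadOnly(5);
        System.out.println(rejectsWrites(immutableList)); // prints true
        System.out.println(immutableList);                // prints [0, 1, 2, 3, 4]
    }
}
```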
You need to guarantee happens-before relationship between reads and writes in your LinkedList because they are done in separate threads.
Result of ll.add(i) will be visible for new workerThread because Thread.start forms happens-before relationship. So your example is thread safe. See more about happens-before conditions.
However, be aware of a more complex situation, where the LinkedList is being iterated in worker threads while the main thread modifies it at the same time, like this:
for(int i=0;i<1e3;i++) {
ll.add(i);
new Thread(new workerThread(ll,1)).start();
new Thread(new workerThread(ll,2)).start();
}
This way ConcurrentModificationException is possible.
There are several options:
Clone your LinkedList before handing it to each workerThread (e.g. in its constructor, which runs on the main thread) and iterate over the copy instead.
Use synchronization for both list modification and list iteration (but it will lead to poor concurrency).
Instead of LinkedList, use CopyOnWriteArrayList.
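A single-threaded sketch of why the CopyOnWriteArrayList option avoids ConcurrentModificationException: its iterator works on a snapshot of the backing array taken when iteration starts, so later additions (from this or another thread) are neither seen nor disruptive. The same loop over a LinkedList fails fast. Because every write copies the array, this suits lists that are read far more often than they change. CowDemo is a hypothetical name.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class CowDemo {
    // Appends to the list while iterating it; returns how many elements the loop visited.
    static int iterateWhileAppending(List<Integer> list) {
        int visited = 0;
        for (Integer x : list) {
            list.add(x * 10);  // structural modification during iteration
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        List<Integer> cow = new CopyOnWriteArrayList<>(Arrays.asList(1, 2, 3));
        // The iterator only sees the snapshot [1, 2, 3].
        System.out.println(iterateWhileAppending(cow) + " visited, size " + cow.size()); // prints 3 visited, size 6

        List<Integer> linked = new LinkedList<>(Arrays.asList(1, 2, 3));
        try {
            iterateWhileAppending(linked);
        } catch (ConcurrentModificationException e) {
            System.out.println("LinkedList: ConcurrentModificationException");
        }
    }
}
```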
Sorry for answering my own question, but I was thinking about your reassuring answers and found that it may not be as safe as it seems. I found and tested a case where it does not work: if an object uses one of its own fields to store some data (which I wouldn't know about), then even concurrent reads would fail. The only question is whether LinkedList (and other Java classes) could do that in some implementation. See this failing example:
public class DummyLinkedList {
public LinkedList<Integer> ll;
public DummyLinkedList(){
ll = new LinkedList<Integer>();
}
int lastGetIndex;
int myDummyGet(int idx){
lastGetIndex = idx;
//return ll.get(idx); // this would work fine, as the parameter is on the stack and so unique to each call (at least if Java supports reentrant functions)
return ll.get(lastGetIndex); // this would cause a problem even when only reading the object - the question is how many such issues java.* contains
}
}
It depends on how the object was created and made available to your thread. In general, no, it's not safe, even if the object isn't modified.
Following are some ways to make it safe.
First, create the object and perform any modification that is necessary; you can consider the object to be effectively immutable if no more modifications occur. Then, share the effectively immutable object with other threads by one of the following means:
Have other threads read the object from a field that is volatile.
Write a reference to the object inside a synchronized block, then have other threads read that reference while synchronized on the same lock.
Start the reading threads after the object is initialized, passing the object as a parameter. (This is what you are doing in your example, so you are safe.)
Pass the object between threads using a concurrent mechanism like a BlockingQueue implementation, or publish it in a concurrent collection, like a ConcurrentMap implementation.
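A minimal sketch of the first option in the list above (publishing through a volatile field) when the reading thread is already running. The list is fully built in a local variable first and then published with a single volatile write; the reader waits until it sees a non-null reference. VolatilePublishDemo and its method are hypothetical names, and the busy-wait is only there to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

class VolatilePublishDemo {
    // The reader dereferences this only after seeing a non-null value.
    private static volatile List<Integer> published;

    static int publishToRunningReader(int n) throws InterruptedException {
        published = null;
        int[] result = new int[1];
        Thread reader = new Thread(() -> {
            List<Integer> snapshot;
            while ((snapshot = published) == null) {
                Thread.yield();  // wait until the list is published
            }
            int sum = 0;
            for (int v : snapshot) sum += v;
            result[0] = sum;
        });
        reader.start();  // the reader is running before the list exists

        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) list.add(i);  // build and fill privately first...
        published = list;                         // ...then publish with one volatile write
        reader.join();   // join() makes result[0] visible to this thread
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(publishToRunningReader(100)); // prints 4950
    }
}
```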
There might be others. Alternatively, you can make all of the fields of the shared object final (including all the fields of its Object members, and so on). Then it will be safe to share this object by any means across threads. That's one of the under-appreciated virtues of immutable types.
If your only access to the list is through 'read' methods (including iteration), then you are fine, as in your code.
Suppose we want to implement a cache for a particular entity.
class Cache {
private static Map<String, Object> cache = new HashMap<>();
public static Object get(String id) {
assert notNullOrEmpty(id);
return cache.get(id);
}
public static Object add(String id, Object element) {
assert notNullOrEmpty(id) && notNull(element);
if(cache.containsKey(id)) return cache.get(id);
cache.put(id, element);
return element;
}
}
Now we want to ensure this is thread-safe and, most importantly, optimal when it comes to data access and performance (we don't want to block when it's not necessary). For example, if we mark both methods as synchronized, we will uselessly block two concurrent get() calls that could run perfectly well without blocking.
So we want to block get() only if add() is in progress, and block add() only if at least one get() or add() is in progress. Multiple concurrent get() executions should not block each other...
How do we do this?
UPDATE
In fact this is not a cache, just a use case I've come up with to describe the problem; the actual purpose is to create a store of singleton instances...
For example, there is a Currency type which is only instantiated through its builder and is immutable. After verifying that the parameters passed in are valid, the builder checks this so-called global cache in a static context to see whether an instance has already been created... well, you get the idea...
This is not an enum use case, because the system will dynamically add new Currency, Market or even Exchange instances, which should all be loosely coupled and instantiated only once... (also to avoid heavy GC)
So, to clarify the question: think about the general concurrency problem, not the particular example.
I've found this link quite helpful http://tutorials.jenkov.com/java-concurrency/read-write-locks.html
I guess there are already some lock types in the JDK for this purpose, but I'm not sure yet.
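The JDK does provide such a lock: java.util.concurrent.locks.ReentrantReadWriteLock. A minimal sketch of the Cache class from the question rebuilt on it, with the assert helpers omitted; RwCache is a hypothetical name. Because the check-then-put in add() must not interleave with another add(), it runs entirely under the write lock (a read lock cannot be upgraded to a write lock).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RwCache {
    private static final Map<String, Object> cache = new HashMap<>();
    private static final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Many threads may hold the read lock at once, so concurrent get() calls don't block each other.
    public static Object get(String id) {
        lock.readLock().lock();
        try {
            return cache.get(id);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The write lock is exclusive: it is granted only when no thread holds the read
    // or write lock, and no get() can proceed while it is held.
    public static Object add(String id, Object element) {
        lock.writeLock().lock();
        try {
            Object existing = cache.get(id);
            if (existing != null) return existing;  // keep the first instance registered
            cache.put(id, element);
            return element;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        Object first = add("EUR", "euro #1");
        Object second = add("EUR", "euro #2");
        System.out.println(first == second); // prints true
        System.out.println(get("EUR"));      // prints euro #1
    }
}
```

For this particular "one instance per key" store, a ConcurrentHashMap with putIfAbsent or computeIfAbsent gives the same guarantee with less code.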
Actually, I gave a talk on this just today at the FOSDEM conference in Brussels. See the slides here: http://www.slideshare.net/cruftex/cache2k-java-caching-turbo-charged-fosdem-2015
Basically you can use Google Guava; however, since Guava's cache uses LRU, a synchronized block is still needed. What I am exploring in cache2k is an advanced eviction algorithm that needs no list manipulation on cache access, and so no locks at all.
cache2k is on maven central, add cache2k-api and cache2k-core as dependency and initialize the cache with:
cache =
CacheBuilder.newCache(String.class, Object.class)
.implementation(ClockProPlusCache.class)
.build();
If you have only cache hits, cache2k is about 5x faster than Guava and 10x faster than EHCache. For your usage pattern, e.g. with the Currency type, you can run the cache in a read-through configuration and add a cache source that is responsible for constructing the Currency instances.
So you don't necessarily need a cache. For the currency example you don't need one, since there is a limited space of currency instances. If you want to do the same with a potentially unlimited space, a cache is the more universal solution, since you have to limit resource consumption. One example I explored is using this for formatted dates. See: https://github.com/headissue/cache2k-benchmark/blob/master/zoo/src/test/java/org/cache2k/benchmark/DateFormattingBenchmark.java
For general questions on cache2k, feel free to post them on stack overflow.
I use the following code to initialize a synchronized instance of an EnumSet:
private final Set<MyClass> instance = Collections.synchronizedSet(EnumSet.noneOf(MyClass.class));
I have two questions:
do I retain all the benefits of an EnumSet like compactness and efficiency in this case?
is there a more... let's say... semantically rich way to get an empty and synchronized instance of EnumSet?
Well from the javadoc:
If multiple threads access an enum set concurrently, and at least one of the threads modifies the set, it should be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the enum set. If no such object exists, the set should be "wrapped" using the Collections.synchronizedSet(java.util.Set) method. This is best done at creation time, to prevent accidental unsynchronized access:
Set s = Collections.synchronizedSet(EnumSet.noneOf(MyEnum.class));
so I think that's the best you can do.
I would also keep the Set final as you did. It's odd that they don't mention it in the javadoc.
EDIT: To answer the first question: short answer, yes; long answer, yes, but you pay the price of synchronization on top of that.
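One caveat the wrapper does not handle for you: single calls such as add() are synchronized, but iteration is a sequence of calls, and the Collections.synchronizedSet Javadoc requires you to hold the wrapper's lock manually while iterating. A minimal sketch with a hypothetical Flag enum standing in for MyClass:

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

class SyncEnumSetDemo {
    // Hypothetical enum standing in for the question's MyClass.
    enum Flag { READ, WRITE, EXECUTE }

    private final Set<Flag> flags = Collections.synchronizedSet(EnumSet.noneOf(Flag.class));

    void grant(Flag f) {
        flags.add(f);  // a single call is synchronized by the wrapper
    }

    // Hold the wrapper's lock for the whole loop, as the synchronizedSet Javadoc requires.
    String describe() {
        StringBuilder sb = new StringBuilder();
        synchronized (flags) {
            for (Flag f : flags) sb.append(f).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        SyncEnumSetDemo demo = new SyncEnumSetDemo();
        demo.grant(Flag.WRITE);
        demo.grant(Flag.READ);
        System.out.println(demo.describe()); // prints READ WRITE (EnumSet iterates in declaration order)
    }
}
```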
I have a few ArrayList<T>s containing user-defined objects (e.g. List<Student>, List<Teachers>). The objects are immutable in nature, i.e. no setters are provided, and also, due to the nature of the problem, "no one" will ever attempt to modify these objects. Once the ArrayList is populated, no further addition/removal of objects is allowed/possible, so the List will not change dynamically.
Given these conditions, can these data structures (i.e. ArrayList) be safely used by multiple threads simultaneously? Each thread will just read the objects' properties; no "set" operation is possible.
So, my question is can I rely on ArrayList? If not, what other less expensive data structures can be used in such scenario?
You can share any objects or data structures between threads if they are never modified after a safe publication. As mentioned in the comments, there must be a happens-before relationship between the writes that initialize the ArrayList and the read by which the other threads acquire the reference.
E.g. if you setup the ArrayList completely before starting the other threads or before submitting the tasks working on the list to an ExecutorService you are safe.
If the threads are already running you have to use one of the thread safe mechanisms to hand-over the ArrayList reference to the other threads, e.g. by putting it on a BlockingQueue.
Even the simplest forms, like storing the reference in a static final or volatile field, will work.
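A minimal sketch of the BlockingQueue hand-over to a worker that is already running. The BlockingQueue Javadoc guarantees that actions before put() happen-before actions after the corresponding take() in another thread, so the worker sees the fully populated list. QueueHandOffDemo and the "student-N" strings are hypothetical stand-ins for the question's Student objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class QueueHandOffDemo {
    static int handOff(int n) throws InterruptedException {
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(1);
        int[] seen = new int[1];
        Thread worker = new Thread(() -> {
            try {
                List<String> students = queue.take();  // blocks until the list is handed over
                seen[0] = students.size();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();  // the worker is running before the list exists

        List<String> list = new ArrayList<>();
        for (int i = 0; i < n; i++) list.add("student-" + i);
        queue.put(Collections.unmodifiableList(list));  // writes above happen-before the take()
        worker.join();   // join() makes seen[0] visible to this thread
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(handOff(3)); // prints 3
    }
}
```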
Keep in mind that your precondition of never modifying the object afterwards must always hold. It’s recommended to enforce that constraint by wrapping the list using Collections.unmodifiableList(…) and forget about the original list reference before publishing:
class Example {
public static final List<String> THREAD_SAFE_LIST;
static {
ArrayList<String> list=new ArrayList<>();
// do the setup
THREAD_SAFE_LIST=Collections.unmodifiableList(list);
}
}
or
class Example {
public static final List<String> THREAD_SAFE_LIST
=Collections.unmodifiableList(Arrays.asList("foo", "bar"));
}
Yes, you should be able to pass the array into each thread. There should be no access errors so long as the array is finished being written to before any thread is possibly getting the information.