I was reading some of the concurrency patterns in Brian Goetz's Java Concurrency in Practice and got confused about when the right time is to make code thread safe.
I normally write code that's meant to run in a single thread, so I don't worry too much about thread safety, synchronization, and so on. However, there is always a possibility that the same code may be reused later in a multi-threaded environment.
So my question is: when should one start thinking about thread safety? Should I assume the worst at the outset and always write thread-safe code from the beginning, or should I revisit the code and modify it for thread safety if such a need arises later?
Are there some concurrency patterns/anti-patterns that I must always be aware of, even while writing single-threaded applications, so that my code doesn't break if it's later used in a multi-threaded environment?
You should think about thread safety when your code will be used in a multithreaded environment. There is no point in tackling the complexity if it will only ever run in a single-threaded environment.
That being said, there are simple things you can do that are good practices anyway and will help with multithreading:
As Josh Bloch says, Favor Immutability. Immutable classes are thread-safe almost by definition (see the sketch below);
Only use data members or static variables where required, rather than as a convenience.
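For illustration, here is a minimal sketch of what such an immutable class can look like; the Point class itself is a made-up example:

```java
// A hypothetical immutable value class: the class is final, all fields are final,
// and no state can change after construction, so instances can be shared freely
// between threads without any synchronization.
public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int getX() { return x; }
    public int getY() { return y; }

    // "Mutators" return a new instance instead of modifying this one.
    public Point translate(int dx, int dy) {
        return new Point(x + dx, y + dy);
    }
}
```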
Making your code thread safe can be as simple as adding a comment that says the class was not designed for concurrent use by multiple threads. So, in that sense: yes, all of your classes should be thread safe.
However, in practice, many, many types are likely to be used by only a single thread, often only referenced as local variables. This can be true even if the program as a whole is multi-threaded. It would be a mistake to make every object safe for multi-threaded access. While the penalty may be small, it is pervasive, and can add up to be a significant, hard-to-fix performance problem.
I advise you to obtain a copy of "Effective Java", 2nd Ed. by Joshua Bloch. That book devotes a whole chapter to concurrency, including a solid exploration of the issue of when (and when not) to synchronize. Note, for example, the title of item 67 in "Effective Java": 'Avoid excessive synchronization', which is elaborated over five pages.
As was stated previously, you need thread safety when you think your code will be used in a multithreaded environment.
Consider the approach taken by the Collections classes: you provide a thread-unsafe class that does all its work without using synchronization, and you also provide another class that wraps the unsynchronized class, providing all of the same public methods but making them synchronize on the underlying object.
This gives your clients a choice of using the multi-threaded or the single-threaded version of your code. It may also simplify your coding by isolating all of the threading/locking logic in a separate class.
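A minimal sketch of that wrapper pattern, assuming a made-up EventLog class (the real-world analogue is Collections.synchronizedList wrapping an ArrayList):

```java
import java.util.ArrayList;
import java.util.List;

// Plain, unsynchronized implementation.
class EventLog {
    private final List<String> events = new ArrayList<>();

    public void record(String event) { events.add(event); }
    public int size() { return events.size(); }
}

// Wrapper exposing the same public methods, but synchronizing each of them
// on the underlying object.
class SynchronizedEventLog {
    private final EventLog delegate;

    public SynchronizedEventLog(EventLog delegate) { this.delegate = delegate; }

    public void record(String event) {
        synchronized (delegate) { delegate.record(event); }
    }

    public int size() {
        synchronized (delegate) { return delegate.size(); }
    }
}
```

Single-threaded clients use EventLog directly and pay no locking cost; multi-threaded clients wrap it once and get the synchronized view.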
I'm reading J. Bloch's Effective Java and now I'm at the section about executors. He said that we should prefer using executors to direct usage of Threads. As far as I understand, the primary reason for that is:
The key abstraction is no longer Thread, which served as both the unit of work and the mechanism for executing it. Now the unit of work and mechanism are separate. The key abstraction is the unit of work, which is called a task.
It's not quite clear what the unit of work means here. I tried to search for it and found that there's a design pattern related to DB operations. But how does it tie in with Threads? Or is there another interpretation of this pattern?
It's purposefully nebulous: it's just "a thing you want done," and the more precise meaning is up to you.
If you want to download a file, that's a unit of work. If you want to compute a hash of some big chunk of data, that's a unit of work. If you want to draw something on the screen, that's a unit of work.
What this blurb is getting at is that this unit of work used to be tied directly to a Java thread (via the Thread class), which is in turn tied relatively directly to the OS's threads (some hand-waving there :) ). A more modern approach is to define the work as a task, and then give a bunch of tasks to a Thread whose life cycle is longer than any of those tasks. That thread then executes those tasks. This lets you more explicitly manage Thread resources, which are relatively heavyweight.
A rough analogy would be to hire a new employee for every task you want done (write this spec, or make some coffee, or fix this bug) vs hiring just a few employees and giving them small tasks as needed.
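Here is a minimal sketch of that separation; the task body is just a placeholder, but the ExecutorService usage is the standard pattern:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TaskDemo {
    public static void main(String[] args) throws InterruptedException {
        // A small pool of long-lived worker threads (the "few employees").
        ExecutorService pool = Executors.newFixedThreadPool(4);

        // Each submitted Runnable is a unit of work (a "task"); the executor
        // decides which thread runs it and when.
        for (int i = 0; i < 10; i++) {
            final int taskId = i;
            pool.submit(() -> System.out.println(
                    "Task " + taskId + " ran on " + Thread.currentThread().getName()));
        }

        pool.shutdown();                            // no new tasks accepted
        pool.awaitTermination(1, TimeUnit.MINUTES); // let the queued tasks finish
    }
}
```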
This question is similar to this SO post (q.v. Jon Skeet's highly upvoted answer), but I will give an answer anyway. There are two ways to create a thread in Java. The first way is to extend the Thread class directly, and the second is to implement the Runnable interface. These two options are what Bloch is generally arguing over.
If you choose to extend the Thread class directly, then you are not free to extend any other class (since Java does not allow multiple inheritance). This is a design limitation. On the other hand, if you implement the Runnable interface, then you are still free to extend any other class you wish.
Philosophically, using the interface is generally considered better design, because it simply marks a class as something which can be run as a thread, without explicitly saying how it will be run.
The simplest (and therefore not entirely accurate) approach is to understand the unit of work as the execution of a method (procedure, function). Either way, a unit of work is always represented as a method.
Is it true that if I only use immutable data type, my Java program would be thread safe?
Are there any other factors that will affect thread safety?
Would appreciate it if you can provide an example. Thanks!
Thread safety is about protecting shared data, and immutable objects are protected because they are read-only. Well, apart from when you create them, but creating an object is thread safe.
It's worth saying that designing a large application that ONLY uses immutable objects to achieve thread safety would be difficult.
It's a complicated subject and I would recommend reading Java Concurrency in Practice, which is a very good place to start.
It is true. The problem is that it's a pretty serious limitation to place on your application to only use immutable data types. You can't have any persistent objects with state which exist across threads.
I don't understand why you'd want to do it, but that doesn't make it any less true.
Details and example: http://www.javapractices.com/topic/TopicAction.do?Id=29
If every single variable is immutable (never changed once assigned) you would indeed have a trivially thread-safe program.
Functional programming environments take advantage of this.
However, it is pretty difficult to do pure functional programming in a language not designed for it from the ground up.
A trivial example of something you can't do in a pure functional program is use a loop, as you can't increment a counter. You have to use recursive functions instead to achieve the same effect.
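For a flavour of the difference in Java terms (just an illustration of the idea, not real functional programming):

```java
class SumDemo {
    // Iterative sum: relies on reassigning the local variables i and total.
    static int sumIterative(int n) {
        int total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // Functional-style sum: nothing is ever reassigned; the running state is
    // carried through the parameter of the recursive call instead.
    static int sumRecursive(int n) {
        return n == 0 ? 0 : n + sumRecursive(n - 1);
    }
}
```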
If you are just straying into the world of thread safety and concurrency, I'd heartily recommend the book Java Concurrency in Practice, by Goetz. It is written for Java, but actually the issues it talks about are relevant in other languages too, even if the solutions to those issues may be different.
Immutability allows for safety against certain things that can go wrong in multi-threaded cases. Specifically, it means that the properties of an object visible to one thread cannot be changed by another thread while that first thread is using it (since nothing can change it, clearly another thread can't).
Of course, this only goes as far as that object itself. If a mutable reference to the object is also shared, then some cross-thread bugs can still happen when something puts a new object there (but not all, since it may not matter if a thread works on an object that has already been replaced; then again, that may be crucial).
In all, immutability should be considered one of the ways that you can ensure thread-safety, but neither the sole way nor necessarily sufficient in itself.
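To make that last point concrete, here is a hedged sketch: the Config object is immutable, but the shared reference to it is still mutable state that needs its own protection (all names here are made up):

```java
// Hypothetical immutable value object.
final class Config {
    final int timeoutMillis;
    Config(int timeoutMillis) { this.timeoutMillis = timeoutMillis; }
    Config withTimeout(int timeoutMillis) { return new Config(timeoutMillis); }
}

class ConfigHolder {
    // The Config is immutable, but this shared reference to it is mutable.
    private volatile Config current = new Config(1000);

    // Volatile read: always sees a fully constructed Config.
    Config get() { return current; }

    // Serialize writers so a concurrent update is not silently lost.
    synchronized void setTimeout(int timeoutMillis) {
        current = current.withTimeout(timeoutMillis);
    }
}
```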
Although immutable objects are a help with thread safety, you may find "local variables" and "synchronize" more practical for real world programming.
Any program where no mutable aspect of program state is accessed by more than one thread will be trivially thread-safe, as each thread may as well be its own separate program. Useful multi-threading, however, generally requires interaction between threads, which implies the existence of some mutable shared state.
The key to safe and efficient multi-threading is to incorporate mutability at the right "design level". Ideally, each aspect of program state should be representable by one immutably-rooted(*), mutable reference to an object whose observable state is immutable. Only one thread at a time may try to change the state represented by a particular mutable reference. Efficient multi-threading requires that the "mutable layer" in a program's state be low enough that different threads can use different parts of it. For example, if one has an immutable AllCustomers data structure and two threads simultaneously attempt to change different customers, each would generate a version of the AllCustomers data structure which includes its own changes, but not those of the other thread. No good. If AllCustomers were a mutable array of CustomerState objects, however, it would be possible for one thread to be working on AllCustomers[4] while another was working on AllCustomers[9], without interference.
(*) The rooted path must exist when the aspect of state becomes relevant, and must not change while the access is relevant. For example, one could design an AddOnlyList<thing> which holds a thing[][] called Arr that was initialized to size 32. When the first thing is added, Arr[0] would be initialized, using CompareExchange, to an array of 16 thing elements. The next 15 things would go in that array. When the 17th thing is added, Arr[1] would be initialized using CompareExchange to an array of size 32 (which would hold the new item and the 31 items after it). When the 49th thing is added, Arr[2] would be initialized for 64 items. Note that while Arr itself and the arrays contained thereby would not be totally immutable, only the very first access to any element would be a write, and once Arr[x][y] holds a reference to something, it would continue to do so as long as Arr exists.
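As a hedged Java sketch of the "mutable array of immutable CustomerState objects" idea from the answer above (all class and method names are hypothetical):

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical immutable per-customer state.
final class CustomerState {
    final long balance;
    CustomerState(long balance) { this.balance = balance; }
    CustomerState withBalance(long newBalance) { return new CustomerState(newBalance); }
}

class AllCustomers {
    private final AtomicReferenceArray<CustomerState> customers;

    AllCustomers(int size) {
        customers = new AtomicReferenceArray<>(size);
        for (int i = 0; i < size; i++) {
            customers.set(i, new CustomerState(0));
        }
    }

    // One thread can update slot 4 while another updates slot 9: the mutable
    // "layer" is per-slot, so the two updates do not interfere with each other.
    void setBalance(int index, long newBalance) {
        CustomerState before, after;
        do {
            before = customers.get(index);
            after = before.withBalance(newBalance);
        } while (!customers.compareAndSet(index, before, after));
    }
}
```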
Is it possible to write a class such that other programmers cannot acquire a lock on an instance of the class?
Lock abuse, if there's a term like that, can be a serious killer. Unfortunately, programmers torn between the disastrous forces of delivering thread-safe code and limited knowledge of concurrency can wreak havoc by adopting the approach of locking instances even when they're invoking operations which really don't require the instance's resources to be blocked.
The only way to do this is to ensure that the class's instances are not visible. For example, you could declare it as a private nested class, and make sure that the enclosing class does not leak instance references.
Basically, if something can get hold of a reference to an instance, there is nothing to stop it from locking it.
Normally, it is sufficient to ensure that the reference to the lock object doesn't leak ... and not worry about the class visibility.
FOLLOW UP - in response to the OP's comments.
You cannot stop someone else's code from taking a lock on an instance of one of your classes. But you can design your class so that this won't interfere with your class's internal synchronization. Simply arrange that your class uses a private object (even a plain Object instance) for synchronizing.
In the more general sense, you cannot stop application programmers from using your classes in ways that you don't like. Other examples I've heard of (here) include trying force people to override methods or provide particular constructors. Even declaring your class fields private won't stop a determined (or desperate) programmer from using reflection to get at them.
But the flip-side is that those things you are trying to prevent might actually not be stupid after all. For example, there could actually be a sound reason for an application to use your class as a lock object, notwithstanding your objection that it reduces concurrency. (And in general it does; it is just that this may not be relevant in the particular case.)
My general feeling is that it is a good idea to document the way your class is designed to be used, and to design the API to encourage this. But it is unwise to try to force the issue. Ultimately it is the responsibility of the people who code against your classes to use them sensibly ... not yours.
If a class has members that require protection from concurrent access, locking should be done internally. Otherwise, you're forcing those who use it to understand the details of its implementation when they shouldn't be able to see past its interface.
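A minimal sketch of that internal-locking idiom, using a private lock object as suggested in the previous answer (the Account class is hypothetical):

```java
class Account {
    // Private lock: outside code cannot synchronize on it, so external locking
    // can never interfere with (or deadlock against) the internal synchronization.
    private final Object lock = new Object();
    private long balance;

    void deposit(long amount) {
        synchronized (lock) {
            balance += amount;
        }
    }

    long balance() {
        synchronized (lock) {
            return balance;
        }
    }
}
```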
When creating a new instance, also create a new thread which immediately synchronizes on the instance and goes to sleep (with Thread.sleep()). Any code trying to synchronize on the instance will just deadlock, thus the developer has to rethink his approach.
Disclaimer:
Don't vote me down because my suggestion is insane. I know it is. I am just answering the question. Do not actually do this!!!
Java 6 API question. Does calling LockSupport.unpark(thread) have a happens-before relationship to the return from LockSupport.park in the just-unparked thread? I strongly suspect the answer is yes, but the Javadoc doesn't seem to mention it explicitly.
I have just found this question because I was asking myself the same thing. According to this article by Oracle researcher David Dice, the answer seems to be no. Here's the relevant part of the article:
If a thread is blocked in park() we're guaranteed that a subsequent
unpark() will make it ready. A perfectly legal but low-quality
implementation of park() and unpark() would be empty methods, in which
the program degenerates to simple spinning. And in fact that's the
litmus test for correct park()-unpark() usage.
Empty park() and unpark() methods do not give you any happens-before relationship guarantees, so for your program to be 100% portable, you should not rely on them.
Then again, the Javadoc of LockSupport says:
These methods are designed to be used as tools for creating
higher-level synchronization utilities, and are not in themselves
useful for most concurrency control applications. The park method is
designed for use only in constructions of the form:
while (!canProceed()) { ... LockSupport.park(this); }
Since you have to explicitly check some condition anyway, which will either involve volatile or properly synchronized variables, the weak guarantees of park() should not actually be a problem, right?
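To make that concrete, here is a hedged sketch of that recommended usage pattern (the Gate class and its method names are made up). The happens-before edge comes from the volatile flag, not from park()/unpark() themselves:

```java
import java.util.concurrent.locks.LockSupport;

class Gate {
    // The volatile flag is what provides the happens-before edge between the
    // releasing thread and the waiting thread.
    private volatile boolean open = false;

    void await() {
        while (!open) {               // always re-check the condition;
            LockSupport.park(this);   // park() may return without a matching unpark()
        }
    }

    void release(Thread waiter) {
        open = true;                  // volatile write first...
        LockSupport.unpark(waiter);   // ...then wake the parked thread
    }
}
```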
If it isn't documented as such, then you CANNOT rely on it creating a happens-before relationship.
Specifically, LockSupport.java in the HotSpot code simply calls Unsafe.park and Unsafe.unpark!
The happens-before relationship will generally come from a write-read pair on a volatile status flag or something similar.
Remember, if it isn't documented as creating a happens-before relationship then you must treat it as though it does not even if you can prove that it does on your specific system. Future systems and implementations may not. They left themselves that freedom for good reason.
I have looked through the JDK code and it looks like LockSupport methods are normally called outside of synchronization blocks. So, your assumption seems to be correct.
How do I determine which parts of my Java code need to be synchronized? Are there any unit testing techniques?
Samples of code are welcome.
Code needs to be synchronized when there might be multiple threads that work on the same data at the same time.
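The classic example is a shared counter; a minimal sketch (the Counter class is hypothetical):

```java
class Counter {
    private int count = 0;

    // Without synchronized, two threads calling increment() at the same time can
    // both read the same value and one update is lost, because count++ is a
    // read-modify-write sequence and not atomic.
    synchronized void increment() {
        count++;
    }

    synchronized int value() {
        return count;
    }
}
```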
Whether code needs to be synchronized is not something that you can discover by unit testing. You must think and design your program carefully when your program is multi-threaded to avoid issues.
A good book on concurrent programming in Java is Java Concurrency in Practice.
If I understand your question correctly, you want to know what you have to synchronise. Unfortunately there isn't boilerplate code that shows you what to synchronise - you should take a look at methods and instance variables that can be accessed by multiple threads at the same time. If there aren't any, you usually don't need to worry about synchronisation too much.
This is a good source for some general information:
http://weblogs.java.net/blog/caroljmcdonald/archive/2009/09/17/some-java-concurrency-tips
When you are in a multithreaded environment in Java and you want to do many things in parallel, I would suggest an approach which uses a concurrent queue (like BlockingQueue or ConcurrentLinkedQueue) and a simple Runnable that has a reference to the queue and pulls 'messages' off the queue. Use an ExecutorService to manage the tasks. Sort of a (very simplified) Actor type of model.
So choose not to share state as much as possible, because if you do, you need to synchronize, or use a data structure that supports concurrent access like the ConcurrentHashMap.
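Here is a hedged sketch of that approach; the class and message names are made up, but the ingredients (a LinkedBlockingQueue, a Runnable worker, an ExecutorService) are the ones described above:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class QueueWorkerDemo {
    private static final String STOP = "STOP";   // "poison pill" used to shut the workers down

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> messages = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // Worker: pulls messages off the shared queue. The queue handles all the
        // synchronization, so the workers share no other mutable state.
        Runnable worker = () -> {
            try {
                for (String msg = messages.take(); !msg.equals(STOP); msg = messages.take()) {
                    System.out.println(Thread.currentThread().getName() + " handled " + msg);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        pool.submit(worker);
        pool.submit(worker);

        for (int i = 0; i < 5; i++) {
            messages.put("message-" + i);
        }
        messages.put(STOP);   // one pill per worker
        messages.put(STOP);

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```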
There's no substitute for thinking about the issues surrounding your code (as the other answers here illustrate). Once you've done that, though, it's worth running FindBugs over your code. It will identify where you've applied synchronisation inconsistently, and is a great help in tracking otherwise hard-to-find bugs.
Lots of nice answers here:
Java synchronization and performance in an aspect
A nice analysis of your problem is available here:
http://portal.acm.org/citation.cfm?id=1370093&dl=GUIDE&coll=GUIDE&CFID=57662261&CFTOKEN=95754288 (requires access to the ACM portal)
Yes, all these folks are right - there is no alternative to thinking. But here is the rule of thumb:
1. If it's a read - perhaps you do not need synchronization
2. If it's a 'write' - you should consider it...
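As a rough illustration of that rule of thumb (the ServiceStatus class is a made-up example): once the 'write' side is taken care of, for instance with a volatile field or a synchronized setter, the 'read' side needs no extra locking:

```java
class ServiceStatus {
    // The writes must be made visible to other threads; a volatile field
    // (or a synchronized setter) takes care of that.
    private volatile boolean running;

    void start() { running = true; }    // the "write" side: needs the memory effect
    void stop()  { running = false; }

    boolean isRunning() { return running; }   // the "read" side: just reads the volatile field
}
```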