Double-Checked Locking - does it actually work in Java?

Hi all,
Here is the famous article:
The "Double-Checked Locking is Broken" Declaration
It declares that the pattern doesn't work in Java. It further says, close to the end, that newer JVMs can make the pattern work by using volatile.
However, another article, Memory Barriers and JVM Concurrency, says that the keyword "synchronized" generates full memory-barrier fences. So who is right? Does the pattern actually work in Java?

There are essentially 3 ways to fix double-checked locking:
ensure that the variable is declared volatile (works from Java 5 onwards);
just don't bother with it in the first place: just use synchronization and don't try to mess around with fancy bug-prone-- and probably pointless-- means of "avoiding" it;
let the classloader do the synchronization for you.
I've posted example code here.
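For illustration, here is a minimal sketch of options 1 and 3; the Helper class is just a placeholder for whatever you lazily create:

// A trivial example class to be created lazily.
class Helper {
    // expensive state would go here
}

// Option 1: volatile field (correct from Java 5 onwards).
class VolatileSingleton {
    private static volatile Helper instance;

    static Helper getInstance() {
        Helper result = instance;                   // single volatile read on the fast path
        if (result == null) {
            synchronized (VolatileSingleton.class) {
                result = instance;
                if (result == null) {
                    instance = result = new Helper();
                }
            }
        }
        return result;
    }
}

// Option 3: initialization-on-demand holder -- the classloader does the locking for you.
class HolderSingleton {
    private static class Holder {
        static final Helper INSTANCE = new Helper(); // initialized on first access, thread-safely
    }

    static Helper getInstance() {
        return Holder.INSTANCE;
    }
}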
BUT: Double-checked locking is really an outdated paradigm, if indeed it was ever useful in Java. As I see things, it was essentially carried over into Java by C programmers who didn't fully appreciate that the JVM effectively has a more efficient (and correct!) way of dealing with the issue built into the classloader and that optimisations to synchronization are generally best made at the JVM level.
I've seen a lot of people clutter their code with this "pattern". I don't think I've ever seen any actual data showing that it has any benefit.
Plus: if you do have a large application that is hitting synchronization issues, then one of the whole raisons d'ĂȘtre of Java is that it has rich concurrency libraries. Look at how you can re-work your application to use them... if profiling data proves it to be necessary.

It depends on which version of Java you are using.
This has been fixed in Java 5 and onward.
Check http://en.wikipedia.org/wiki/Double-checked_locking#Usage_in_Java

They're both right, and DCL works fine in Java from 5 on.
If you are expecting your program to produce the exact same output every time given the exact same input, and you are using DCL, you may want to seriously rethink what you are doing. An awful lot can depend on who gets to the lock first--you're rolling a lot of dice. Not good for an accounting app.
If your program involves balls bouncing off walls and each other, DCL may make a lot of sense. It does work. Synchronizing has to be a bit slower than non-synchronizing even without contention, so why do it if a simple if can prevent it? And if 100 threads pile up on a synch statement when the needed object already exists, that has to be a lot slower.

The keyword "synchronized" that generates memory barrier full fences does not mean DCL could work properly. Let's take the following code as example:
public class Singleton {
    private static Singleton instance;   // note: not volatile

    public static Singleton getInstance() {
        if (null == instance) {                      // 1
            synchronized (Singleton.class) {
                if (null == instance) {
                    instance = new Singleton();      // 2
                }
            }
        }
        return instance;
    }
}
The JVM goes through several steps when constructing an object; two of them matter here:
First, the JVM allocates memory for the object. At this point its member variables hold only their default values. Second, the JVM runs the constructor and assigns the user-specified values to the member variables.
That means thread A may see a partially constructed instance at line 1 (between those two steps). Although "synchronized" generates full memory-barrier fences, there is no happens-before guarantee between line 1 and line 2: the fences take effect only inside the synchronized block, and line 1 is outside it.
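(As the other answers note, declaring the field volatile fixes this from Java 5 onwards, because the volatile write at line 2 then happens-before the read at line 1. The only change to the sketch above would be:)

    private static volatile Singleton instance;   // forbids publishing the reference before construction completes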

Related

How does the JVM internally handle race conditions?

If multiple threads try to update the same member variable, it is called a race condition. But I was more interested in knowing how the JVM handles it internally if we don't handle it in our code by making it synchronised or something else? Will it hang my program? How will the JVM react to it? I thought the JVM would temporarily create a sync block for this situation, but I'm not sure what exactly would be happening.
If any of you have some insight, it would be good to know.
The precise term is a data race, which is a specialization of the general concept of a race condition. The term data race is an official, precisely specified concept, which means that it arises from a formal analysis of the code.
The only way to get the real picture is to go and study the Memory Model chapter of the Java Language Specification, but this is a simplified view: whenever you have a data race, there is almost no guarantee as to the outcome, and a reading thread may see any value which has ever been written to the variable. Therein also lies the only guarantee: the thread will not observe an "out-of-thin-air" value, i.e. one that was never written. Well, unless you're dealing with longs or doubles, in which case you may see torn writes.
Maybe I'm missing something but what is there to handle? There is still a thread that will get there first. Depending on which thread that is, that thread will just update/read some variable and proceed to the next instruction. It can't magically construct a sync block, it doesn't really know what you want to do. So in other words what happens will depend on the outcome of the 'race'.
Note I'm not heavily into the lower level stuff so perhaps I don't fully understand the depth of your question.
Java provides synchronized and volatile to deal with these situations. Using them properly can be frustratingly difficult, but keep in mind that Java is only exposing the complexity of modern CPU and memory architectures. The alternatives would be to always err on the side of caution, effectively synchronizing everything which would kill performance; or ignore the problem and offer no thread safety whatsoever. And fortunately, Java provides excellent high-level constructs in the java.util.concurrent package, so you can often avoid dealing with the low-level stuff.
In short, the JVM assumes that code is free of data races when translating it into machine code. That is, if code is not correctly synchronized, the Java Language Specification provides only limited guarantees about the behavior of that code.
Most modern hardware likewise assumes that code is free of data races when executing it. That is, if code is not correctly synchronized, the hardware makes only limited guarantees about the result of its execution.
In particular, the Java Language Specification guarantees the following only in the absence of a data race:
visibility: reading a field yields the value last assigned to it (it is unclear which write was last, and writes of long or double variables need not be atomic)
ordering: if a write is visible, so are any writes preceding it. For instance, if one thread executes:
x = new FancyObject();
another thread can read x only after the constructor of FancyObject has executed completely.
In the presence of a data race, these guarantees are null and void. It is possible for a reading thread to never see a write. It is also possible to see the write of x, without seeing the effect of the constructor that logically preceded the write of x. It is very unlikely that the program is correct if such basic assumptions can not be made.
A data race will however not compromise the integrity of the Java Virtual Machine. In particular, the JVM will not crash or halt, and still guarantee memory safety (i.e. prevent memory corruption) and certain semantics of final fields.
The JVM will handle the situation just fine (ie it will not hang or complain), but you may not get a result that you like!
When multiple threads are involved, java becomes fiendishly complicated and even code that looks obviously correct can turn out to be horribly broken. As an example:
public class IntCounter {
    private int i;

    public IntCounter(int i) {
        this.i = i;
    }

    public void incrementInt() {
        i++;
    }

    public int getInt() {
        return i;
    }
}
is flawed in many ways.
First, let's say that i is currently 0 and thread A and thread B both call incrementInt() at about the same time. There is a danger that they will both see that i is 0, both increment it to 1, and then both save the result. So at the end of the two calls, i is only 1, not 2!
That's the race condition problem with the code, but there are other problems concerning memory visibility. When thread A changes a shared variable, there is no guarantee (without synchronization) that thread B will ever see the changes!
So thread A could increment i 100 times, and an hour later, thread B, calling getInt(), might see i as 0, or 100 or anywhere in between!
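For what it's worth, a minimal way to repair that example is to delegate to an AtomicInteger (making incrementInt() and getInt() synchronized also works); a sketch:

import java.util.concurrent.atomic.AtomicInteger;

public class SafeIntCounter {
    private final AtomicInteger i;

    public SafeIntCounter(int initial) {
        this.i = new AtomicInteger(initial);
    }

    public void incrementInt() {
        i.incrementAndGet();   // atomic read-modify-write, and the result is visible to all threads
    }

    public int getInt() {
        return i.get();
    }
}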
The only sane thing to do if you are delving into Java concurrency is to read Java Concurrency in Practice by Brian Goetz et al. (OK, there are probably other good ways to learn about it, but this is a great book co-written by Joshua Bloch, Doug Lea and others.)

Multi-thread state visibility in Java: is there a way to turn the JVM into the worst case scenario?

Suppose 2 threads (A and B) in our code have a reference to the same instance of this class somewhere:
public class MyValueHolder {
    private int value = 1;
    // ... getter and setter
}
When Thread A does myValueHolder.setValue(7), there is no guarantee that Thread B will ever read that value: myValueHolder.getValue() could - in theory - keep returning 1 forever.
In practice however, the hardware will clear the second level cache sooner or later, so Thread B will read 7 sooner or later (usually sooner).
Is there any way to make the JVM emulate that worst case scenario for which it keeps returning 1 forever for Thread B? That would be very useful to test our multi-threaded code with our existing tests under those circumstances.
jcstress maintainer here. There are multiple ways to answer that question.
The easiest solution would be wrapping the getter in a loop and letting the JIT hoist it. This is allowed for non-volatile field reads, and it simulates the visibility failure with a compiler optimization.
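For example (the class and field names here are made up), a reader loop like the following may never terminate once the JIT hoists the non-volatile read out of the loop, which simulates exactly the worst case the question asks about:

public class HoistDemo {
    static int value = 1;                   // deliberately NOT volatile

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (value == 1) {
                // empty body: the JIT is free to hoist the read of 'value' out of this loop,
                // so the loop may keep spinning even after the write below has happened
            }
            System.out.println("saw the update");
        });
        reader.start();

        Thread.sleep(1000);                 // let the loop get hot and compiled
        value = 7;                          // the reader may never observe this write
        reader.join();                      // the program may therefore never terminate
    }
}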
A more sophisticated trick involves getting a debug build of OpenJDK and using -XX:+StressLCM -XX:+StressGCM, effectively fuzzing the instruction scheduling. Chances are the load in question will float somewhere you can detect with the regular tests your product already has.
I am not sure whether there is practical hardware that holds a written value invisible to other cores for long (cache coherency eventually propagates it), but it is fairly easy to build a testcase for this with jcstress. You have to keep in mind that the optimization described above can also happen, so we need to employ a trick to prevent that. I think something like this should work.
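(The linked testcase is not reproduced here; purely as an illustration, a Termination-mode test modeled on the jcstress API samples might look roughly like this -- the exact annotations can vary between jcstress versions.)

import org.openjdk.jcstress.annotations.*;

@JCStressTest(Mode.Termination)
@Outcome(id = "TERMINATED", expect = Expect.ACCEPTABLE, desc = "The reader saw the update and exited.")
@Outcome(id = "STALE", expect = Expect.ACCEPTABLE_INTERESTING, desc = "The reader never saw the update.")
@State
public class ValueHolderVisibility {
    int value = 1;                  // plain field, no volatile

    @Actor
    public void reader() {
        while (value == 1) { }      // spins until (if ever) it observes the write
    }

    @Signal
    public void writer() {
        value = 7;                  // fired once the actor is spinning
    }
}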
It would be great to have a Java compiler that intentionally performs as many weird (but allowed) transformations as possible, to be able to break thread-unsafe code more easily, like Csmith for C. Unfortunately, such a compiler does not exist (as far as I know).
In the meantime, you can try the jcstress library* and exercise your code on several architectures, if possible with weaker memory models (i.e. not x86) to try and break your code:
The Java Concurrency Stress tests (jcstress) is an experimental harness and a suite of tests to aid research into the correctness of concurrency support in the JVM, class libraries, and hardware.
But in the end, unfortunately, the only way to prove that a piece of code is 100% correct is code inspection (and I don't know of a static code analysis tool able to detect all race conditions).
*I have not used it and I am unclear which of jcstress and the java-concurrency-torture library is more up to date (I would suspect jcstress).
Not on a real machine, sadly; testing multi-threaded code will remain difficult.
As you say, the hardware will clear the second level cache and the JVM has no control over that. The JLS only specifies what must happen, and this is a case where B might never see the updated value of value.
The only way to force this to happen on a real machine is to alter the code in such a way to void your testing strategy i.e. you end up testing different code.
However, you might be able to run this on a simulator that simulates hardware that doesn't clear the second level cache. Sounds like a lot of effort though!
I think you are referring to the principle called "false sharing", where different CPUs must synchronize their caches or else face the possibility that data such as you describe could become mismatched. There is a very good article on false sharing on Intel's website. Intel describes some useful tools in their article for diagnosing this problem. This is a relevant quote:
The primary means of avoiding false sharing is through code inspection. Instances where threads access global or dynamically allocated shared data structures are potential sources of false sharing. Note that false sharing can be obscured by the fact that threads may be accessing completely different global variables that happen to be relatively close together in memory. Thread-local storage or local variables can be ruled out as sources of false sharing.
Although methods described in the article are not what you have asked for (forcing worst-case behavior from the JVM), as already stated this isn't really possible. The methods described in this article are the best way I know to try to diagnose and avoid false sharing.
There are other resources addressing this problem around the web. For example, this article has a suggestion for a way to avoid false sharing in Java. I have not tried this method, so I cannot vouch for it, but I think the author's idea is sound. You might consider trying out his suggestion.
I have previously suggested a worst case behaving JVM for test purposes on the memory model list but the idea didn't seem popular.
So how do you get "worst case JVM behaviour" with existing technology, i.e. how can you test the scenario in the question and get it to fail EVERY time? You could try to find the setup with the weakest memory model possible, but that's unlikely to be perfect.
What I have often considered is using a distributed JVM, something similar to how I believe Terracotta works under the covers, so that your application runs on multiple JVMs (either remote or local) and threads of the same application run in different instances. In this setup, inter-JVM thread communication takes place at memory barriers, e.g. the synchronized keywords you are missing in bugged code (it conforms to the Java Memory Model), and the application is configured so that you say where each thread runs. No code change is required to your tests, just configuration, and any well-ordered Java application should run out of the box; however, this setup would be very intolerant of a badly ordered application (normally a problem... now an asset, i.e. the memory model exhibits very weak but legal behaviour). In the example above, loading the code onto a cluster: if the two threads run on different nodes, setValue has no effect visible to the other thread unless the code is changed and synchronized, volatile, etc. are used, in which case the code works as intended.
Now your test for the example above (configured correctly) would fail every time without correct "happens-before" ordering, which is potentially very useful for tests. The flaw in the plan: for complete coverage you would potentially need a node per application thread (on the same machine or spread across a cluster), or multiple test runs. If you have thousands of threads that could be prohibitive, though hopefully they would be pooled and scaled down for E2E test scenarios, or you could run it in a cloud. If nothing else, this kind of setup might be useful in demonstrating the issue.
inter thread communication across JVMs
The example you have given is described as Incorrectly Synchronized in http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4. I think this is always incorrect and will lead to bugs sooner or later. Most of the time, later :-).
To find such incorrectly synchronized code blocks, I use the following algorithm:
Record the threads for all field modifications using instrumentation. If a field is modified by more than one thread without synchronization, I have found a data race.
I implemented this algorithm inside http://vmlens.com, which is a tool to find data races inside java programs.
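(A toy illustration of the idea -- the real tool uses bytecode instrumentation and also takes synchronization into account; this hand-written sketch merely flags fields written by more than one thread.)

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Records which thread last wrote each tracked field and reports writes from different threads.
public class WriteTracker {
    private static final Map<String, Long> lastWriter = new ConcurrentHashMap<>();

    // Call this (or have instrumentation call it) just before every tracked field write.
    public static void onWrite(String fieldId) {
        long current = Thread.currentThread().getId();
        Long previous = lastWriter.put(fieldId, current);
        if (previous != null && previous != current) {
            System.out.println("Field " + fieldId + " written by threads "
                    + previous + " and " + current + " -- possible data race");
        }
    }
}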
Here's a simple way: just comment out the code for setValue. You can uncomment it after testing. Since in many cases like this a mechanism is needed to fake failures, it would be a good idea to build a general mechanism for all such cases.

Why do all Java Objects have wait() and notify() and does this cause a performance hit?

Every Java Object has the methods wait() and notify() (and additional variants). I have never used these and I suspect many others haven't. Why are these so fundamental that every object has to have them and is there a performance hit in having them (presumably some state is stored in them)?
EDIT to emphasize the question. If I have a List<Double> with 100,000 elements then every Double has these methods, as it extends Object. But it seems unlikely that all of these have to know about the threads that manage the List.
EDIT excellent and useful answers. @Jon has a very good blog post which crystallised my gut feelings. I also agree completely with @Bob_Cross that you should show a performance problem before worrying about it. (Also, as the nth law of successful languages, if it had been a performance hit then Sun or someone would have fixed it.)
Well, it does mean that every object has to potentially have a monitor associated with it. The same monitor is used for synchronized. If you agree with the decision to be able to synchronize on any object, then wait() and notify() don't add any more per-object state. The JVM may allocate the actual monitor lazily (I know .NET does) but there has to be some storage space available to say which monitor is associated with the object. Admittedly it's possible that this is a very small amount (e.g. 3 bytes) which wouldn't actually save any memory anyway due to padding of the rest of the object overhead - you'd have to look at how each individual JVM handled memory to say for sure.
Note that just having extra methods doesn't affect performance (other than very slightly due to the code obvious being present somewhere). It's not like each object or even each type has its own copy of the code for wait() and notify(). Depending on how the vtables work, each type may end up with an extra vtable entry for each inherited method - but that's still only on a per type basis, not a per object basis. That's basically going to get lost in the noise compared with the bulk of the storage which is for the actual objects themselves.
Personally, I feel that both .NET and Java made a mistake by associating a monitor with every object - I'd rather have explicit synchronization objects instead. I wrote a bit more on this in a blog post about redesigning java.lang.Object/System.Object.
Why are these so fundamental that every object has to have them and is there a performance hit in having them (presumably some state is stored in them)?
tl;dr: They are thread-safety methods and they have small costs relative to their value.
The fundamental realities that these methods support are that:
Java is always multi-threaded. Example: check out the list of Threads used by a process using jconsole or jvisualvm some time.
Correctness is more important than "performance." When I was grading projects (many years ago), I used to have to explain "getting to the wrong answer really fast is still wrong."
Fundamentally, these methods provide some of the hooks to manage per-Object monitors used in synchronization. Specifically, if I have synchronized(objectWithMonitor) in a particular method, I can use objectWithMonitor.wait() to yield that monitor (e.g., if I need another method to complete a computation before I can proceed). In that case, that will allow one other method that was blocked waiting for that monitor to proceed.
On the other hand, I can use objectWithMonitor.notifyAll() to let Threads that are waiting for the monitor know that I am going to be relinquishing the monitor soon. They can't actually proceed until I leave the synchronized block, though.
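A minimal sketch of that pattern (objectWithMonitor and the "result ready" condition are made up for illustration):

public class MonitorExample {
    private final Object objectWithMonitor = new Object();
    private boolean resultReady = false;

    public void awaitResult() throws InterruptedException {
        synchronized (objectWithMonitor) {
            while (!resultReady) {           // always re-check the condition after waking up
                objectWithMonitor.wait();    // releases the monitor while waiting
            }
        }
    }

    public void publishResult() {
        synchronized (objectWithMonitor) {
            resultReady = true;
            objectWithMonitor.notifyAll();   // waiters can proceed once we leave this block
        }
    }
}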
With respect to specific examples (e.g., long Lists of Doubles) where you might worry that there's a performance or memory hit on the monitoring mechanism, here are some points that you should likely consider:
First, prove it. If you think there is a major impact from a core Java mechanism such as multi-threaded correctness, there's an excellent chance that your intuition is false. Measure the impact first. If it's serious and you know that you'll never need to synchronize on an individual Double, consider using doubles instead.
If you aren't certain that you, your co-worker, a future maintenance coder (who might be yourself a year later), etc., will never ever ever need a fine granularity of threaded access to your data, there's an excellent chance that taking these monitors away would only make your code less flexible and maintainable.
Follow-up in response to the question on per-Object vs. explicit monitor objects:
Short answer: @JonSkeet: yes, removing the monitors would create problems: it would create friction. Keeping those monitors in Object reminds us that this is always a multithreaded system.
The built-in object monitors are not sophisticated but they are: easy to explain; work in a predictable fashion; and are clear in their purpose. synchronized(this) is a clear statement of intent. If we force novice coders to use the concurrency package exclusively, we introduce friction. What's in that package? What's a semaphore? Fork-join?
A novice coder can use the Object monitors to write decent model-view-controller code. synchronized, wait and notifyAll can be used to implement naive (in the sense of simple, accessible but perhaps not bleeding-edge performance) thread-safety. The canonical example would be one of these Doubles (posited by the OP) which can have one Thread set a value while the AWT thread gets the value to put it on a JLabel. In that case, there is no good reason to create an explicit additional Object just to have an external monitor.
At a slightly higher level of complexity, these same methods are useful as an external monitoring method. In the example above, I explicitly did that (see objectWithMonitor fragments above). Again, these methods are really handy for putting together relatively simple thread safety.
If you would like to be even more sophisticated, I think you should seriously think about reading Java Concurrency In Practice (if you haven't already). Read and write locks are very powerful without adding too much additional complexity.
Punchline: Using basic synchronization methods, you can exploit a large portion of the performance enabled by modern multi-core processors with thread-safety and without a lot of overhead.
All objects in Java have monitors associated with them. Synchronization primitives are useful in pretty much all multi-threaded code, and it's semantically very nice to synchronize on the object(s) you are accessing rather than on separate "Monitor" objects.
Java may allocate the Monitors associated with the objects as needed - as .NET does - and in any case the actual overhead for simply allocating (but not using) the lock would be quite small.
In short: it's really convenient to store Objects with their thread-safety support bits, and there is very little performance impact.
These methods are around to implement inter-thread communication.
Check this article on the subject.
Rules for those methods, taken from that article:
wait() tells the calling thread to give up the monitor and go to sleep until some other thread enters the same monitor and calls notify().
notify() wakes up a thread that called wait() on the same object.
notifyAll() wakes up all the threads that called wait() on the same object. The highest priority thread will run first.
Hope this helps...

Does the JVM create a mutex for every object in order to implement the 'synchronized' keyword? If not, how?

As a C++ programmer becoming more familiar with Java, it's a little odd to me to see language level support for locking on arbitrary objects without any kind of declaration that the object supports such locking. Creating mutexes for every object seems like a heavy cost to be automatically opted into. Besides memory usage, mutexes are an OS limited resource on some platforms. You could spin lock if mutexes aren't available but the performance characteristics of that are significantly different, which I would expect to hurt predictability.
Is the JVM smart enough in all cases to recognize that a particular object will never be the target of the synchronized keyword and thus avoid creating the mutex? The mutexes could be created lazily, but that poses a bootstrapping problem that itself necessitates a mutex, and even if that were worked around I assume there's still going to be some overhead for tracking whether a mutex has already been created or not. So I assume if such an optimization is possible, it must be done at compile time or startup. In C++ such an optimization would not be possible due to the compilation model (you couldn't know if the lock for an object was going to be used across library boundaries), but I don't know enough about Java's compilation and linking to know if the same limitations apply.
Speaking as someone who has looked at the way that some JVMs implement locks ...
The normal approach is to start out with a couple of reserved bits in the object's header word. If the object is never locked, or if it is locked but there is no contention it stays that way. If and when contention occurs on a locked object, the JVM inflates the lock into a full-blown mutex data structure, and it stays that way for the lifetime of the object.
EDIT - I just noticed that the OP was talking about OS-supported mutexes. In the examples that I've looked at, the uninflated mutexes were implemented directly using CAS instructions and the like, rather than using pthread library functions, etc.
This is really an implementation detail of the JVM, and different JVMs may implement it differently. However, it is definitely not something that can be optimized at compile time, since Java links at runtime, and thus it is possible for previously unknown code to get hold of an object created in older code and start synchronizing on it.
Note that in Java lingo, the synchronization primitive is called "monitor" rather than mutex, and it is supported by special bytecode operations. There's a rather detailed explanation here.
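For instance, a synchronized block like the one below compiles to the monitorenter and monitorexit bytecodes, which you can see with javap -c:

public class LockDemo {
    private final Object lock = new Object();
    private int counter;

    public void increment() {
        synchronized (lock) {   // javap -c shows monitorenter for this line
            counter++;
        }                       // ...and monitorexit here (plus a copy on the exception path)
    }
}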
You can never be sure that an object will never be used as a lock (consider reflection). Typically every object has a header with some bits dedicated to the lock. It is possible to implement it such that the header is only added as needed, but that gets a bit complicated and you probably need some header anyway (class (equivalent of "vtbl" and allocation size in C++), hash code and garbage collection).
Here's a wiki page on the implementation of synchronisation in the OpenJDK.
(In my opinion, adding a lock to every object was a mistake.)
Can't the JVM use a compare-and-swap instruction directly? Let's say each object has a field lockingThreadId storing the id of the thread that is locking it:
while (!compare_and_swap(obj.lockingThreadId, null, thisThreadId)) {
    // failed, someone else got it:
    // mark this thread as waiting on obj, then shelve (park) this thread
}
// out of the loop: this thread has now locked the object
do the work
obj.lockingThreadId = null;
wake up the threads waiting on obj
This is a toy model, but it doesn't seem too expensive, and it does not rely on the OS.
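(Roughly what that toy model looks like in plain Java, using AtomicReference; this is just a spin-lock sketch of the idea, not how HotSpot actually implements monitors.)

import java.util.concurrent.atomic.AtomicReference;

// A toy spin lock built on compare-and-swap, mirroring the pseudocode above.
public class ToyLock {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    public void lock() {
        Thread me = Thread.currentThread();
        while (!owner.compareAndSet(null, me)) {
            Thread.yield();   // spin; a real implementation would park the thread instead
        }
    }

    public void unlock() {
        owner.set(null);      // a real implementation would also wake a waiting thread
    }
}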

Java synchronization and performance in an aspect

I just realized that I need to synchronize a significant amount of data collection code in an aspect but performance is a real concern. If performance degrades too much my tool will be thrown out. I will be writing ints and longs individually and to various arrays, ArrayLists and Maps. There will be multiple threads of an application that will make function calls that will be picked up by my aspect. What kind of things should I look out for that will negatively affect performance? What code patterns are more efficient?
In particular I have a method that calls many other data recording methods:
void foo() {
    bar();
    woz();
    ...
}
The methods mostly add to and increment fields of the aspect:
void bar() {
    f++; // f is a field of the aspect
    for (int i = 0; i < ary.length; i++) {
        // get some values from aspect point cut
        if (someCondition) {
            ary[i] += someValue; // ary is a field of the aspect
        }
    }
}
Should I synchronize foo, or bar, woz and others individually, or should I move all the code in bar, woz, etc into foo and just synchronize it? Should I synchronize on this, on a specifically created synchronization object:
private final Object syncObject = new Object();
(see this post), or on individual data elements within the methods:
ArrayList<Integer> a = new ArrayList<Integer>();

void bar() {
    synchronized (a) {
        // synchronized code
    }
}
Concurrency is extremely tricky. It's very easy to get it wrong, and very hard to get right. I wouldn't be too terribly worried about performance at this point. My first and foremost concern would be to get the concurrent code to work safely (no deadlocks or race conditions).
But on the issue of performance: when in doubt, profile. It's hard to say just how different synchronization schemes will affect performance. It's even harder for us to give you suggestions. We'd need to see a lot more of your code and gain a much deeper understanding of what the application does to give you a truly useful answer. In contrast, profiling gives you hard evidence as to if one approach is slower than another. It can even help you identify where the slowdown is.
There are a lot of great profiling tools for Java these days. The Netbeans and Eclipse profilers are good.
Also, I'd recommend staying away from raw synchronization altogether. Try using some of the classes in the java.util.concurrent package. They make writing concurrent code much easier, and much less error prone.
Also, I recommend you read Java Concurrency in Practice by Brian Goetz, et al. It's very well written and covers a lot of ground.
A rule of thumb is not to synchronize on this - most of the time it is a performance hit, because all methods end up synchronized on the one object.
Consider using locks - they're a very nice abstraction with many fine features, like trying to acquire the lock for a period of time and then giving up:
private final ReentrantLock commandsLock = new ReentrantLock();

if (commandsLock.tryLock(100, TimeUnit.MILLISECONDS)) {   // may throw InterruptedException
    try {
        // Do something
    } finally {
        commandsLock.unlock();
    }
} else {
    // couldn't acquire the lock within 100 ms
}
I second the opinion on using java.util.concurrent. I'd use two levels of synchronization:
synchronize collection access (if it is needed)
synchronize field access
Collection access
If your collections are read-only, i.e. no elements get removed or inserted (but elements may change), I would say that you should use synchronized collections (though this may not even be needed...) and not synchronize iteration:
Read only:
for (int i = 0; i < ary.size(); i++) {
    // get some values from aspect point cut
    if (someCondition) {
        ary.set(i, ary.get(i) + someValue); // ary is a field of the aspect
    }
}
where ary is an instance obtained from Collections.synchronizedList.
Read-write:
synchronized (ary) {
    for (int i = 0; i < ary.size(); i++) {
        // get some values from aspect point cut
        if (someCondition) {
            ary.set(i, ary.get(i) + someValue); // ary is a field of the aspect
        }
    }
}
Or use a concurrent collection (like CopyOnWriteArrayList), which is inherently thread-safe.
The main difference is that in the first, read-only version any number of threads may iterate over the collection, while in the second only one may iterate at a time. In both cases only one thread at a time should increment any given field.
Field access
Synchronize increments of fields separately from synchronizing iteration,
like:
Integer foo = ary.get(ii);
synchronized (foo) {
    ary.set(ii, foo + 1); // write the incremented value back; foo++ alone would only rebox a local copy
}
Get rid of synchronization
Use concurrent collections (from java.util.concurrent, not from Collections.synchronizedXXX; the latter still need synchronization on traversal).
Use java.util.concurrent.atomic, which enables you to atomically increment fields (see the sketch below).
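A sketch of that atomic approach, assuming the counters from the question fit into an AtomicIntegerArray (the array size and names are made up):

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

public class AspectCounters {
    private final AtomicInteger f = new AtomicInteger();              // replaces the plain int field
    private final AtomicIntegerArray ary = new AtomicIntegerArray(64);

    void bar(boolean someCondition, int someValue) {
        f.incrementAndGet();                       // lock-free increment
        for (int i = 0; i < ary.length(); i++) {
            if (someCondition) {
                ary.addAndGet(i, someValue);       // atomic per-slot add, no synchronized needed
            }
        }
    }
}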
Something you should watch:
Java Memory Model - it's a talk that gives a very nice understanding of how synchronization and data alignment in Java work.
Update: since writing the below, I see you've updated the question slightly. Forgive my ignorance-- I have no idea what an "aspect" is-- but from the sample code you posted, you could also consider using atomics/concurrent collections (e.g. AtomicInteger, AtomicIntegerArray) or atomic field updaters. This could mean quite a re-factoring of your code, though. (In Java 5 on a dual-proc hyperthreading Xeon, the throughput of AtomicIntegerArray is significantly better than a synchronized array; sorry, I haven't got round to repeating the test on more procs/later JVM versions yet-- note that performance of 'synchronized' has improved since then.)
Without more specific information or metrics about your particular program, the best you can do is just follow good program design. It's worth noting that the performance and optimisation of synchronization locks in the JVM has been one of the areas (if not the area) that has received the most research and attention over the last few years. And so in the latest versions of JVMs, it ain't all that bad.
So in general, I'd say synchronize minimally without "going mad". By 'minimally', I mean that you hold on to the lock for as little time as possible, and that only the parts that need to use that specific lock use that specific lock. But only if the change is easy to do and it's easy to prove that your program is still correct. For example, instead of doing this:
synchronized (a) {
    doSomethingWith(a);
    longMethodNothingToDoWithA();
    doSomethingWith(a);
}
consider doing this if and only if your program will still be correct:
synchronized (a) {
    doSomethingWith(a);
}
longMethodNothingToDoWithA();
synchronized (a) {
    doSomethingWith(a);
}
But remember, the odd simple field update with a lock held unnecessarily probably won't make much tangible difference, and could actually improve performance. Sometimes, holding a lock for a bit longer and doing less lock "housekeeping" can be beneficial. But the JVM can make some of those decisions, so you don't need to be too paranoid-- just do generally sensible things and you should be fine.
In general, try and have a separate lock for each set of methods/accesses that together form an "independent process". Other than that, having a separate lock object can be a good way of encapsulating the lock within the class it's used by (i.e. preventing it from being used by outside callers in a way you didn't predict), but there's probably no performance difference per se from using one object to another as the lock (e.g. using the instance itself vs a private Object declared just to be a lock within that class as you suggest), provided the two objects would otherwise be used in exactly the same way.
There should be a performance difference between a built-in language construct and a library, but experience has taught me not to guess when it comes to performance.
If you compile the aspect into the application then you will have basically no performance hit; if you do it at runtime (load-time weaving) then you will see a performance hit.
If you make each aspect per-instance, that may reduce the need for synchronization.
You should have as little synchronization as possible, for as short a time as possible, to reduce any problems.
If possible you may want to share as little state as possible between threads, keeping as much local as possible, to reduce any deadlock problems.
More information would lead to a better answer btw. :)
