JMH State classes and shared vs unshared states - java

I'm new to JMH and to understanding what happens behind threads and so on.
So, I started reading and got stuck on the @State annotation and shared vs unshared states.
I read this example: http://hg.openjdk.java.net/code-tools/jmh/file/ecd9e76155fe/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_03_States.java
and have a few questions about it.
First question: what is the exact role of state classes? To hold parameters?
Let's say I want to benchmark a program that encrypts a key in 2 different ways.
Should I keep the key (a String object) in a state class annotated with a specific scope, or just keep the String object on the benchmark class?
An explanation about this would be great.
Second question: why, in the example above, did the unshared state class perform so much better than the shared one?
How does multithreading change it?
I feel really lost since I'm new to this and couldn't find any "explain it like I'm 5" examples for JMH and its options.

You can think of @State objects as the parts of your benchmark that you need in order to run it, but whose creation time should not be counted as part of your measured time.
Let us say that you want to measure the time it takes to compute:
@Benchmark
int benchmark() {
    int foo = 1, bar = 1;
    return foo + bar;
}
Unfortunately for you, the JIT compiler is too smart to let you do this and will constant-fold the method to simply return 2. This is of course not what you want to measure. Using state, you can escape these values and let JMH take care of not letting the JIT fold them. You would initialize the values in a @Setup method.
As another use case, you can check that your benchmark did what you expected. This is possible by validating the state in a @TearDown method.
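As a minimal sketch of both points, here is a state class for the key-encryption scenario from the question; the class name, the key value, and the benchmark body are hypothetical placeholders:
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
public class KeyState {
    String key; // the key from the question's scenario

    @Setup
    public void setUp() {
        // built here, so its creation is not part of the measured time
        key = "a-hypothetical-secret-key";
    }

    @TearDown
    public void tearDown() {
        // validate that the benchmark left the state as expected
        if (key == null) throw new IllegalStateException("key was clobbered");
    }
}
A benchmark method then receives the state as a parameter:
@Benchmark
public int encryptOne(KeyState state) {
    return state.key.hashCode(); // stand-in for a real encryption call
}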

First question: what is the exact role of state classes? To hold parameters? Let's say I want to benchmark a program that encrypts a key in 2 different ways. Should I keep the key (a String object) in a state class annotated with a specific scope, or just keep the String object on the benchmark class? An explanation about this would be great.
The benchmark class is also a state class. See JMHSample_04_DefaultState.java.
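A minimal sketch of that point, modeled on JMHSample_04_DefaultState (the field and method names here are illustrative):
@State(Scope.Thread)
public class MyDefaultStateBenchmark {
    double x = Math.PI; // a plain field of the benchmark class itself

    @Benchmark
    public double measure() {
        return Math.log(x); // reads benchmark-class state directly
    }
}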
Second question: why, in the example above, did the unshared state class perform so much better than the shared one? How does multithreading change it?
This is an issue of "modern" processors, not of JMH. Each core on the processor has its own L1 (and maybe L2) cache. Cores usually don't access RAM directly. If multiple threads are constantly writing to the same area of memory, the processor is constantly busy synchronizing the data between all cores. You don't actually have to access the same variable to get this effect; see JMHSample_22_FalseSharing.java.
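A minimal sketch of the effect just described (not the JMH sample itself): the two fields below will usually land on the same cache line, so the two threads contend even though they never touch the same variable.
class FalseSharingDemo {
    volatile long a; // written by thread 1
    volatile long b; // written by thread 2; likely on the same cache line as a

    void run() throws InterruptedException {
        Thread t1 = new Thread(() -> { for (long i = 0; i < 100_000_000L; i++) a++; });
        Thread t2 = new Thread(() -> { for (long i = 0; i < 100_000_000L; i++) b++; });
        t1.start(); t2.start();
        t1.join(); t2.join();
        // each write invalidates the other core's copy of the shared cache
        // line, so this runs far slower than it would with padded fields
    }
}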

The two main benefits of using a state class are:
You guarantee the processes for data preparations are not measured in the benchmark.
You can clearly define the scope of the state objects, therefore having higher control over and knowledge of what is truly being benchmarked.
This article explains the State class and state objects with more detail:
https://www.oracle.com/technical-resources/articles/java/architect-benchmarking.html

Related

Multi-thread state visibility in Java: is there a way to turn the JVM into the worst case scenario?

Suppose two threads in our code (A and B) have a reference to the same instance of this class somewhere:
public class MyValueHolder {
    private int value = 1;
    // ... getter and setter
}
When Thread A does myValueHolder.setValue(7), there is no guarantee that Thread B will ever read that value: myValueHolder.getValue() could - in theory - keep returning 1 forever.
In practice however, the hardware will clear the second level cache sooner or later, so Thread B will read 7 sooner or later (usually sooner).
Is there any way to make the JVM emulate that worst case scenario for which it keeps returning 1 forever for Thread B? That would be very useful to test our multi-threaded code with our existing tests under those circumstances.
jcstress maintainer here. There are multiple ways to answer that question.
The easiest solution would be wrapping the getter in a loop and letting the JIT hoist it. This is allowed for non-volatile field reads, and it simulates the visibility failure with a compiler optimization.
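A minimal sketch of that hoisting, assuming the MyValueHolder from the question:
// The JIT may read value once and hoist the read out of the loop, since
// the field is not volatile; the loop can then spin forever even after
// another thread calls setValue(7).
void spinUntilUpdated(MyValueHolder myValueHolder) {
    while (myValueHolder.getValue() == 1) {
        // empty body: no side effects, ideal candidate for hoisting
    }
}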
A more sophisticated trick involves getting a debug build of OpenJDK and using -XX:+StressLCM -XX:+StressGCM, effectively fuzzing the instruction scheduling. Chances are the load in question will float somewhere you can detect with the regular tests your product has.
I am not sure there is practical hardware that holds a written value opaque to cache coherency for long, but it is somewhat easy to build the testcase with jcstress. You have to keep in mind that the optimization in (1) can also happen, so we need to employ a trick to prevent that. I think something like this should work.
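A minimal sketch of such a jcstress test (the original sample is not shown here; the annotations are from org.openjdk.jcstress.annotations, and the class name is illustrative):
import org.openjdk.jcstress.annotations.*;
import org.openjdk.jcstress.infra.results.I_Result;

@JCStressTest
@Outcome(id = "1", expect = Expect.ACCEPTABLE, desc = "Write not (yet) visible to the reader.")
@Outcome(id = "7", expect = Expect.ACCEPTABLE, desc = "Write observed.")
@State
public class ValueVisibilityTest {
    int value = 1; // deliberately not volatile

    @Actor
    public void writer() {
        value = 7;
    }

    @Actor
    public void reader(I_Result r) {
        r.r1 = value; // the harness reports how often each outcome occurred
    }
}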
It would be great to have a Java compiler that would intentionally perform as many weird (but allowed) transformations as possible, to be able to break thread-unsafe code more easily, like Csmith for C. Unfortunately, such a compiler does not exist (as far as I know).
In the meantime, you can try the jcstress library* and exercise your code on several architectures, if possible ones with weaker memory models (i.e. not x86), to try and break your code:
The Java Concurrency Stress tests (jcstress) is an experimental harness and a suite of tests to aid research into the correctness of concurrency support in the JVM, class libraries, and hardware.
But in the end, unfortunately, the only way to prove that a piece of code is 100% correct is code inspection (and I don't know of a static code analysis tool able to detect all race conditions).
*I have not used it and I am unclear which of jcstress and the java-concurrency-torture library is more up to date (I would suspect jcstress).
Not on a real machine, sadly; testing multi-threaded code will remain difficult.
As you say, the hardware will clear the second-level cache, and the JVM has no control over that. The JLS only specifies what must happen, and this is a case where B might never see the updated value of value.
The only way to force this to happen on a real machine is to alter the code in a way that voids your testing strategy, i.e. you end up testing different code.
However, you might be able to run this on a simulator that simulates hardware that doesn't clear the second level cache. Sounds like a lot of effort though!
I think you are referring to the principle called "false sharing", where different CPUs must synchronize their caches or else face the possibility that data such as you describe could become mismatched. There is a very good article on false sharing on Intel's website, which also describes some useful tools for diagnosing this problem. This is a relevant quote:
The primary means of avoiding false sharing is through code inspection. Instances where threads access global or dynamically allocated shared data structures are potential sources of false sharing. Note that false sharing can be obscured by the fact that threads may be accessing completely different global variables that happen to be relatively close together in memory. Thread-local storage or local variables can be ruled out as sources of false sharing.
Although methods described in the article are not what you have asked for (forcing worst-case behavior from the JVM), as already stated this isn't really possible. The methods described in this article are the best way I know to try to diagnose and avoid false sharing.
There are other resources addressing this problem around the web. For example, this article has a suggestion for a way to avoid false sharing in Java. I have not tried this method, so I cannot vouch for it, but I think the author's idea is sound. You might consider trying out his suggestion.
I have previously suggested a worst-case-behaving JVM for test purposes on the memory model mailing list, but the idea didn't seem popular.
So how do you get "worst case JVM behaviour" with existing technology, i.e. how can you test the scenario in the question and get it to fail every time? You could try to find the setup with the weakest memory model possible, but that's unlikely to be perfect.
What I have often considered is using a distributed JVM, somewhat like how I believe Terracotta works under the covers, so that your application now runs on multiple JVMs (either remote or local), with threads of the same application running in different instances. In this setup, inter-JVM thread communication takes place at memory barriers, e.g. the synchronized keywords that are missing from buggy code (it conforms to the Java Memory Model), and the application is configured so that you say which thread runs where. No code change is required for your tests, just configuration; any well-ordered Java application should run out of the box. However, this setup would be very intolerant of a badly ordered application (normally a problem, now an asset: the memory model exhibits very weak but legal behavior). In the example above, with the code loaded onto a cluster and the two threads running on different nodes, setValue has no effect visible to the other thread unless the code is changed and synchronized, volatile, etc. are used; then the code works as intended.
Now your test for the example above (configured correctly) would fail every time without correct happens-before ordering, which is potentially very useful for tests. The flaw in the plan: for complete coverage you would potentially need a node per application thread (on the same machine or across a cluster) or multiple test runs. If you have thousands of threads that could be prohibitive, though hopefully they would be pooled and scaled down for end-to-end test scenarios, or you could run it in a cloud. If nothing else, this kind of setup might be useful in demonstrating the issue.
See also: inter-thread communication across JVMs.
The example you have given is described as incorrectly synchronized in http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4. I think this is always incorrect and will lead to bugs sooner or later. Most of the time later :-).
To find such incorrectly synchronized code blocks, I use the following algorithm:
Record the threads for all field modifications using instrumentation. If a field is modified by more than one thread without synchronization, I have found a data race.
I implemented this algorithm inside http://vmlens.com, which is a tool to find data races inside java programs.
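A minimal sketch of the bookkeeping behind that algorithm (all names are hypothetical; a real tool like vmlens instruments bytecode to call such a hook and also tracks which locks are held):
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class WriteTracker {
    private static final Map<String, Set<Long>> writers = new ConcurrentHashMap<>();

    // Instrumented code calls this before every unsynchronized field write.
    static void onFieldWrite(String fieldId) {
        Set<Long> threads = writers.computeIfAbsent(
                fieldId, k -> ConcurrentHashMap.newKeySet());
        threads.add(Thread.currentThread().getId());
        if (threads.size() > 1) {
            System.err.println("Possible data race on " + fieldId);
        }
    }
}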
Here's a simple way: just comment out the code for setValue. You can uncomment it after testing. Since in many cases like this a mechanism is needed to fake failures, it would be a good idea to build a general mechanism for all such cases.

Double-Checked Locking - does it actually work in Java?

All:
Here is the famous article:
The "Double-Checked Locking is Broken" Declaration
It declares that the pattern doesn't work in Java. It further says, close to the end, that newer JVMs can make the pattern work by using volatile.
However, another article, Memory Barriers and JVM Concurrency, says that the keyword "synchronized" generates full memory-barrier fences. So who is right? Does the pattern actually work in Java?
There are essentially 3 ways to fix double-checked locking:
ensure that the variable is declared volatile (works from Java 5 onwards);
just don't bother with it in the first place: just use synchronization and don't try to mess around with fancy, bug-prone (and probably pointless) means of "avoiding" it;
let the classloader do the synchronization for you.
I've posted example code here.
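As a hedged sketch of options 1 and 3 (the Helper class here is a stand-in, not the code from the linked post):
// Option 1: volatile (Java 5+). The volatile write/read pair prevents a
// partially constructed object from ever being observed.
class VolatileDcl {
    private static volatile Helper instance;

    static Helper getInstance() {
        Helper h = instance;               // one volatile read on the fast path
        if (h == null) {
            synchronized (VolatileDcl.class) {
                h = instance;
                if (h == null) {
                    instance = h = new Helper();
                }
            }
        }
        return h;
    }
}

// Option 3: initialization-on-demand holder. The classloader guarantees
// Holder is initialized exactly once, safely, on first use.
class HolderDcl {
    private static class Holder {
        static final Helper INSTANCE = new Helper();
    }

    static Helper getInstance() {
        return Holder.INSTANCE;
    }
}

class Helper { }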
BUT: Double-checked locking is really an outdated paradigm, if indeed it was ever useful in Java. As I see things, it was essentially carried over into Java by C programmers who didn't fully appreciate that the JVM effectively has a more efficient (and correct!) way of dealing with the issue built into the classloader and that optimisations to synchronization are generally best made at the JVM level.
I've seen a lot of people clutter their code with this "pattern". I don't think I've ever seen any actual data showing that it has any benefit.
Plus: if you do have a large application that is hitting synchronization issues, then one of the whole raisons d'être of Java is that it has rich concurrency libraries. Look at how you can re-work your application to use them... if profiling data proves it to be necessary.
It depends on what version of Java you are using.
This has been fixed in Java 5 and onwards.
Check http://en.wikipedia.org/wiki/Double-checked_locking#Usage_in_Java
They're both right, and DCL works fine in Java from 5 on.
If you are expecting your program to produce the exact same output every time given the exact same input, and you are using DCL, you may want to seriously rethink what you are doing. An awful lot can depend on who gets to the lock first--you're rolling a lot of dice. Not good for an accounting app.
If your program involves balls bouncing off walls and each other, DCL may make a lot of sense. It does work. Synchronizing has to be a bit slower than non-synchronizing even without contention, so why do it if a simple if can prevent it? And if 100 threads pile up on a synch statement when the needed object already exists, that has to be a lot slower.
The keyword "synchronized" that generates memory barrier full fences does not mean DCL could work properly. Let's take the following code as example:
private static Singleton instance;

public static Singleton getInstance() {
    if (null == instance) {                  // 1
        synchronized (Singleton.class) {
            if (null == instance) {
                instance = new Singleton();  // 2
            }
        }
    }
    return instance;
}
We know that the JVM follows many steps when constructing an object. We focus on 2 important steps here:
First, the JVM allocates the memory for the object; the member variables hold their default values at this point. Second, the JVM runs the constructor and assigns the user-specified values to the member variables.
That means thread A may get a partially constructed instance at code 1 (between those two steps). Although synchronized generates full memory-barrier fences, there is no happens-before guarantee between code 1 and code 2: the memory-barrier fences take effect only within the synchronized block, and code 1 is outside it.

Why do all Java Objects have wait() and notify() and does this cause a performance hit?

Every Java Object has the methods wait() and notify() (and additional variants). I have never used these and I suspect many others haven't. Why are these so fundamental that every object has to have them and is there a performance hit in having them (presumably some state is stored in them)?
EDIT to emphasize the question. If I have a List<Double> with 100,000 elements then every Double has these methods, as it extends Object. But it seems unlikely that all of these have to know about the threads that manage the List.
EDIT excellent and useful answers. @Jon has a very good blog post which crystallised my gut feelings. I also agree completely with @Bob_Cross that you should show a performance problem before worrying about it. (Also, as the nth law of successful languages, if it had been a performance hit then Sun or someone would have fixed it.)
Well, it does mean that every object has to potentially have a monitor associated with it. The same monitor is used for synchronized. If you agree with the decision to be able to synchronize on any object, then wait() and notify() don't add any more per-object state. The JVM may allocate the actual monitor lazily (I know .NET does) but there has to be some storage space available to say which monitor is associated with the object. Admittedly it's possible that this is a very small amount (e.g. 3 bytes) which wouldn't actually save any memory anyway due to padding of the rest of the object overhead - you'd have to look at how each individual JVM handled memory to say for sure.
Note that just having extra methods doesn't affect performance (other than very slightly due to the code obviously being present somewhere). It's not like each object or even each type has its own copy of the code for wait() and notify(). Depending on how the vtables work, each type may end up with an extra vtable entry for each inherited method - but that's still only on a per-type basis, not a per-object basis. That's basically going to get lost in the noise compared with the bulk of the storage which is for the actual objects themselves.
Personally, I feel that both .NET and Java made a mistake by associating a monitor with every object - I'd rather have explicit synchronization objects instead. I wrote a bit more on this in a blog post about redesigning java.lang.Object/System.Object.
Why are these so fundamental that every object has to have them, and is there a performance hit in having them (presumably some state is stored in them)?
tl;dr: They are thread-safety methods and they have small costs relative to their value.
The fundamental realities that these methods support are that:
Java is always multi-threaded. Example: check out the list of Threads used by a process using jconsole or jvisualvm some time.
Correctness is more important than "performance." When I was grading projects (many years ago), I used to have to explain "getting to the wrong answer really fast is still wrong."
Fundamentally, these methods provide some of the hooks to manage per-Object monitors used in synchronization. Specifically, if I have synchronized(objectWithMonitor) in a particular method, I can use objectWithMonitor.wait() to yield that monitor (e.g., if I need another method to complete a computation before I can proceed). In that case, that will allow one other method that was blocked waiting for that monitor to proceed.
On the other hand, I can use objectWithMonitor.notifyAll() to let Threads that are waiting for the monitor know that I am going to be relinquishing the monitor soon. They can't actually proceed until I leave the synchronized block, though.
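A minimal sketch of that handshake (the names objectWithMonitor and result are illustrative):
public class MonitorExample {
    private final Object objectWithMonitor = new Object();
    private Double result; // guarded by objectWithMonitor

    public double awaitResult() throws InterruptedException {
        synchronized (objectWithMonitor) {
            while (result == null) {
                objectWithMonitor.wait(); // releases the monitor while waiting
            }
            return result;
        }
    }

    public void publishResult(double value) {
        synchronized (objectWithMonitor) {
            result = value;
            objectWithMonitor.notifyAll(); // waiters resume once we exit the block
        }
    }
}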
With respect to specific examples (e.g., long Lists of Doubles) where you might worry that there's a performance or memory hit on the monitoring mechanism, here are some points that you should likely consider:
First, prove it. If you think there is a major impact from a core Java mechanism such as multi-threaded correctness, there's an excellent chance that your intuition is false. Measure the impact first. If it's serious and you know that you'll never need to synchronize on an individual Double, consider using doubles instead.
If you aren't certain that you, your co-worker, a future maintenance coder (who might be yourself a year later), etc., will never ever ever need a fine granularity of threaded access to your data, there's an excellent chance that taking these monitors away would only make your code less flexible and maintainable.
Follow-up in response to the question on per-Object vs. explicit monitor objects:
Short answer: @JonSkeet: yes, removing the monitors would create problems: it would create friction. Keeping those monitors in Object reminds us that this is always a multithreaded system.
The built-in object monitors are not sophisticated, but they are easy to explain, work in a predictable fashion, and are clear in their purpose. synchronized(this) is a clear statement of intent. If we force novice coders to use the concurrency package exclusively, we introduce friction. What's in that package? What's a semaphore? Fork-join?
A novice coder can use the Object monitors to write decent model-view-controller code. synchronized, wait and notifyAll can be used to implement naive (in the sense of simple, accessible but perhaps not bleeding-edge performance) thread-safety. The canonical example would be one of these Doubles (posited by the OP) which can have one Thread set a value while the AWT thread gets the value to put it on a JLabel. In that case, there is no good reason to create an explicit additional Object just to have an external monitor.
At a slightly higher level of complexity, these same methods are useful as an external monitoring method. In the example above, I explicitly did that (see objectWithMonitor fragments above). Again, these methods are really handy for putting together relatively simple thread safety.
If you would like to be even more sophisticated, I think you should seriously think about reading Java Concurrency In Practice (if you haven't already). Read and write locks are very powerful without adding too much additional complexity.
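For reference, a hedged sketch of the read/write lock idea just mentioned (the guarded field is illustrative, not an example from the book): many readers may proceed concurrently, while a writer gets exclusive access.
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class SharedValue {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private double value;

    double read() {
        lock.readLock().lock(); // many readers may hold this at once
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    void write(double v) {
        lock.writeLock().lock(); // exclusive: blocks readers and writers
        try {
            value = v;
        } finally {
            lock.writeLock().unlock();
        }
    }
}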
Punchline: Using basic synchronization methods, you can exploit a large portion of the performance enabled by modern multi-core processors with thread-safety and without a lot of overhead.
All objects in Java have monitors associated with them. Synchronization primitives are useful in pretty much all multi-threaded code, and it's semantically very nice to synchronize on the object(s) you are accessing rather than on separate "monitor" objects.
Java may allocate the Monitors associated with the objects as needed - as .NET does - and in any case the actual overhead for simply allocating (but not using) the lock would be quite small.
In short: it's really convenient to store objects with their thread-safety support bits, and there is very little performance impact.
These methods are around to implement inter-thread communication.
Check this article on the subject.
Rules for those methods, taken from that article:
wait() tells the calling thread to give up the monitor and go to sleep until some other thread enters the same monitor and calls notify().
notify() wakes up the first thread that called wait() on the same object.
notifyAll() wakes up all the threads that called wait() on the same object. The highest priority thread will run first.
Hope this helps...

Java Multi-Threading Beginner Questions

I am working on a scientific application that has readily separable parts that can proceed in parallel. So, I've written those parts to each run as independent threads, though not for what appears to be the standard reason for separating things into threads (i.e., not blocking some quit command or the like).
A few questions:
Does this actually buy me anything on standard multi-core desktops - i.e., will the threads actually run on the separate cores if I have a current JVM, or do I have to do something else?
I have a few objects which are read (though never written) by all the threads. Potential problems with that? Solutions to those problems?
For actual clusters, can you recommend frameworks to distribute the threads to the various nodes so that I don't have to manage that myself (well, if such exist)? CLARIFICATION: by this, I mean either something that automatically converts threads into task for individual nodes or makes the entire cluster look like a single JVM (i.e., so it could send threads to whatever processors it can access) or whatever. Basically, implement the parallelization in a useful way on a cluster, given that I've built it into the algorithm, with the minimal job husbandry on my part.
Bonus: Most of the evaluation consists of set comparisons (e.g., union, intersection, contains) with some mapping from keys to get the pertinent sets. I have some limited experience with FORTRAN, C, and C++ (semester of scientific computing for the first, and HS AP classes 10 years ago for the other two) - what sort of speed/ease of parallelization gains might I find if I tied my Java front-end to an algorithmic back-end in one of those languages, and what sort of headache might my level of experience find implementing those operations in those languages?
Yes, using independent threads will use multiple cores in a normal JVM, without you having to do any work.
If anything is only ever read, it should be fine to read it from multiple threads. If you can make the objects in question immutable (to guarantee they'll never be changed), that's even better.
I'm not sure what sort of clustering you're considering, but you might want to look at Hadoop. Note that distributed computing distributes tasks rather than threads (normally, anyway).
Multi-core Usage
Java runtimes conventionally schedule threads to run concurrently on all available processors and cores. I think it's possible to restrict this, but it would take extra work; by default, there is no restriction.
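A minimal sketch of that default behavior (the worker body is a placeholder): plain threads get spread across the available cores with no extra configuration.
public class ParallelParts {
    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        Thread[] workers = new Thread[cores];
        for (int i = 0; i < cores; i++) {
            final int part = i;
            workers[i] = new Thread(() -> {
                // each separable part of the computation would run here
                System.out.println("part " + part + " on " + Thread.currentThread().getName());
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join(); // wait for all parts to finish
        }
    }
}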
Immutable Objects
For read-only objects, declare their member fields as final, which will ensure that they are assigned when the object is created and never changed. If a field is not final, even if it never changed after construction, there can be some "visibility" issues in a multi-threaded program. This could result in the assignments made by one thread never becoming visible to another.
Any mutable fields that are accessed by multiple threads should be declared volatile, be protected by synchronization, or use some other concurrency mechanism to ensure that changes are consistent and visible among threads.
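A minimal sketch of such a read-only object (the field names are illustrative): final fields are assigned exactly once in the constructor, and the memory model guarantees they are safely visible to every thread that sees the object after construction.
public final class SharedLookup {
    private final String name;
    private final double threshold;

    public SharedLookup(String name, double threshold) {
        this.name = name; // assigned once, visible to all threads
        this.threshold = threshold;
    }

    public String name() { return name; }
    public double threshold() { return threshold; }
}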
Distributed Computing
The most widely used framework for distributed processing of this nature in Java is called Hadoop. It uses a paradigm called map-reduce.
Native Code Integration
Integrating with other languages is unlikely to be worthwhile. Because of its adaptive bytecode-to-native compiler, Java is already extremely fast on a wide range of computing tasks. It would be wrong to assume that another language is faster without actual testing. Also, integrating with "native" code using JNI is extremely tedious, error-prone, and complicated; using simpler interfaces like JNA is very slow and would quickly erase any performance gains.
As some people have said, the answers are:
Threads on cores - Yes. Java has had support for native threads for a long time. Most OSes have provided kernel threads which automagically get scheduled to any CPUs you have (implementation performance may vary by OS).
The simple answer is that it will be safe in general. The more complex answer is that you have to ensure that your object is actually created and initialized before any threads can access it. This is solved in one of two ways:
Let the class loader solve the problem for you using a Singleton (and lazy class loading):
public class MyImmutableObject
{
    private static class MyImmutableObjectInstance {
        private static final MyImmutableObject instance = new MyImmutableObject();
    }

    public static MyImmutableObject getInstance() {
        return MyImmutableObjectInstance.instance;
    }
}
Explicitly using acquire/release semantics to ensure a consistent memory model:
MyImmutableObject foo = null;
volatile boolean objectReady = false;

// initializer thread:
// ...
// create & initialize the object for use by multiple threads
foo = new MyImmutableObject();
foo.initialize();
// the volatile write is the release barrier
objectReady = true;
// now start the worker threads

// each worker thread:
public void run() {
    // the volatile read is the acquire barrier
    if (!objectReady)
        throw new IllegalStateException("Memory model violation");
    // start using the immutable object foo
}
I don't recall off the top of my head how you can exploit the memory model of Java to perform the latter case. I believe, if I remember correctly, that a write to a volatile variable is equivalent to a release barrier, while a read from a volatile variable is equivalent to an acquire barrier. Also, the reason for making the boolean volatile as opposed to the object is that access of a volatile variable is more expensive due to the memory model constraints - thus, the boolean allows you to enforce the memory model & then the object access can be done much faster within the thread.
As mentioned, there are all sorts of RPC mechanisms. There's also RMI, which is a native Java approach for running code on remote targets. There are also frameworks like Hadoop which offer a more complete solution and might be more appropriate.
For calling native code, it's pretty ugly - Sun really discourages use by making JNI an ugly complicated mess, but it is possible. I know that there was at least one commercial Java framework for loading & executing native dynamic libraries without needing to worry about JNI (not sure if there are any free or OSS projects).
Good luck.

Java synchronization and performance in an aspect

I just realized that I need to synchronize a significant amount of data collection code in an aspect but performance is a real concern. If performance degrades too much my tool will be thrown out. I will be writing ints and longs individually and to various arrays, ArrayLists and Maps. There will be multiple threads of an application that will make function calls that will be picked up by my aspect. What kind of things should I look out for that will negatively affect performance? What code patterns are more efficient?
In particular I have a method that calls many other data recording methods:
void foo() {
    bar();
    woz();
    // ...
}
The methods mostly add to and increment fields of the aspect:
void bar() {
    f++; // f is a field of the aspect
    for (int i = 0; i < ary.length; i++) {
        // get some values from aspect point cut
        if (someCondition) {
            ary[i] += someValue; // ary is a field of the aspect
        }
    }
}
Should I synchronize foo, or bar, woz, and the others individually, or should I move all the code from bar, woz, etc. into foo and just synchronize that? Should I synchronize on this, on a specifically created synchronization object:
private final Object syncObject = new Object();
(see this post), or on individual data elements within the methods:
ArrayList<Integer> a = new ArrayList<Integer>();

void bar() {
    synchronized (a) {
        // synchronized code
    }
}
Concurrency is extremely tricky. It's very easy to get it wrong, and very hard to get right. I wouldn't be too terribly worried about performance at this point. My first and foremost concern would be to get the concurrent code to work safely (no deadlocks or race conditions).
But on the issue of performance: when in doubt, profile. It's hard to say just how different synchronization schemes will affect performance. It's even harder for us to give you suggestions. We'd need to see a lot more of your code and gain a much deeper understanding of what the application does to give you a truly useful answer. In contrast, profiling gives you hard evidence as to if one approach is slower than another. It can even help you identify where the slowdown is.
There are a lot of great profiling tools for Java these days. The Netbeans and Eclipse profilers are good.
Also, I'd recommend staying away from raw synchronization altogether. Try using some of the classes in the java.util.concurrent package. They make writing concurrent code much easier and much less error prone.
Also, I recommend you read Java Concurrency in Practice by Brian Goetz, et al. It's very well written and covers a lot of ground.
A rule of thumb is not to synchronize on this - most of the time it is a performance hit, since all methods end up synchronized on one object.
Consider using locks - they're a very nice abstraction with many fine features, like trying to lock for a time period and then giving up:
if (commandsLock.tryLock(100, TimeUnit.MILLISECONDS)) {
    try {
        // Do something
    } finally {
        commandsLock.unlock();
    }
} else {
    // couldn't acquire the lock within 100 ms
}
I second the opinion on using java.util.concurrent. I'd make two levels of synchronization:
synchronize collection access (if it is needed)
synchronize field access
Collection access
If your collections are read-only, i.e. no elements get removed or inserted (but elements may change), I would say you should use synchronized collections (though this may not be needed) and not synchronize iterations:
Read-only:

for (int i = 0; i < ary.size(); i++) {
    // get some values from aspect point cut
    if (someCondition) {
        ary.set(i, ary.get(i) + someValue); // ary is a field of the aspect
    }
}

where ary is an instance obtained from Collections.synchronizedList.
Read-write:

synchronized (ary) {
    for (int i = 0; i < ary.size(); i++) {
        // get some values from aspect point cut
        if (someCondition) {
            ary.set(i, ary.get(i) + someValue); // ary is a field of the aspect
        }
    }
}
Or use a concurrent collection (like CopyOnWriteArrayList), which is inherently thread-safe.
The main difference is that in the first, read-only version any number of threads may iterate over the collection, while in the second only one at a time may iterate. In both cases only one thread at a time should increment any given field.
Field access
Synchronize increments of fields separately from synchronizing iterations. Note that the obvious version, synchronizing on the boxed element and writing foo++, does not work: Integer objects are immutable, so the increment would create a new object and never update the list. With ary as a List<AtomicInteger> you can write:

AtomicInteger foo = ary.get(i);
foo.incrementAndGet(); // atomic; no lock needed for the increment itself
Get rid of synchronization
Use concurrent collections (from java.util.concurrent, not from Collections.synchronizedXXX; the latter still need synchronization during traversal).
Use java.util.concurrent.atomic, which enables you to atomically increment fields.
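A minimal sketch of the lock-free variant of bar() under those two suggestions (the array size and names are illustrative):
import java.util.concurrent.atomic.AtomicIntegerArray;

class AspectCounters {
    private final AtomicIntegerArray ary = new AtomicIntegerArray(16);

    void bar(int i, int someValue) {
        ary.addAndGet(i, someValue); // atomic read-modify-write, no lock held
    }
}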
Something you should watch:
"The Java Memory Model" - it's a talk that gives a very nice understanding of how synchronization and data alignment in Java work.
Update: since writing the below, I see you've updated the question slightly. Forgive my ignorance - I have no idea what an "aspect" is - but from the sample code you posted, you could also consider using atomics/concurrent collections (e.g. AtomicInteger, AtomicIntegerArray) or atomic field updaters. This could mean quite a re-factoring of your code, though. (In Java 5 on a dual-proc hyperthreading Xeon, the throughput of AtomicIntegerArray was significantly better than a synchronized array; sorry, I haven't got round to repeating the test on more procs or a later JVM version yet - note that the performance of synchronized has improved since then.)
Without more specific information or metrics about your particular program, the best you can do is just follow good program design. It's worth noting that the performance and optimisation of synchronization locks in the JVM has been one of the areas (if not the area) that has received the most research and attention over the last few years. And so in the latest versions of JVMs, it ain't all that bad.
So in general, I'd say synchronize minimally without "going mad". By 'minimally', I mean hold the lock for as little time as possible, and make sure only the parts that need to use that specific lock use that specific lock. But only if the change is easy to make and it's easy to prove that your program is still correct. For example, instead of doing this:
synchronized (a) {
    doSomethingWith(a);
    longMethodNothingToDoWithA();
    doSomethingWith(a);
}
consider doing this if and only if your program will still be correct:
synchronized (a) {
    doSomethingWith(a);
}
longMethodNothingToDoWithA();
synchronized (a) {
    doSomethingWith(a);
}
But remember, the odd simple field update with a lock held unnecessarily probably won't make much tangible difference, and could actually improve performance. Sometimes, holding a lock for a bit longer and doing less lock "housekeeping" can be beneficial. But the JVM can make some of those decisions, so you don't need to be too paranoid - just do generally sensible things and you should be fine.
In general, try and have a separate lock for each set of methods/accesses that together form an "independent process". Other than that, having a separate lock object can be a good way of encapsulating the lock within the class it's used by (i.e. preventing it from being used by outside callers in a way you didn't predict), but there's probably no performance difference per se from using one object to another as the lock (e.g. using the instance itself vs a private Object declared just to be a lock within that class as you suggest), provided the two objects would otherwise be used in exactly the same way.
There should be a performance difference between a built-in language construct and a library, but experience has taught me not to guess when it comes to performance.
If you compile the aspect into the application then you will have basically no performance hit; if you do it at runtime (load-time weaving) then you will see a performance hit.
If you make each aspect per-instance, it may reduce the need for synchronization.
You should have as little synchronization as possible, for as short a time as possible, to reduce any problems.
If possible you may want to share as little state as possible between threads, keeping as much local as possible, to reduce any deadlock problems.
More information would lead to a better answer btw. :)
