Java synchronization and performance in an aspect

Java synchronization and performance in an aspect - java

I just realized that I need to synchronize a significant amount of data collection code in an aspect but performance is a real concern. If performance degrades too much my tool will be thrown out. I will be writing ints and longs individually and to various arrays, ArrayLists and Maps. There will be multiple threads of an application that will make function calls that will be picked up by my aspect. What kind of things should I look out for that will negatively affect performance? What code patterns are more efficient?
In particular I have a method that calls many other data recording methods:
void foo() {
bar();
woz();
...
}
The methods mostly do adding an incrementing of aspect fields
void bar() {
f++; // f is a field of the aspect
for (int i = 0; i < ary.length; i++) {
// get some values from aspect point cut
if (some condiction) {
ary[i] += someValue; // ary a field of the aspect
}
}
}
Should I synchronize foo, or bar, woz and others individually, or should I move all the code in bar, woz, etc into foo and just synchronize it? Should I synchronize on this, on a specifically created synchronization object:
private final Object syncObject = new Object();
(see this post), or on individual data elements within the methods:
ArrayList<Integer> a = new ArrayList<Integer>();
void bar() {
synchronize(a) {
// synchronized code
}
}

Concurrency is extremely tricky. It's very easy to get it wrong, and very hard to get right. I wouldn't be too terribly worried about performance at this point. My first and foremost concern would be to get the concurrent code to work safely (no deadlocks or race conditions).
But on the issue of performance: when in doubt, profile. It's hard to say just how different synchronization schemes will affect performance. It's even harder for us to give you suggestions. We'd need to see a lot more of your code and gain a much deeper understanding of what the application does to give you a truly useful answer. In contrast, profiling gives you hard evidence as to if one approach is slower than another. It can even help you identify where the slowdown is.
There are a lot of great profiling tools for Java these days. The Netbeans and Eclipse profilers are good.
Also, I'd recommend staying away from raw synchronization altogether. Try using some of the classes in the java.util.concurrency package. They make writing concurrent code much easier, and much less error prone.
Also, I recommend you read Java Concurrency in Practice by Brian Goetz, et al. It's very well written and covers a lot of ground.

Rule of thumb is not to synchronize on this - most of the times it is a performance hit - all methods are synchronized on one object.
Consider using locks - they'a very nice abstraction and many fine features like, trying to lock for a time period, and then giving up:
if(commandsLock.tryLock(100, TimeUnit.MILLISECONDS)){
try {
//Do something
}finally{
commandsLock.unlock();
}
}else{
//couldnt acquire lock for 100 ms
}
I second opinion on using java.util.concurrent. I'd make two levls of synchronization
synchronize collection access (if it is needed)
synchronize field access
Collection access
If your collection are read-only ie no elements get removed-inserted (but elements may change) i would say that you should use synchronized collections (but this may be not needed...) and dont synchronize iterations:
Read only:
for (int i = 0; i < ary.length; i++) {
// get some values from aspect point cut
if (some condiction) {
ary += someValue; // ary a field of the aspect
}
}
and ary is instance obtained by Collections.synchronizedList.
Read-write
synchronized(ary){
for (int i = 0; i < ary.length; i++) {
// get some values from aspect point cut
if (some condiction) {
ary += someValue; // ary a field of the aspect
}
}
}
Or use some concurrent collections (like CopyOnWriteArrayList) which is inherentently therad safe.
Main difference is that - in first read-only wersion any number of threads may iterate over this collections, and in second only one at a time may iterate. In both cases only one therad at a time should increment any given field.
Field access
Synchronize incrementations on fields separately from synchronizing iterations.
like:
Integer foo = ary.get(ii);
synchronized(foo){
foo++;
}
Get rid of synchronization
Use concurrent collections (from java.util.concurrent - not from `Collections.synchronizedXXX', latter still need synchronizing on traversal).
Use java.util.atomic that enable you to atomically incrememt fields.
Something you should watch:
Java memory model - its a talk that gives very nice understanding on how synchronizations and data aligment in JAVA works.

Upadte: since writing the below, I see you've updated the question slightly. Forgive my ignorance-- I have no idea what an "aspect" is-- but from the sample code you posted, you could also consider using atomics/concurrent collections (e.g. AtomicInteger, AtomicIntegerArray) or atomic field updaters. This could mean quite a re-factoring of your code, though. (In Java 5 on a dual-proc hyperthreading Xeon, the throughput of AtomicIntegerArray is significantly better than a synchronized array; sorry, I haven't got round to repeating the test on more procs/later JVM version yet-- note that performance of 'synchronized' has improved since then.)
Without more specific information or metrics about your particular program, the best you can do is just follow good program design. It's worth noting that the performance and optimisation of synchronization locks in the JVM has beed one of the areas (if not, the area) that has received most research and attention over the last few years. And so in the latest versions of JVM's, it ain't all that bad.
So in general, I'd say synchronize minimally without "going mad". By 'minimally', I mean so that you hold on to the lock for as less time as possible, and so that only the parts that need to use that specific lock use that specific lock. But only if the change is easy to do and it's easy to prove that your program is still correct. For example, instead of doing this:
synchronized (a) {
doSomethingWith(a);
longMethodNothingToDoWithA();
doSomethingWith(a);
}
consider doing this if and only if your program will still be correct:
synchronized (a) {
doSomethingWith(a);
}
longMethodNothingToDoWithA();
synchronized (a) {
doSomethingWith(a);
}
But remember, the odd simple field update with a lock held unnecessarily probably won't make much tangible difference, and could actually improve performance. Sometimes, holding a lock for a bit longer and doing less lock "housekeeping" can be beneficial. But the JVM can make some of those decisions, so you don't need to be tooo paranoid-- just do generally sensible things and you should be fine.
In general, try and have a separate lock for each set of methods/accesses that together form an "independent process". Other than that, having a separate lock object can be a good way of encapsulating the lock within the class it's used by (i.e. preventing it from being used by outside callers in a way you didn't predict), but there's probably no performance difference per se from using one object to another as the lock (e.g. using the instance itself vs a private Object declared just to be a lock within that class as you suggest), provided the two objects would otherwise be used in exactly the same way.

There should be a performance difference between a built-in language construct and a library, but experience has taught me not to guess when it comes to performance.

If you compile the aspect into the application then you will have basically no performance hit, if you do it at runtime (load-type weaving) then you will see a performance hit.
If you have each aspect be perinstance then it may reduce the need for synchronization.
You should have as little synchronization as possible, for as short a time as possible, to reduce any problems.
If possible you may want to share as little state as possible between threads, keeping as much local as possible, to reduce any deadlock problems.
More information would lead to a better answer btw. :)

Related

JMH State classes and shared vs unshared states

I'm new to jmh and to understanding what happens behind threads and so on.
So, I started reading and got stuck on the #State annotation and shared vs unshared states.
I read this example : http://hg.openjdk.java.net/code-tools/jmh/file/ecd9e76155fe/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_03_States.java
and have few questions about it.
First question, what is the exact role of state classes? to hold parameters?
let's say I want to benchmark a program that encrypts a key in 2 different ways.
Should i keep the key (a String object) in a state class which annotated with a specific state? or just keep the String object on the benchmark class?
An explanation about this would be great.
Second question, why in the example above the unshared state class performance was much better than the shared one?
How does the multithreaded state changes it?
I feel really obscured since i'm new to this thing and couldn't find an "explain me like i'm 5" examples for jmh and it's options.

You can consider #State objects as the part of your benchmark that you need to run it without that the time for its creation should be considered as a part of your measured time.
Let us say that you want to measure the time it takes to compute:
#Benchmark
int benchmark() {
int foo = 1, bar = 1;
return foo + bar;
}
Unfortunately for you, the JIT compiler is too smart to let you do this and will fold the method to simply return 2. This is of course not what you want to measure. Using state, you can escape these values and let JMH take care of not letting the JIT fold its values. You would initialize values in a #Setup method.
As another use case, you can check that your benchmark did what you expected. This is possible by validating state in a #TearDown method.

First question, what is the exact role of state classes? to hold parameters? let's say I want to benchmark a program that encrypts a key in 2 different ways. Should i keep the key (a String object) in a state class which annotated with a specific state? or just keep the String object on the benchmark class? An explanation about this would be great.
The benchmark class is also a state class. See JMHSample_04_DefaultState.java.
Second question, why in the example above the unshared state class performance was much better than the shared one? How does the multithreaded state changes it?
This is an issue of “modern” processors and not of JMH. Each core on the processor has its own L1 (and maybe L2) cache. They usually don't access the RAM directly. If multiple threads are constantly writing to the same area of memory, the processor is constantly busy to synchronize the data between all cores. You don't actually have to access the same variable to get this effect. See JMHSample_22_FalseSharing.java.

The two main benefits of using a state class are:
You guarantee the processes for data preparations are not measured in the benchmark.
You can clearly define the scope of the state objects, therefore having higher control over and knowledge of what is truly being benchmarked.
This article explains the State class and state objects with more detail:
https://www.oracle.com/technical-resources/articles/java/architect-benchmarking.html

How does the JVM internally handle race conditions?

If multiple threads try to update the same member variable, it is called a race condition. But I was more interested in knowing how the JVM handles it internally if we don't handle it in our code by making it synchronised or something else? Will it hang my program? How will the JVM react to it? I thought the JVM would temporarily create a sync block for this situation, but I'm not sure what exactly would be happening.
If any of you have some insight, it would be good to know.

The precise term is a data race, which is a specialization of the general concept of a race condition. The term data race is an official, precisely specified concept, which means that it arises from a formal analysis of the code.
The only way to get the real picture is to go and study the Memory Model chapter of the Java Language Specification, but this is a simplified view: whenever you have a data race, there is almost no guarantee as to the outcome and a reading thread may see any value which has ever been written to the variable. Therein also lies the only guarantee: the thread will not observe an "out-of-thin-air" value, such which was never written. Well, unless you're dealing with longs or doubles, then you may see torn writes.

Maybe I'm missing something but what is there to handle? There is still a thread that will get there first. Depending on which thread that is, that thread will just update/read some variable and proceed to the next instruction. It can't magically construct a sync block, it doesn't really know what you want to do. So in other words what happens will depend on the outcome of the 'race'.
Note I'm not heavily into the lower level stuff so perhaps I don't fully understand the depth of your question.

Java provides synchronized and volatile to deal with these situations. Using them properly can be frustratingly difficult, but keep in mind that Java is only exposing the complexity of modern CPU and memory architectures. The alternatives would be to always err on the side of caution, effectively synchronizing everything which would kill performance; or ignore the problem and offer no thread safety whatsoever. And fortunately, Java provides excellent high-level constructs in the java.util.concurrent package, so you can often avoid dealing with the low-level stuff.

In short, the JVM assumes that code is free of data races when translating it into machine code. That is, if code is not correctly synchronized, the Java Language Specification provides only limited guarantees about the behavior of that code.
Most modern hardware likewise assumes that code is free of data races when executing it. That is, if code is not correctly synchronized, the hardware makes only limited guarantees about the result of its execution.
In particular, the Java Language Specification guarantees the following only in the absence of a data race:
visibility: reading a field yields the value last assigned to it (it is unclear which write was last, and writes of long or double variables need not be atomic)
ordering: if a write is visible, so are any writes preceding it. For instance, if one thread executes:
x = new FancyObject();
another thread can read x only after the constructor of FancyObject has executed completely.
In the presence of a data race, these guarantees are null and void. It is possible for a reading thread to never see a write. It is also possible to see the write of x, without seeing the effect of the constructor that logically preceded the write of x. It is very unlikely that the program is correct if such basic assumptions can not be made.
A data race will however not compromise the integrity of the Java Virtual Machine. In particular, the JVM will not crash or halt, and still guarantee memory safety (i.e. prevent memory corruption) and certain semantics of final fields.

The JVM will handle the situation just fine (ie it will not hang or complain), but you may not get a result that you like!
When multiple threads are involved, java becomes fiendishly complicated and even code that looks obviously correct can turn out to be horribly broken. As an example:
public class IntCounter {
private int i;
public IntCounter(int i){
this.i = i;
}
public void incrementInt(){
i++;
}
public int getInt(){
return i;
}
}
is flawed in many ways.
First, let's say that i is currently 0 and thread A and thread B both call incrementInt() at about the same time. There is a danger that they will both see that i is 0, then both increment it 1 and then save the result. So at the end of the two calls, i is only 1, not 2!
That's the race condition problem with the code, but there are other problems concerning memory visibility. When thread A changes a shared variable, there is no guarantee (without synchronization) that thread B will ever see the changes!
So thread A could increment i 100 times, and an hour later, thread B, calling getInt(), might see i as 0, or 100 or anywhere in between!
The only sane thing to do if you are delving into java concurrency is to read Java Concurrency in Practice by Brian Goetz et al. (OK there's probably other good ways to learn about it, but this is a great book co written by Joshua Bloch, Doug Lea and others)

Double-Checked Locking - does it work in Java on earth?

all:
Here is the famous article:
The "Double-Checked Locking is Broken" Declaration
It declares that pattern doesn't work in Java. It further says, close to the end, that new JVM can make the pattern work by using volatile.
However, in another article: Memory Barriers and JVM Concurrency
It says keyword "synchronized" generates memory barrier full fences. So who is right? Does the pattern work in Java on earth?

There are essentially 3 ways to fix double-checked locking:
ensure that the variable is declared volatile (works from Java 5 onwards);
just don't bother with it in the first place: just use synchronization and don't try to mess around with fancy bug-prone-- and probably pointless-- means of "avoiding" it;
let the classloader do the synchronization for you.
I've posted example code here.
BUT: Double-checked locking is really an outdated paradigm, if indeed it was ever useful in Java. As I see things, it was essentially carried over into Java by C programmers who didn't fully appreciate that the JVM effectively has a more efficient (and correct!) way of dealing with the issue built into the classloader and that optimisations to synchronization are generally best made at the JVM level.
I've seen a lot of people clutter their code with this "pattern". I don't think I've ever seen any actual data showing that it has any benefit.
Plus: if you do have a large application that is hitting synchronization issues, then one of the whole raisons d'être of Java is that it has rich concurrency libraries. Look at how you can re-work your application to use them... if profiling data proves it to be necessary.

It depends on what version of java you are using.
This has been fixed in java 5 and forward.
Check http://en.wikipedia.org/wiki/Double-checked_locking#Usage_in_Java

They're both right, and DCL works fine in Java from 5 on.
If you are expecting your program to produce the exact same output every time given the exact same input, and you are using DCL, you may want to seriously rethink what you are doing. An awful lot can depend on who gets to the lock first--you're rolling a lot of dice. Not good for an accounting app.
If your program involves balls bouncing off walls and each other, DCL may make a lot of sense. It does work. Synchronizing has to be a bit slower than non-synchronizing even without contention, so why do it if a simple if can prevent it? And if 100 threads pile up on a synch statement when the needed object already exists, that has to be a lot slower.

The keyword "synchronized" that generates memory barrier full fences does not mean DCL could work properly. Let's take the following code as example:
public static Runnable getInstance()
{
if (null == instance) //1
{
synchronized (Runnable.class)
{
if (null == instance)
{
instance = new Runnable(); //2
}
}
}
return instance;
}
We know that JVM will follow many steps when construct an object. We focus 2 important steps here:
First, JVM malloc the memory for this object. The value of member-variables in this object has defaut value for now. Second, JVM calls method and assigns the user-specified value to the member variables.
That means thread A may get a partitially-constructed instance in code 1 (in the middle of the code 1 and code 2) . Although "synchronized" generates memory barrier full fences, there is no happen-before guarantee in code 1 and code 2. Memory barrier fences take effect during synchronized code block. Code 1 is outside the synchronized code block.

Java performance issue

I've got a question related to java performance and method execution.
In my app there are a lot of place where I have to validate some parameter, so I've written a Validator class and put all the validation methods into it. Here is an example:
public class NumberValidator {
public static short shortValidator(String s) throws ValidationException{
try{
short sh = Short.parseShort(s);
if(sh < 1){
throw new ValidationException();
}
return sh;
}catch (Exception e) {
throw new ValidationException("The parameter is wrong!");
}
}
...
But I'm thinking about that. Is this OK? It's OO and modularized, but - considering performance - is it a good idea?
What if I had awful lot of invocation at the same time? The snippet above is short and fast, but there are some methods that take more time.
What happens when there are a lot of calling to a static method or an instance method in the same class and the method is not synchronized? All the calling methods have to fall in line and the JVM executes them sequentially?
Is it a good idea to have some class that are identical to the above-mentioned and randomly call their identical methods? I think it is not, because "Don't repeat yourself " and "Duplication is Evil" etc. But what about performance?
Thanks is advance.

On reentrancy of your method: if it's static, it doesn't hold any state, so it's perfectly safe.
On performance: look at your use cases. Since you're validating Strings, I can only assume you are validating user input. In any case, the number of simultaneous users of your system is not probable to incur a performance bottleneck.

Just two comments:
1) Factoring out validation into methods may in fact improve performance a little. As far as I know, the JIT compiler is designed to detect frequent method invocations. The validation methods are therefore good candidates for JIT optimization.
2) Try avoiding 'catch(Exception e)'. This is not recommended since you are capturing all kinds of RuntimeException as well. If you have a bug in one of the non-trivial validations, you may throw a ValidationException that hides a bug in the code.

Not sure what are your concern.
Since you mentioned the method not beeing synchronized I suppose you have concurrent invocations from multiple threads.
And since the method is not sychronized any invocation can be executed concurrently without problems.
For sure you won't get any performance improvements by copy and paste this method in the calling classes. May be you would reduce performance because your code size will increase and will waste space in che processor cache, but for such a short method I think it's a trascurable effect.

Are you experiencing performance problems?
Make the code easy to maintain first, and if it's not living up to expectations, it'll be easy to read and easy to refactor.
If you make the code optimized for speed on every choice, you'll often wind up with something unreadable, and have to start from scratch to fix simple bugs.
That said, your method is a static, and will only need to be initialized once. That is the fast version. :-)

I'm suspicious of this code, not so much on performance grounds as that I don't think it is successfully abstracting out anything that deserves abstraction.
If it is for checking user input, it replaces reasonable error messages like 'maximum number of widgets allowed is 9999' with 'ValidationException'. And if you add something like arguments (or try/catch clauses) to get the messages right in context, then almost certainly the required call-site code is more complex, harder to write and maintain, than the straightforward way of doing things.
If it is for internal sanity checking, you may well start to lose meaningful performance, and certainly greatly increase complexity and bugginess, if you are passing arguments round as strings all over the place and continually parsing and validating them.

Why do all Java Objects have wait() and notify() and does this cause a performance hit?

Every Java Object has the methods wait() and notify() (and additional variants). I have never used these and I suspect many others haven't. Why are these so fundamental that every object has to have them and is there a performance hit in having them (presumably some state is stored in them)?
EDIT to emphasize the question. If I have a List<Double> with 100,000 elements then every Double has these methods as it is extended from Object. But it seems unlikely that all of these have to know about the threads that manage the List.
EDIT excellent and useful answers. #Jon has a very good blog post which crystallised my gut feelings. I also agree completely with #Bob_Cross that you should show a performance problem before worrying about it. (Also as the nth law of successful languages if it had been a performance hit then Sun or someone would have fixed it).

Well, it does mean that every object has to potentially have a monitor associated with it. The same monitor is used for synchronized. If you agree with the decision to be able to synchronize on any object, then wait() and notify() don't add any more per-object state. The JVM may allocate the actual monitor lazily (I know .NET does) but there has to be some storage space available to say which monitor is associated with the object. Admittedly it's possible that this is a very small amount (e.g. 3 bytes) which wouldn't actually save any memory anyway due to padding of the rest of the object overhead - you'd have to look at how each individual JVM handled memory to say for sure.
Note that just having extra methods doesn't affect performance (other than very slightly due to the code obvious being present somewhere). It's not like each object or even each type has its own copy of the code for wait() and notify(). Depending on how the vtables work, each type may end up with an extra vtable entry for each inherited method - but that's still only on a per type basis, not a per object basis. That's basically going to get lost in the noise compared with the bulk of the storage which is for the actual objects themselves.
Personally, I feel that both .NET and Java made a mistake by associating a monitor with every object - I'd rather have explicit synchronization objects instead. I wrote a bit more on this in a blog post about redesigning java.lang.Object/System.Object.

Why are these so fundamental that
every object has to have them and is
there a performance hit in having them
(presumably some state is stored in
them)?
tl;dr: They are thread-safety methods and they have small costs relative to their value.
The fundamental realities that these methods support are that:
Java is always multi-threaded. Example: check out the list of Threads used by a process using jconsole or jvisualvm some time.
Correctness is more important than "performance." When I was grading projects (many years ago), I used to have to explain "getting to the wrong answer really fast is still wrong."
Fundamentally, these methods provide some of the hooks to manage per-Object monitors used in synchronization. Specifically, if I have synchronized(objectWithMonitor) in a particular method, I can use objectWithMonitor.wait() to yield that monitor (e.g., if I need another method to complete a computation before I can proceed). In that case, that will allow one other method that was blocked waiting for that monitor to proceed.
On the other hand, I can use objectWithMonitor.notifyAll() to let Threads that are waiting for the monitor know that I am going to be relinquishing the monitor soon. They can't actually proceed until I leave the synchronized block, though.
With respect to specific examples (e.g., long Lists of Doubles) where you might worry that there's a performance or memory hit on the monitoring mechanism, here are some points that you should likely consider:
First, prove it. If you think there is a major impact from a core Java mechanism such as multi-threaded correctness, there's an excellent chance that your intuition is false. Measure the impact first. If it's serious and you know that you'll never need to synchronize on an individual Double, consider using doubles instead.
If you aren't certain that you, your co-worker, a future maintenance coder (who might be yourself a year later), etc., will never ever ever need a fine granularity of theaded access to your data, there's an excellent chance that taking these monitors away would only make your code less flexible and maintainable.
Follow-up in response to the question on per-Object vs. explicit monitor objects:
Short answer: #JonSkeet: yes, removing the monitors would create problems: it would create friction. Keeping those monitors in Object reminds us that this is always a multithreaded system.
The built-in object monitors are not sophisticated but they are: easy to explain; work in a predictable fashion; and are clear in their purpose. synchronized(this) is a clear statement of intent. If we force novice coders to use the concurrency package exclusively, we introduce friction. What's in that package? What's a semaphore? Fork-join?
A novice coder can use the Object monitors to write decent model-view-controller code. synchronized, wait and notifyAll can be used to implement naive (in the sense of simple, accessible but perhaps not bleeding-edge performance) thread-safety. The canonical example would be one of these Doubles (posited by the OP) which can have one Thread set a value while the AWT thread gets the value to put it on a JLabel. In that case, there is no good reason to create an explicit additional Object just to have an external monitor.
At a slightly higher level of complexity, these same methods are useful as an external monitoring method. In the example above, I explicitly did that (see objectWithMonitor fragments above). Again, these methods are really handy for putting together relatively simple thread safety.
If you would like to be even more sophisticated, I think you should seriously think about reading Java Concurrency In Practice (if you haven't already). Read and write locks are very powerful without adding too much additional complexity.
Punchline: Using basic synchronization methods, you can exploit a large portion of the performance enabled by modern multi-core processors with thread-safety and without a lot of overhead.

All objects in Java have monitors associated with them. Synchronization primitives are useful in pretty much all multi-threaded code, and its semantically very nice to synchronize on the object(s) you are accessing rather than on separate "Monitor" objects.
Java may allocate the Monitors associated with the objects as needed - as .NET does - and in any case the actual overhead for simply allocating (but not using) the lock would be quite small.
In short: its really convenient to store Objects with their thread safety support bits, and there is very little performance impact.

These methods are around to implement inter-thread communication.
Check this article on the subject.
Rules for those methods, taken from that article:
wait( ) tells the calling thread to give up the monitor and go to sleep until some other
thread enters the same monitor and calls notify( ).
notify( ) wakes up the first thread that called wait( ) on the same object.
notifyAll( ) wakes up all the threads that called wait( ) on the same object. The
highest priority thread will run first.
Hope this helps...

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.