JVM, Java, Multithreading, Object Creation


I am encountering a weird scenario.
Is there a possibility of the JVM re-using an already created object when we initialize a new one and the object count in the JVM is very high?
abc a = new abc();
a.setAttribute("aaaa");
.........
(a is no longer being used and has not yet been garbage collected by the JVM. There are multiple threads creating 5000 instances of class abc.)
Again:
abc a = new abc();
System.out.println(a.getAttribute()); // This prints "aaaa", which was set on an earlier instance!
Is there a possibility of an instance being re-used? Has anyone come across this scenario before?

No. I think this is a bug of yours. Maybe also try with a different JVM version or vendor to see whether those behave as you expect or not.

This would constitute a bug in the JVM, but I would consider it very unlikely.
I would say with 99% confidence that your code is simply exhibiting a race condition, such as a thread other than the one you are observing setting the attribute.

The JVM does not re-use objects AFAIK. But the behaviour you are seeing can be explained.
a.setAttribute("aaaa"); and a.getAttribute might be setting a static field, a singleton or a threadlocal in another class, or another thread is changing state.
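To illustrate the static-field explanation, here is a minimal sketch. It assumes (hypothetically, since the question does not show the class body) that abc keeps its attribute in a static field; the class is renamed Abc to follow Java naming conventions, and StaticFieldDemo is an illustrative helper.

```java
// Hypothetical version of the question's class: the attribute is
// (mistakenly) static, so it belongs to the class, not to each instance.
class Abc {
    private static String attribute; // shared by ALL instances of Abc

    void setAttribute(String value) {
        attribute = value;
    }

    String getAttribute() {
        return attribute;
    }
}

class StaticFieldDemo {
    // Returns what a brand-new instance reports after an older one was configured.
    static String freshInstanceValue() {
        Abc a = new Abc();
        a.setAttribute("aaaa");
        Abc b = new Abc();         // a genuinely new object...
        return b.getAttribute();   // ...but it reads the shared static field
    }
}
```

Removing the static modifier turns attribute into an instance field, and a fresh Abc then reports null instead of the old value.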

Objects will not be 'reused'. You can check the following -
Do you get an OutOfMemoryError? If yes, the program can be in an inconsistent state
Are you sure that other threads are not modifying your 'a' object?
Note: Updated answer after gid corrected me.

Depending on where these assignments take place, your program may be exhibiting statement reordering: the JVM (including its JIT compiler) may reorder the assignment statements so they don't execute in the order you coded them. This is permitted by the Memory Model specification and could indicate that your program is under-synchronized.
See the JSR133 FAQ:
http://www.cs.umd.edu/users/pugh/java/memoryModel/jsr-133-faq.html#reordering
Or section 2 in:
http://www.cs.umd.edu/~pugh/java/memoryModel/jsr133.pdf
An easier explanation begins at 10:40 in this video:
http://www.youtube.com/watch?v=1FX4zco0ziY

If you are using multithreading, you might be encountering what is known as 'stale data'.
Some of it is explained in Java multi-threading & Safe Publication.

Related

Is there an equivalent of Java 'volatile' in C++?

In Java, sometimes when accessing the same variable from different threads, each thread will create its own copy of the variable, and so if I set the value of the variable in one thread to 10 and then I tried to read the value of this variable from another thread, I will not get 10 (because the second thread is reading from another copy of the variable!).
To fix this problem in Java, all I had to do is to use the keyword volatile, for example:
volatile int i = 123;
Does this problem also exist in C++? If so, how can I fix it?
Note: I am using Visual C++ 2010.

Yes, the same problem exists in C++. But since C already introduced the keyword volatile with a different meaning (not related to threads), and C++ uses the keyword in the same way, you can't use volatile in C++ like you can in Java.
Instead, you're probably better off using std::atomic<T> (or boost::). It's not always the most efficient alternative, but it's simple. If this turns out to be a bottleneck, you can relax the std::memory_order used by std::atomic.
Having said that about standard C++, MSVC++ as an extension does guarantee that multiple threads can access a shared volatile variable. IIRC, all threads will eventually see the same value, and no threads will travel back in time. (That is to say, if 0 and 1 are written to a variable sequentially, no thread will ever see the sequence 1,0)

Java, when (and for how long) can a thread cache the value of a non-volatile variable?

From this post: http://www.javamex.com/tutorials/synchronization_volatile_typical_use.shtml
public class StoppableTask extends Thread {
    private volatile boolean pleaseStop;

    public void run() {
        while (!pleaseStop) {
            // do some stuff...
        }
    }

    public void tellMeToStop() {
        pleaseStop = true;
    }
}
If the variable were not declared volatile (and without other synchronization), then it would be legal for the thread running the loop to cache the value of the variable at the start of the loop and never read it again.
In Java 5 or later, is the last paragraph correct?
So, at exactly what moment can a thread cache the value of the pleaseStop variable, and for how long? Just before calling one of StoppableTask's methods (run, tellMeToStop) on the object? (And must the thread update the variable when exiting the method, at the latest?)
Can you point me to documentation or a tutorial about this (Java 5 or later)?
Update: here is my compilation of the answers posted on this question.
Without using volatile or synchronized, there are actually two problems with the above program:
1- Threads can cache the variable pleaseStop from the very first moment the thread starts and never update it again, so the loop would keep going forever. This can be solved by using either volatile or synchronized. This thread cache mechanism does not exist in C.
2- The Java compiler (in practice, the JIT compiler) can optimise the code and replace while(!pleaseStop) {...} with if (!pleaseStop) { while (true) {...}}, so the loop would keep going forever. Again, this can be solved by using either volatile or synchronized. This compiler optimisation also exists in C.
Some more info:
https://www.ibm.com/developerworks/library/j-5things15/

When can it cache?
As for your question about "when can it cache" the value, the answer to that is "always". To understand what that means, read on. Processors have storage called caches, which make it possible for the running thread to access values in memory by reading from the cache rather than from memory. The running thread can also write to this cache as if it were writing the value to memory. Thus, so long as the thread is running, it could be using the cache to store the data it's using. Something has to explicitly happen to flush the value from the cache to memory. For a single-threaded process, this is all well and dandy, but if you have another thread, it might be trying to read the data from memory while the other thread is plugging away reading and writing it to the processor cache without flushing to memory.
How long can it cache?
As for the "for how long" part- the answer is unfortunately forever unless you do something about it. Synchronizing on the data in question is one way to force a flush from the cache so that all threads see the updates to the value. For more detail about ways to cause a flush, see the next section.
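As a sketch of the synchronization approach, here is the StoppableTask loop rewritten without volatile, assuming a private lock object (SyncStoppableTask, lock, shouldStop and startThenStop are illustrative names). In Java Memory Model terms, the "flush" is a happens-before edge: releasing a monitor happens-before every later acquisition of that same monitor.

```java
import java.util.concurrent.TimeUnit;

// Sketch: the same stoppable loop, but without volatile. Visibility comes from
// the writer and the reader both using the same private lock.
class SyncStoppableTask extends Thread {
    private final Object lock = new Object();
    private boolean pleaseStop; // guarded by lock

    private boolean shouldStop() {
        synchronized (lock) {   // acquiring the lock sees writes made before its last release
            return pleaseStop;
        }
    }

    public void tellMeToStop() {
        synchronized (lock) {   // releasing the lock publishes this write
            pleaseStop = true;
        }
    }

    @Override
    public void run() {
        while (!shouldStop()) {
            Thread.yield(); // placeholder for "do some stuff..."
        }
    }

    // Starts the task, asks it to stop, and reports whether it terminated.
    static boolean startThenStop() {
        SyncStoppableTask task = new SyncStoppableTask();
        task.start();
        task.tellMeToStop();
        try {
            task.join(TimeUnit.SECONDS.toMillis(5));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !task.isAlive();
    }
}
```

Locking on a private object rather than on this (a Thread) avoids interfering with the Thread class's own internal use of its monitor.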
Where's some Documentation?
As for the "where's the documentation" question, a good place to start is here. For how specifically you can force a flush, Java describes this in terms of whether one action (such as a data write) appears to "happen before" another (like a data read). For more about this, see here.
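A small example of the happens-before idea, using two rules from JLS 17.4.5 that need no volatile or synchronized at all: a call to start() happens-before every action of the started thread, and all actions of a thread happen-before another thread successfully returns from join() on it. StartJoinDemo is a hypothetical class.

```java
// Sketch of two happens-before rules (JLS 17.4.5):
//  - everything done before t.start() is visible inside t;
//  - everything t does is visible to a thread that has returned from t.join().
class StartJoinDemo {
    private int input;   // plain fields, deliberately not volatile
    private int result;

    int compute() {
        input = 21;                                   // written before start()
        Thread worker = new Thread(() -> result = input * 2);
        worker.start();                               // start() happens-before worker's actions
        try {
            worker.join();                            // worker's actions happen-before join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result;                                // guaranteed to see 42
    }
}
```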
What about volatile?
volatile in essence prevents the type of processor caching described above. This ensures that all writes to a variable are visible from other threads. To learn more, the tutorial you linked to in your post seems like a good start.

The relevant documentation is on the volatile keyword (Java Language Specification, section 8.3.1.4) here and on the Java memory model (Java Language Specification, section 17.4) here.
Declaring the field volatile ensures that there is some synchronization with the actions of other threads that might change its value. Without declaring it volatile, Java can reorder operations performed on the field by different threads.
As the spec says (see 8.3.1.4), for fields declared volatile, "accesses ... occur exactly as many times, and in exactly the same order, as they appear to occur during execution of the program text by each thread..."
So the caching you speak of can happen at any time if the field is not volatile. If the field is declared volatile, the Java memory model enforces consistent access to it; no such enforcement takes place otherwise (unless the threads are synchronized).
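A sketch of what the volatile guarantee buys in practice under the Java 5+ memory model: a write to a volatile field happens-before every subsequent read of that field, so plain writes made before the volatile write are also visible to a reader that observes it. VolatileHandoff and its members are illustrative names.

```java
// Sketch: a volatile write/read pair also makes earlier plain writes visible.
// The writer sets "data" (plain) and then "ready" (volatile); a reader that
// sees ready == true is guaranteed by the JMM to see data == 42.
class VolatileHandoff {
    private int data;                 // plain field
    private volatile boolean ready;   // volatile publication flag

    void writer() {
        data = 42;       // 1. plain write
        ready = true;    // 2. volatile write: publishes write 1 as well
    }

    int reader() {
        while (!ready) {
            // spin until the volatile write becomes visible
        }
        return data;     // happens-after data = 42
    }

    static int runOnce() {
        VolatileHandoff h = new VolatileHandoff();
        int[] seen = new int[1];
        Thread r = new Thread(() -> seen[0] = h.reader());
        r.start();
        h.writer();
        try {
            r.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```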

The official documentation is in section 17 of the Java Language Specification, especially 17.4 Memory Model.
The correct viewpoint is to start by assuming multi-threaded code won't work, and try to force it to work whether it likes it or not. Without the volatile declaration, or similar, there would be nothing forcing the read of pleaseStop to ever see the write if it happens in another thread.
I agree with the Java Concurrency in Practice recommendation. It is a good distillation of the implications of the JLS material for practical Java programming.

Why is this multithreading code broken?

Why is the following multithreading related example code broken?
public void method1() {
    synchronized (intVariable) {
    }
    synchronized (stringVariable) {
    }
}

public void method2() {
    synchronized (stringVariable) {
    }
    synchronized (intVariable) {
    }
}
The two methods above are from the same class, where stringVariable and intVariable are instance variables.
I thought it would not cause any problem, at least not thread deadlocks. Is there any other reason why this code is broken?

Either you didn't understand the problem, or you are right that this wouldn't cause a deadlock.
Perhaps he was looking for something more obscure, such as:
you can't lock on an int field;
locking on a String object is a very bad idea because you don't know how it is shared.
But I doubt it. In any case, he should have clarified the question and your answer because perhaps he might have learnt something, if only how to make the question clearer next time.
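The String point can be demonstrated: string literals with the same contents are interned (JLS 3.10.5), so a "private" String lock may be the very same object as a lock in unrelated code. ModuleA, ModuleB and InternDemo are hypothetical classes. (The int point is simpler still: synchronized on a primitive int does not compile, because the expression must be of a reference type.)

```java
// Sketch: why locking on a String is risky. Identical string literals are
// interned, so two unrelated classes that each "own" a private "lock"
// literal end up sharing one monitor.
class ModuleA {
    private final String lock = "lock";   // looks private...
    Object monitor() { return lock; }
}

class ModuleB {
    private final String lock = "lock";   // ...but is the same interned object
    Object monitor() { return lock; }
}

class InternDemo {
    static boolean sameMonitor() {
        return new ModuleA().monitor() == new ModuleB().monitor();
    }
}
```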
If you, as an interviewer, have a set of screening questions, you should make sure they are covered before you even bring in a candidate. A questionnaire to give to HR or an agent can be useful. A phone interview is often a good first step. As a candidate, I sometimes ask for a phone interview, just to see if it is worth my time going to a face-to-face (e.g. if I have serious doubts it's worth it).
Not only are you trying to convince them you are a good fit for them, but they are trying to convince you they are a good fit for you. It appears they failed both technically to explain the problem to you, and how they handled it HR wise, so I would count yourself lucky you didn't waste any more time with them.
BTW: Most big companies are diverse and working for one team can be very different to another team. It would be unfair to characterise a company based on one experience.

The problem is, assuming that both variables have a reference type (otherwise you couldn’t synchronize on them), that synchronizing on a variable whose contents could change is broken.
The first read of the variable is done without synchronization and whatever reference the thread will see (which could be a completely outdated value) is used to synchronize on, which does not prevent other threads from synchronizing on a different value of that variable as it will be a completely different object.
Since String and Integer are immutable each change of the variable’s value implies changing the reference contained in the variable, allowing another thread to enter the synchronized block while the thread performing the change is still inside that block.
And due to legal reordering of operations it might even appear as if the second thread performs actions inside the synchronized block before the first thread performs the write. Just recall that the read of the reference to use for synchronization is not synchronized. So it’s like having no synchronization at all.
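One common fix, sketched here under the assumption that intVariable and stringVariable really are reassigned Integer and String fields, is to lock on dedicated private final objects whose identity never changes. The lock names and the hammer helper are hypothetical. Since neither method nests one lock inside the other, acquiring them in different orders still cannot deadlock.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: lock on private final objects that never change, instead of on
// fields whose references are replaced on every update.
class SafeLocks {
    private final Object intLock = new Object();
    private final Object stringLock = new Object();

    private Integer intVariable = 0;      // guarded by intLock
    private String stringVariable = "";   // guarded by stringLock

    void method1() {
        synchronized (intLock) {
            intVariable = intVariable + 1;          // new Integer; the lock is unaffected
        }
        synchronized (stringLock) {
            stringVariable = stringVariable + "a";  // new String; the lock is unaffected
        }
    }

    void method2() {
        synchronized (stringLock) {
            stringVariable = stringVariable + "b";
        }
        synchronized (intLock) {
            intVariable = intVariable + 1;
        }
    }

    int intValue() {
        synchronized (intLock) { return intVariable; }
    }

    int stringLength() {
        synchronized (stringLock) { return stringVariable.length(); }
    }

    // Runs method1 and method2 concurrently from several threads.
    static SafeLocks hammer(int threads, int iterations) {
        SafeLocks s = new SafeLocks();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            boolean first = (t % 2 == 0);
            Thread w = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    if (first) s.method1(); else s.method2();
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return s;
    }
}
```

Each field's updates are now guarded by one stable monitor, so no increment or append is lost.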

How to make cache thread safe

I have an instance of an object which performs a very complex operation.
So the first time, I create an instance and save it in my own custom cache.
After that, whenever a thread finds that a ready-made object is already present in the cache, it takes it from the cache, for better performance.
I was worried about what happens if two threads have the same instance. Is there a chance that the two threads can corrupt each other?
Map<String, SoftReference<CacheEntry<ClassA>>> AInstances= Collections.synchronizedMap(new HashMap<String, SoftReference<CacheEntry<ClassA>>>());

There are many possible solutions:
Use an existing caching solution like EHcache
Use the Spring framework, which has an easy way to cache the results of a method with a simple @Cacheable annotation
Use one of the concurrent maps like ConcurrentHashMap
If you know all keys in advance, you can use lazy-init code. Note that everything in this code is there for a reason; change anything in get() and it will break eventually (eventually == "your unit tests will work and it will break after running one year in production without any problem whatsoever").
ConcurrentHashMap is the simplest to set up, but it has no simple way to say "initialize the value of a key once".
Don't try to implement the caching by yourself; multithreading in Java has become a very complex area with Java 5 and the advent of multi-core CPUs and memory barriers.
[EDIT] yes, this might happen even though the map is synchronized. Example:
SoftReference<...> value = cache.get( key );
if( value == null ) {
    value = computeNewValue( key );
    cache.put( key, value );
}
If two threads run this code at the same time, computeNewValue() will be called twice. The method calls get() and put() are safe - several threads can try to put at the same time and nothing bad will happen, but that doesn't protect you from problems which arise when you call several methods in succession and the state of the map must not change between them.
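On Java 8 and later (which postdates this answer), ConcurrentHashMap.computeIfAbsent covers exactly this check-then-act case: its documentation states that the whole invocation is performed atomically, so the mapping function is applied at most once per key. A minimal sketch, with plain String values instead of the question's SoftReference<CacheEntry<ClassA>> and a hypothetical computeNewValue:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch (Java 8+): computeIfAbsent makes "check, compute, put" one atomic
// step per key, so the expensive computation runs at most once per key.
class OnceCache {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    final AtomicInteger computations = new AtomicInteger(); // demonstration counter only

    // Hypothetical stand-in for the expensive computeNewValue(key).
    private String computeNewValue(String key) {
        computations.incrementAndGet();
        return key.toUpperCase();
    }

    String get(String key) {
        return cache.computeIfAbsent(key, this::computeNewValue);
    }

    // Many threads ask for the same key at the same time.
    static OnceCache raceOnOneKey(int threads) {
        OnceCache c = new OnceCache();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> c.get("key"));
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return c;
    }
}
```

If SoftReference values are kept, a cleared reference still sits in the map, so get() would also have to detect and replace it.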

Assuming you are talking about singletons, simply use the "initialization-on-demand holder idiom" to make sure your "check" works across all JVMs. This will also make sure that all threads requesting the same object concurrently wait until the initialization is over and are given back only a valid object instance.
Here I'm assuming you want a single instance of the object. If not, you might want to post some more code.
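A sketch of the holder idiom the answer refers to. ExpensiveObject is a hypothetical stand-in for the question's ClassA; the thread safety comes from the JVM's class-initialization guarantee (JLS 12.4.2), not from any explicit locking in the code.

```java
// Sketch of the initialization-on-demand holder idiom.
final class ExpensiveObject {
    private ExpensiveObject() {
        // ... the complex, expensive construction would go here ...
    }

    // Holder is not initialized until getInstance() first touches it. Class
    // initialization runs once and is thread-safe, so concurrent callers wait
    // and then all see the same, fully constructed instance.
    private static final class Holder {
        static final ExpensiveObject INSTANCE = new ExpensiveObject();
    }

    static ExpensiveObject getInstance() {
        return Holder.INSTANCE;
    }
}
```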

OK, if I understand your problem correctly, you are worried that two threads changing the state of the shared object will corrupt each other.
The short answer is yes they will.
If the object is expensive to create but is only needed in a read-only manner, I suggest you make it immutable; this way you get the benefit of fast access and, at the same time, thread safety.
If the state should be writable but you don't actually need threads to see each other's updates, you can simply load the object once into an immutable cache and just return copies to anyone who asks for the object.
Finally, if your object needs to be writable and shared (for reasons other than it just being expensive to create), then, my friend, you need to handle thread safety. I don't know your case, but you should take a look at the synchronized keyword, Locks, the Java 5 concurrency features and Atomic types. I am sure one of them will satisfy your need, and I sincerely hope that your case is one of the first two :)
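A minimal sketch of the first option (an immutable object that can be shared freely), assuming a hypothetical Report class with a name and a list of rows. The defensive copy in the constructor matters: without it, the caller could keep mutating the list it passed in.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: all fields final, no setters, and a defensive copy so that neither
// the creator nor any reader can change the shared instance.
final class Report {
    private final String name;
    private final List<String> rows;

    Report(String name, List<String> rows) {
        this.name = name;
        this.rows = Collections.unmodifiableList(new ArrayList<>(rows)); // copy, then wrap
    }

    String name() { return name; }

    List<String> rows() { return rows; }   // read-only view, safe to hand to any thread

    // "Modification" returns a new object; the shared one never changes.
    Report withRow(String row) {
        List<String> copy = new ArrayList<>(rows);
        copy.add(row);
        return new Report(name, copy);
    }
}
```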

If you only have a single instance of the Object, have a quick look at:
Thread-safe cache of one object in java
Otherwise, I can't recommend the Google Guava library enough, in particular the MapMaker class.

Does using only immutable data types make a Java program thread safe?

Is it true that if I only use immutable data type, my Java program would be thread safe?
Any other factors will affect the thread safety?
I would appreciate it if you could provide an example. Thanks!

Thread safety is about protecting shared data, and immutable objects are protected because they are read-only. Well, apart from when you create them, but creating an object is thread safe.
It's worth saying that designing a large application that ONLY uses immutable objects to achieve thread safety would be difficult.
It's a complicated subject, and I would recommend reading Java Concurrency in Practice, which is a very good place to start.

It is true. The problem is that it's a pretty serious limitation to place on your application to only use immutable data types. You can't have any persistent objects with state which exist across threads.
I don't understand why you'd want to do it, but that doesn't make it any less true.
Details and example: http://www.javapractices.com/topic/TopicAction.do?Id=29

If every single variable is immutable (never changed once assigned) you would indeed have a trivially thread-safe program.
Functional programming environments take advantage of this.
However, it is pretty difficult to do pure functional programming in a language not designed for it from the ground up.
A trivial example of something you can't do in a pure functional program is use a loop, as you can't increment a counter. You have to use recursive functions instead to achieve the same effect.
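To make the loop-versus-recursion point concrete, here is a sketch that sums a list without reassigning any variable. PureSum is an illustrative name. It also hints at why this style is awkward in Java: the JVM does not eliminate tail calls, and this recursion needs one stack frame per element, so a very long list would overflow the stack.

```java
import java.util.List;

// Sketch: summing a list with recursion instead of a loop counter.
// No variable is ever reassigned.
final class PureSum {
    static int sum(List<Integer> xs) {
        return sumFrom(xs, 0);
    }

    private static int sumFrom(List<Integer> xs, int index) {
        if (index == xs.size()) {
            return 0;                                   // base case: nothing left
        }
        return xs.get(index) + sumFrom(xs, index + 1);  // recurse instead of index++
    }
}
```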
If you are just straying into the world of thread safety and concurrency, I'd heartily recommend the book Java Concurrency in Practice, by Goetz. It is written for Java, but actually the issues it talks about are relevant in other languages too, even if the solutions to those issues may be different.

Immutability allows for safety against certain things that can go wrong with multi-threaded cases. Specifically, it means that the properties of an object visible to one thread cannot be changed by another thread while that first thread is using it (since nothing can change it, then clearly another thread can't).
Of course, this only works as far as that object goes. If a mutable reference to the object is also shared, then some cases of cross-thread bugs can happen by something putting a new object there (but not all, since it may not matter if a thread works on an object that has already been replaced, but then again that may be crucial).
In all, immutability should be considered one of the ways that you can ensure thread-safety, but neither the sole way nor necessarily sufficient in itself.
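A sketch of the caveat about sharing a mutable reference to an immutable object: ImmutablePoint never changes, but the reference to the "current" point does. A plain get-then-set of that reference could lose updates between threads, whereas AtomicReference.updateAndGet (Java 8+) retries the swap until it succeeds. ImmutablePoint, SharedPosition and stepsRecorded are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch: the object is immutable, the shared reference to it is not.
final class ImmutablePoint {
    final int x;
    final int y;

    ImmutablePoint(int x, int y) { this.x = x; this.y = y; }

    ImmutablePoint moveRight() { return new ImmutablePoint(x + 1, y); } // new object each time
}

class SharedPosition {
    private final AtomicReference<ImmutablePoint> current =
            new AtomicReference<>(new ImmutablePoint(0, 0));

    void step() {
        // Atomic read-modify-write of the reference; moveRight has no side
        // effects, so it is safe for updateAndGet to call it more than once.
        current.updateAndGet(ImmutablePoint::moveRight);
    }

    ImmutablePoint get() { return current.get(); }

    static int stepsRecorded(int threads, int stepsPerThread) {
        SharedPosition p = new SharedPosition();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int s = 0; s < stepsPerThread; s++) p.step();
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return p.get().x;
    }
}
```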

Although immutable objects are a help with thread safety, you may find "local variables" and "synchronize" more practical for real-world programming.

Any program where no mutable aspect of program state is accessed by more than one thread will be trivially thread-safe, as each thread may as well be its own separate program. Useful multi-threading, however, generally requires interaction between threads, which implies the existence of some mutable shared state.
The key to safe and efficient multi-threading is to incorporate mutability at the right "design level". Ideally, each aspect of program state should be representable by one immutably-rooted(*), mutable reference to an object whose observable state is immutable. Only one thread at a time may try to change the state represented by a particular mutable reference. Efficient multi-threading requires that the "mutable layer" in a program's state be low enough that different threads can use different parts of it. For example, if one has an immutable AllCustomers data structure and two threads simultaneously attempted to change different customers, each would generate a version of the AllCustomers data structure which included its own changes, but not that of the other thread. No good. If AllCustomers were a mutable array of CustomerState objects, however, it would be possible for one thread to be working on AllCustomers[4] while another was working on AllCustomers[9], without interference.
(*) The rooted path must exist when the aspect of state becomes relevant, and must not change while the access is relevant. For example, one could design an AddOnlyList<thing> which holds a thing[][] called Arr that was initialized to size 32. When the first thing is added, Arr[0] would be initialized, using CompareExchange, to an array of 16 thing. The next 15 things would go in that array. When the 17th thing is added, Arr[1] would be initialized using CompareExchange to an array of size 32 (which would hold the new item and the 31 items after it). When the 49th thing is added, Arr[2] would be initialized for 64 items. Note that while thing itself and the arrays contained thereby would not be totally immutable, only the very first access to any element would be a write, and once Arr[x][y] holds a reference to something, it would continue to do so as long as Arr exists.
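The footnote's AddOnlyList can be sketched in Java as follows. This is an assumption-laden translation, not the answer's code: .NET's CompareExchange is replaced by AtomicReferenceArray.compareAndSet, slots are AtomicReferenceArray entries so that readers see published items, and there are no capacity checks. Chunk k holds 16 * 2^k items, so chunk 0 starts at index 0, chunk 1 at 16 and chunk 2 at 48, matching the 1st, 17th and 49th things in the footnote.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch of a lock-free, add-only list built from write-once slots.
final class AddOnlyList<T> {
    private static final int FIRST_CHUNK = 16;
    private static final int ROOT_SIZE = 32;

    // The root never changes; each of its slots is set at most once.
    private final AtomicReferenceArray<AtomicReferenceArray<T>> arr =
            new AtomicReferenceArray<>(ROOT_SIZE);
    private final AtomicInteger nextIndex = new AtomicInteger();

    // Chunk k covers indexes [16 * (2^k - 1), 16 * (2^(k+1) - 1)).
    private static int chunkOf(int index) {
        int j = index / FIRST_CHUNK + 1;
        return 31 - Integer.numberOfLeadingZeros(j);   // floor(log2(j))
    }

    private static int chunkStart(int chunk) {
        return FIRST_CHUNK * ((1 << chunk) - 1);
    }

    private AtomicReferenceArray<T> chunkFor(int k) {
        AtomicReferenceArray<T> c = arr.get(k);
        if (c == null) {
            // Several threads may race here; only one compareAndSet wins,
            // and every thread then uses the winner's array.
            arr.compareAndSet(k, null, new AtomicReferenceArray<T>(FIRST_CHUNK << k));
            c = arr.get(k);
        }
        return c;
    }

    // Each slot is written exactly once, by the thread that reserved its index.
    int add(T item) {
        int index = nextIndex.getAndIncrement();
        int k = chunkOf(index);
        chunkFor(k).set(index - chunkStart(k), item);
        return index;
    }

    // Returns null if the index is not yet reserved or its item not yet written.
    T get(int index) {
        int k = chunkOf(index);
        AtomicReferenceArray<T> c = arr.get(k);
        return c == null ? null : c.get(index - chunkStart(k));
    }

    int reservedSize() {
        return nextIndex.get();
    }
}
```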
