Threadsafe publishing of java object structure? - java

Assuming that I have the following code:
final Catalog catalog = createCatalog();
for (int i = 0; i < 100; i++) {
    new Thread(new CatalogWorker(catalog)).start();
}
"Catalog" is an object structure, and the method createCatalog() and the "Catalog" object structure has not been written with concurrency in mind. There are several non-final, non-volatile references within the product catalog, there may even be mutable state (but that's going to have to be handled)
The way I understand the memory model, this code is not thread-safe. Is there any simple way to make it safe? (The generalized version of this problem is really about single-threaded construction of shared structures that are created before the threads explode into action.)

No, there's no simple way to make it safe. Concurrent use of mutable data types is always tricky. In some situations, making each operation on Catalog synchronized (preferably on a privately-held lock) may work, but usually you'll find that a thread actually wants to perform multiple operations without risking any other threads messing around with things.
Just synchronizing every access to variables should be enough to make the Java memory model problem less relevant - you would always see the most recent values, for example - but the bigger problem itself is still significant.
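To make the point about a privately held lock and multi-step operations concrete, here is a minimal sketch. The class PriceCatalog and its methods are hypothetical stand-ins, not the asker's Catalog; the idea is only that a compound read-then-write has to sit inside a single lock acquisition.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a mutable catalog; not the asker's actual class.
class PriceCatalog {
    // Private lock: code outside this class cannot synchronize on it.
    private final Object lock = new Object();
    private final Map<String, Integer> prices = new HashMap<>();

    void setPrice(String product, int price) {
        synchronized (lock) {
            prices.put(product, price);
        }
    }

    Integer getPrice(String product) {
        synchronized (lock) {
            return prices.get(product);
        }
    }

    // Compound operation: the read and the write share one lock acquisition,
    // so no other thread can change the price in between.
    boolean raisePrice(String product, int delta) {
        synchronized (lock) {
            Integer current = prices.get(product);
            if (current == null) {
                return false;
            }
            prices.put(product, current + delta);
            return true;
        }
    }

    // Demo: four threads each raise the same price 1000 times.
    static int demo() {
        PriceCatalog catalog = new PriceCatalog();
        catalog.setPrice("widget", 0);
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    catalog.raisePrice("widget", 1);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return catalog.getPrice("widget");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

If callers did getPrice() followed by setPrice() themselves, each call would be individually synchronized but updates could still be lost between the two calls.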
Any immutable state in Catalog should be fine already: there's a "happens-before" between the construction of the Catalog and the new thread being started. From section 17.4.5 of the spec:
A call to start() on a thread
happens-before any actions in the
started thread.
(And the construction finishing happens before the call to start(), so the construction happens before any actions in the started thread.)

You need to synchronize every method that reads or changes the mutable state of Catalog to make it thread-safe.
public synchronized <return type> method(<parameter list>){
...
}

Assuming you handle the "non-final, non-volatile references [and] mutable state" (presumably by not actually mutating anything while these threads are running) then I believe this is thread-safe. From the JSR-133 FAQ:
When one action happens before
another, the first is guaranteed to be
ordered before and visible to the
second. The rules of this ordering are
as follows:
Each action in a thread happens before every action in that thread
that comes later in the program's
order.
An unlock on a monitor happens before every subsequent lock on that
same monitor.
A write to a volatile field happens before every subsequent read
of that same volatile.
A call to start() on a thread happens before any actions in the
started thread.
All actions in a thread happen before any other thread successfully
returns from a join() on that thread.
Since the threads are started after the call to createCatalog, the result of createCatalog should be visible to those threads without any problems. It's only changes to the Catalog objects that occur after start() is called on the thread that would cause trouble.
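A small sketch of the start()/join() rules quoted above, assuming the structure is not modified once the workers are running. Node and createCatalog() here are hypothetical placeholders for the asker's classes; the fields are deliberately non-final and non-volatile.

```java
import java.util.ArrayList;
import java.util.List;

class StartPublishesDemo {
    // Deliberately plain: non-final, non-volatile fields and no locks.
    static class Node {
        String name;
        List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    // Hypothetical single-threaded construction of the shared structure.
    static Node createCatalog() {
        Node root = new Node("root");
        for (int i = 0; i < 3; i++) {
            root.children.add(new Node("item-" + i));
        }
        return root;
    }

    static int[] run(int threadCount) {
        final Node catalog = createCatalog();   // fully built before any start()
        final int[] seen = new int[threadCount];
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            final int slot = i;
            // start() happens-before everything the worker does,
            // so the worker sees the complete structure.
            workers[i] = new Thread(() -> {
                seen[slot] = catalog.children.size();
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();   // join() makes the worker's write to seen[] visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        for (int count : run(4)) {
            System.out.println(count);
        }
    }
}
```

The guarantee covers only what was written before start(); any later write to the structure would need its own synchronization.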

Related

Non-volatile fields + first object access from another thread (java)

I have been working on a certain server-type application for a while now, and I found that its design challenges the way I see memory coherence (so to speak) in Java.
This application uses NIO, therefore there is a limited amount of I/O threads (they only do network I/O and nothing else; they never terminate, but may get blocked waiting for more work).
Each connection is internally represented as an object of a specific type, let's call it ClientCon for the sake of this example. ClientCon has various session related fields, none of which are volatile. There is no synchronization of any kind in relation to getting/setting values for these fields.
Received data is made up of logical units with a fixed maximum size. Each such unit has some metadata that allows the handling type (class) to be decided. Once that is done, a new object of that type is created. All such handlers have fields, none of which are volatile. An I/O thread (a concrete I/O thread is assigned to each ClientCon) then calls a protected read method with remaining buffer contents (after metadata was read) on the new handler object.
After this, the same handler object is put into a special queue, which (the queue) is then submitted to a thread pool for execution (where each handler's run method is called to take actions based on the read data). For the sake of this example, we can say that TP threads never terminate.
Therefore, a TP thread will get its hands on an object it never had access to before. All fields of that object are non-volatile (and most/all are non-final, as they were modified outside the constructor).
The handler's run method may act based on session-specific fields in ClientCon as well as set them and/or act on handler object's own fields, whose values were set in the read method.
According to CPJ (Concurrent Programming in Java: Design and Principles):
The first time a thread accesses a field of an object, it sees either the initial value of the field or a value since written by some other thread.
A more comprehensive example of this quote can be found in JLS 17.5:
class FinalFieldExample {
final int x;
int y;
static FinalFieldExample f;
public FinalFieldExample() {
x = 3;
y = 4;
}
static void writer() {
f = new FinalFieldExample();
}
static void reader() {
if (f != null) {
int i = f.x; // guaranteed to see 3
int j = f.y; // could see 0
}
}
}
The class FinalFieldExample has a final int field x and a non-final
int field y. One thread might execute the method writer and another
might execute the method reader.
Because the writer method writes f after the object's constructor
finishes, the reader method will be guaranteed to see the properly
initialized value for f.x: it will read the value 3. However, f.y is
not final; the reader method is therefore not guaranteed to see the
value 4 for it.
This application has been running on x86 (and x86/64) Windows/Unix OSes (Linux flavors, Solaris) for years now (both Sun/Oracle and OpenJDK JVMs, versions 1.5 to 8) and apparently there have been no memory coherency issues related to received data handling. Why?
To sum it up, is there a way for a TP thread to see the object as it was initialized after construction and be unable to see all or some changes done by an I/O thread when it called the protected read method? If so, it would be nice if a detailed example could be presented.
Otherwise, are there some side-effects that could cause the object's field values to always be visible in other threads (e.g. I/O thread acquiring a monitor when adding the handler object to a queue)? Neither the I/O thread nor the TP thread synchronizes on the handler object itself. The queue does no such thing as well (not that it would make sense, anyway). Is this related to a concrete JVM's implementation details, perhaps?
EDIT:
It follows from the above definitions that:
An unlock on a monitor happens-before every subsequent lock on that
monitor. – Not applicable: monitor is not acquired on the handler object
A write to a volatile field (§8.3.1.4) happens-before every subsequent
read of that field. – Not applicable: no volatile fields
A call to start() on a thread happens-before any actions in the
started thread. – A TP thread might already exist when the queue with handler object(s) is submitted for execution. A new handler object might be added to queue amidst an execution on an existing TP thread.
All actions in a thread happen-before any other thread successfully
returns from a join() on that thread. – Not applicable: threads do not wait for each other
The default initialization of any object happens-before any other
actions (other than default-writes) of a program. – Not applicable: field writes are after default init AND after constructor finishes
When a program contains two conflicting accesses (§17.4.1) that are
not ordered by a happens-before relationship, it is said to contain a
data race.
and
Memory that can be shared between threads is called shared memory or
heap memory.
All instance fields, static fields, and array elements are stored in
heap memory. In this chapter, we use the term variable to refer to
both fields and array elements.
Local variables (§14.4), formal method parameters (§8.4.1), and
exception handler parameters (§14.20) are never shared between threads
and are unaffected by the memory model.
Two accesses to (reads of or writes to) the same variable are said to
be conflicting if at least one of the accesses is a write.
There was a write without forcing a HB relationship on field(s), and later there is a read, once again, not forcing a HB relationship on those field(s). Or am I horribly wrong here? That is, there is no declaration that anything about the object could have changed, so why would the JVM force-flush possibly cached values for these fields?
TL;DR
Thread #1 writes values to a new object's fields in a way that does not allow JVM to know that those values should be propagated to other threads.
Thread #2 acquires the object that was modified after construction by Thread #1 and reads those field values.
Why does the issue described in FinalFieldExample/JLS 17.5 NEVER happen in practice?
Why does Thread #2 never see only a default-initialized object (or, alternatively, the object as it was after construction, but before or in the middle of the field value changes by Thread #1)?
I'm quite sure that when a thread pool starts a thread or runs a callable, it has happens-before semantics, so all changes made before the hand-off are visible to the thread.
The scenario you mentioned in CPJ is valid when you have more than one thread modifying data concurrently on the same object instance (e.g. 2 threads already running and modifying the same value, or values that happen to be next to each other in the heap).
It looks like in your case, there is no concurrent modification/read of the fields.
It might depend on what type of thread pool you are using. If it's an ExecutorService, then that class makes some strong guarantees about its task. From the documentation:
Memory consistency effects: Actions in a thread prior to the submission of a Runnable or Callable task to an ExecutorService happen-before any actions taken by that task, which in turn happen-before the result is retrieved via Future.get().
So when you initialize any object, plus any other objects, then submit that object to an ExecutorService, all those writes are made visible to the thread that will eventually handle your task.
Now, if you rolled your own thread pool, or you're using a thread pool without these guarantees, then all bets are off. I'd say switch to something that has the guarantee though.
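As a rough illustration of the ExecutorService guarantee quoted above, here is a sketch in which a handler with plain fields is filled in by one thread and processed by a pool thread. Handler, read() and process() are hypothetical names modelled on the question's description, not the asker's real code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ExecutorHandOffDemo {
    // Hypothetical handler: plain fields, written after construction.
    static class Handler {
        int opcode;
        String payload;

        // Called by the "I/O thread" before the hand-off.
        void read(int opcode, String payload) {
            this.opcode = opcode;
            this.payload = payload;
        }

        // Called by a pool thread after the hand-off.
        String process() {
            return opcode + ":" + payload;
        }
    }

    static String handOff() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Handler handler = new Handler();
            handler.read(7, "hello");            // writes made before submit()
            Callable<String> task = handler::process;
            // Actions before submit() happen-before the task body,
            // so process() sees both field writes.
            Future<String> result = pool.submit(task);
            return result.get();                 // task result is visible after get()
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(handOff());
    }
}
```

The important constraint is the same as in the question: the submitting thread must not touch the handler's fields again after the hand-off.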
In practice one reason you will never see a violation here is "most are non-final" which means that there is at least one final field. The way HotSpot implements the guarantees given by the JLS when final fields are involved is to put a memory barrier at the end of the constructor, thereby granting the non-final fields the same visibility guarantees.
In theory now this is obviously not necessary, which means it depends on how you queue additional work in your thread pool. Generally I cannot imagine any design where there is not some synchronization going on when queueing the work, before it is executed - not only would it make working with this incredibly awkward (for the same reason that starting a thread invokes happens-before behavior), the way to implement such data structures also necessitates some synchronization.
Java's ThreadPoolExecutor.execute() for example does use a BlockingQueue internally which already gives you all the visibility and ordering guarantees you'd need.

What are the not thread-Safe cases when using HashMap in Java?

In the API documents, we can see:
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be
synchronized externally. (A structural modification is any operation
that adds or deletes one or more mappings; merely changing the value
associated with a key that an instance already contains is not a
structural modification.)
I'm wondering whether the "put" method should be synchronized. The documentation mentions only structural modification. Can you give some unsafe cases for HashMap? Also, when I view the source code of "Hashtable", the "get" method is synchronized as well; why aren't only the write operations synchronized?
There is a general rule of thumb:
If you have more than one thread accessing a collection and at least one thread modifies the collection at some point, you need to synchronize all accesses to the collection.
If you think about it, it's very clear: if a collection is modified while another thread reads from it (e.g. iterates), read and write operations can interfere with each other (the read seeing a partial write, e.g. entry created but value not yet set, or entry not properly linked yet).
Exempt from this are collections that one thread creates and modifies, then hands off to "the world", but never modifies after publishing their reference.
why not only the write operations be synchronized?
If the reads are not synchronized as well, you might encounter visibility issues. Not only that, but it is also possible to completely thrash the object, if it performs structural changes!
The JVM specification gives a few guarantees regarding when modifications to a memory location made by one thread will be visible to other threads. One such guarantee is that modifications by a thread prior to releasing a lock are visible to threads that subsequently acquire the same lock. That's why you need to synchronize the read operations as well, even in the absence of concurrent structural modifications to the object.
Note that this releasing/acquiring locks is not the only way to guarantee visibility of memory modifications, but it's the easiest. Others include order of starting threads, class initialization, reads/writes to memory locations... more sophisticated stuff (and possibly more scalable on a highly concurrent environment, due to a reduced level of contention).
If you don't use any of those other techniques to ensure visibility, then simply locking only on write operations is wrong code. You might or might not encounter visibility issues though -- there's no guarantee that the JVM will fail, but it's possible, so... wrong code.
I'd suggest you read the book "Java Concurrency in Practice", one of the best texts on the subject I've ever read, after the JVM spec itself. Obviously, the book is way easier (still far from trivial!) and more fun to read than the spec...
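A minimal sketch of the "lock on reads as well as writes" rule from this answer. GuardedMap is a hypothetical wrapper written for illustration; it only shows that get() and put() must use the same lock for the release/acquire visibility guarantee to apply.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper: every access, read or write, goes through one lock.
class GuardedMap<K, V> {
    private final Object lock = new Object();
    private final Map<K, V> map = new HashMap<>();

    V put(K key, V value) {
        synchronized (lock) {       // the writer releases the lock on exit...
            return map.put(key, value);
        }
    }

    V get(K key) {
        synchronized (lock) {       // ...and a reader acquiring it later sees the write
            return map.get(key);
        }
    }

    int size() {
        synchronized (lock) {
            return map.size();
        }
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Demo: one writer and one reader work on the same map concurrently.
    static int demo() {
        GuardedMap<Integer, String> shared = new GuardedMap<>();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                shared.put(i, "v" + i);
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                shared.get(i);      // may return null, but never sees a half-built entry
            }
        });
        writer.start();
        reader.start();
        joinQuietly(writer);
        joinQuietly(reader);
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In real code, Collections.synchronizedMap or ConcurrentHashMap from the JDK would usually be a better choice than a hand-written wrapper like this.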
One example would be:
Thread 1:
Iterator<Map.Entry<YourKey, YourType>> it = yourMapInstance.entrySet().iterator();
while (it.hasNext()) {
    it.next().getValue().doSth();
    Thread.sleep(1000);
}
Thread 2:
for (int i = 0; i < 10; i++) {
    if (Math.random() < 0.5) {
        yourMapInstance.clear();
        Thread.sleep(500);
    }
}
Now, if both threads are executed concurrently, at some point the iterator in thread 1 may still be handing out entries while thread 2 has already deleted everything from the map. With a HashMap, thread 1 will then typically fail with a ConcurrentModificationException (the check is best-effort across threads, so it is not guaranteed). In this case, synchronization is required.
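The effect of clearing a map behind an iterator's back can be reproduced deterministically in a single thread, which makes it easier to study than the timing-dependent two-thread version. The sketch below (class and method names are made up for illustration) compares HashMap's fail-fast iterator with ConcurrentHashMap's weakly consistent one.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class IterationDemo {
    // Returns true if iteration fails after the map is cleared mid-iteration.
    static boolean failsWhenClearedMidIteration(Map<String, Integer> map) {
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        it.next();      // iteration is under way
        map.clear();    // structural modification behind the iterator's back
        try {
            while (it.hasNext()) {
                it.next();
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsWhenClearedMidIteration(new HashMap<>()));
        System.out.println(failsWhenClearedMidIteration(new ConcurrentHashMap<>()));
    }
}
```

HashMap's iterator detects the modification and throws, while ConcurrentHashMap's iterator is documented never to throw ConcurrentModificationException. Across real threads the HashMap check is only best-effort, so the absence of an exception proves nothing.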

Implementing a Mutex in Java

I have a multi-threaded application (a web app in Tomcat to be exact). In it there is a class that almost every thread will have its own instance of. In that class there is a section of code in one method that only ONE thread (user) can execute at a time. My research has led me to believe that what I need here is a mutex (which is a semaphore with a count of 1, it would seem).
So, after a bit more research, I think what I should do is the following. Of importance is to note that my lock Object is static.
Am I doing it correctly?
public class MyClass {
private static Object lock = new Object();
public void myMethod() {
// Stuff that multiple threads can execute simultaneously.
synchronized(MyClass.lock) {
// Stuff that only one thread may execute at a time.
}
}
}
In your code, myMethod may be called from any thread, but the synchronized block inside it can be executed by only one thread at a time. Because the lock is static, this holds even when every thread has its own instance of MyClass. I think that's what you want - so: Yes.
Typically, the multithreading problem comes from mutability - where two or more threads are accessing the same data structure and one or more of them modifies it.
The first instinct is to control the access order using locking, as you've suggested - however you can quickly run into lock contention where your application loses a lot of processing time to context switching as your threads are parked on lock monitors.
You can get rid of most of the problem by moving to immutable data structures - so you return a new object from the setters, rather than modifying the existing one, as well as utilising concurrent collections, such as ConcurrentHashMap / CopyOnWriteArrayList.
Concurrent programming is something you'll need to get your head around, especially as throughput comes from parallelisation in today's computing world.
This will allow one thread at a time through the block. Other threads will wait, but there is no queue as such: there is no guarantee that threads will get the lock in a fair manner. In fact, with biased locking it's unlikely to be fair. ;)
Your lock should be final. If there is any reason it can't be, it's probably a bug. BTW: You might be able to use synchronized(MyClass.class) instead.
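Putting the advice from these answers together, a minimal sketch might look like the following. SharedSection and its counter are hypothetical; the counter exists only so the demo can show that no increments are lost when each thread uses its own instance.

```java
class SharedSection {
    // static: one lock shared by all instances; final: the reference can never be swapped.
    private static final Object LOCK = new Object();
    private static int counter = 0;     // guarded by LOCK

    void myMethod() {
        // Work placed here may run in several threads at once.
        synchronized (LOCK) {
            counter++;                  // only one thread at a time gets here
        }
    }

    static int currentCount() {
        synchronized (LOCK) {
            return counter;
        }
    }

    // Demo: every thread gets its own instance, as in the question.
    static int demo(int threadCount, int callsPerThread) {
        int before = currentCount();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            SharedSection own = new SharedSection();
            threads[i] = new Thread(() -> {
                for (int n = 0; n < callsPerThread; n++) {
                    own.myMethod();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return currentCount() - before;
    }

    public static void main(String[] args) {
        System.out.println(demo(8, 500));
    }
}
```

If fairness matters, java.util.concurrent.locks.ReentrantLock can be constructed with a fairness flag (new ReentrantLock(true)), at some cost in throughput.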

Java Thread Synchronization, best concurrent utility, read operation

I have a java threads related question.
To take a very simple example, let's say I have 2 threads.
Thread A running StockReader Class instance
Thread B running StockAvgDataCollector Class instance
In Thread B, StockAvgDataCollector collects some market Data continuously, does some heavy averaging/manipulation and updates a member variable spAvgData
In Thread A StockReader has access to StockAvgDataCollector instance and its member spAvgData using getspAvgData() method.
So Thread A does READ operation only and Thread B does READ/WRITE operations.
Questions
Now, do I need synchronization or atomic functionality or locking or any concurrency related stuff in this scenario? It doesn't matter if Thread A reads an older value.
Since Thread A is only going to READ and not update anything, and only Thread B does any WRITE operations, will there be any deadlock scenarios?
I've pasted a paragraph below from the following link. From that paragraph, it seems like I do need to worry about some sort of locking/synchronizing.
http://java.sun.com/developer/technicalArticles/J2SE/concurrency/
Reader/Writer Locks
When using a thread to read data from an object, you do not necessarily need to prevent another thread from reading data at the same time. So long as the threads are only reading and not changing data, there is no reason why they cannot read in parallel. The J2SE 5.0 java.util.concurrent.locks package provides classes that implement this type of locking. The ReadWriteLock interface maintains a pair of associated locks, one for read-only and one for writing. The readLock() may be held simultaneously by multiple reader threads, so long as there are no writers. The writeLock() is exclusive. While in theory, it is clear that the use of reader/writer locks to increase concurrency leads to performance improvements over using a mutual exclusion lock. However, this performance improvement will only be fully realized on a multi-processor and the frequency that the data is read compared to being modified as well as the duration of the read and write operations.
Which concurrent utility would be less expensive and suitable in my example?
java.util.concurrent.atomic ?
java.util.concurrent.locks ?
java.util.concurrent.ConcurrentLinkedQueue ? - In this case StockAvgDataCollector will add and StockReader will remove. No getspAvgData() method will be exposed.
Thanks
Amit
Well, the whole ReadWriteLock thing really makes sense when you have many readers and at least one writer... So you guarantee liveness (you won't be blocking any reader threads if no other thread is writing). However, you have only two threads.
If you don't mind the reading thread seeing an old (but not corrupted) value of spAvgData, then I would go for an AtomicDouble (a Guava class; java.util.concurrent.atomic has no AtomicDouble) or an AtomicReference, depending on what spAvgData's datatype is.
So the code would look like this
public class A extends Thread {
// spAvgData
private final AtomicDouble spAvgData = new AtomicDouble(someDefaultValue);
public void run() {
while (compute) {
// do intensive work
// ...
// done with work, update spAvgData
spAvgData.set(resultOfComputation);
}
}
public double getSpAvgData() {
return spAvgData.get();
}
}
// --------------
public class B {
public void someMethod() {
A a = new A();
// after A being created, spAvgData contains a valid value (at least the default)
a.start();
while(read) {
// loll around
a.getSpAvgData();
}
}
}
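For a JDK-only variant of the same idea, a volatile double is enough when a single thread writes and the value is one number: reads and writes of a volatile double are atomic and visible to other threads. The sketch below is an assumption-laden illustration; AverageCollector and its sample array are made up, and only the names spAvgData and getSpAvgData() come from the question.

```java
class AverageCollector extends Thread {
    // volatile: each write is visible to later reads in other threads,
    // and a volatile double is read and written atomically.
    private volatile double spAvgData = 0.0;
    private final double[] samples;

    AverageCollector(double[] samples) {
        this.samples = samples.clone();
    }

    @Override
    public void run() {
        double sum = 0.0;
        for (int i = 0; i < samples.length; i++) {
            sum += samples[i];
            spAvgData = sum / (i + 1);  // publish the running average
        }
    }

    double getSpAvgData() {
        return spAvgData;               // may be slightly stale, never torn
    }

    // Demo: start the collector, wait for it, then read the final average.
    static double demo() {
        AverageCollector collector = new AverageCollector(new double[] {1.0, 2.0, 3.0, 6.0});
        collector.start();
        try {
            collector.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return collector.getSpAvgData();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

This stops being sufficient as soon as the published state consists of several fields that must stay consistent with each other; then an immutable snapshot object behind an AtomicReference, or a lock, is the safer route.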
Yes, synchronization is important and you need to consider two aspects: visibility of the spAvgData variable and atomicity of its update. To guarantee that updates made by thread B are visible to thread A, the variable can be declared volatile or held in an AtomicReference. You also need to make sure the update itself is atomic, using synchronization or locking, if more invariants are involved or the update is a compound action. If only thread B is updating that variable and the update is a single write, then you don't need synchronization, and visibility should be enough for thread A to read the most up-to-date value of the variable.
If you don't mind that Thread A can read complete nonsense (including partially updated data) then no, you don't need any synchronisation. However, I suspect that you should mind.
If you just use a single mutex, or ReentrantReadWriteLock and don't suspend or sleep without timeout while holding locks then there will be no deadlock. If you do perform unsafe thread operations, or try to roll your own synchronisation solution, then you will need to worry about it.
If you use a blocking queue then you will also need a constantly-running ingestion loop in StockReader. ReadWriteLock is still of benefit on a single core processor - the issues are the same whether the threads are physically running at the same time, or just interleaved by context switches.
If you don't use at least some form of synchronisation (e.g. a volatile) then your reader may never see any change at all.
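To show what "partially updated data" means and how a ReentrantReadWriteLock avoids it, here is a small sketch. StockStats, with its sum and count fields, is a hypothetical stand-in for the collector's state; the point is that the reader can never observe one field updated without the other.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical two-field state that must be read and written as a unit.
class StockStats {
    private final ReadWriteLock rw = new ReentrantReadWriteLock();
    private double sum;     // guarded by rw
    private long count;     // guarded by rw

    void addSample(double value) {
        rw.writeLock().lock();
        try {
            sum += value;   // both fields change under the write lock
            count++;
        } finally {
            rw.writeLock().unlock();
        }
    }

    double average() {
        rw.readLock().lock();
        try {
            return count == 0 ? 0.0 : sum / count;
        } finally {
            rw.readLock().unlock();
        }
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Demo: returns true if the reader only ever saw consistent averages.
    static boolean demo() {
        StockStats stats = new StockStats();
        final boolean[] consistent = {true};
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                stats.addSample(2.0);
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                double avg = stats.average();
                // Every sample is 2.0, so a consistent snapshot yields 0.0 or 2.0.
                if (avg != 0.0 && avg != 2.0) {
                    consistent[0] = false;
                }
            }
        });
        writer.start();
        reader.start();
        joinQuietly(writer);
        joinQuietly(reader);
        return consistent[0] && stats.average() == 2.0;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With only one reader and one writer a plain synchronized block would do the same job; the read/write split pays off mainly when many readers compete.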

Do I need extra synchronization when using a BlockingQueue?

I have a simple bean, an @Entity Message.java, that has some normal properties. The life-cycle of that object is as follows:
Instantiation of Message happens on Thread A, which is then enqueued into a blockingQueue
Another thread from a pool obtains that object, does some stuff with it and changes the state of Message; after that, the object is put into the blockingQueue again. This step is repeated until a condition makes it stop. Each time the object is read or written, it is potentially by a different thread, but with the guarantee that only one thread at a time will be reading/writing to it.
Given those circumstances, do I need to synchronize the getters/setters? Perhaps make the properties volatile? Or can I just leave them without synchronization?
Thanks and hope I could clarify what I am having here.
No, you do not need to synchronize access to the object properties, or even use volatile on the member variables.
All actions performed by a thread before it queues an object on a BlockingQueue "happen-before" the object is dequeued. That means that any changes made by the first thread are visible to the second. This is common behavior for concurrent collections. See the last paragraph of the BlockingQueue class documentation:
Memory consistency effects: As with other concurrent collections, actions in a thread prior to placing an object into a BlockingQueue happen-before actions subsequent to the access or removal of that element from the BlockingQueue in another thread.
As long as the first thread doesn't make any modifications after queueing the object, it will be safe.
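The life-cycle from the question can be sketched roughly as follows, relying only on the BlockingQueue guarantee quoted above. The Message bean, its hops and status fields, and the second "done" queue are assumptions made for the example; the bean has no volatile fields and no synchronized methods.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueHandOffDemo {
    // Hypothetical bean: plain fields, no volatile, no synchronized.
    static class Message {
        private int hops;
        private String status = "NEW";

        int getHops() { return hops; }
        void setHops(int hops) { this.hops = hops; }
        String getStatus() { return status; }
        void setStatus(String status) { this.status = status; }
    }

    // Passes one message between pool threads until it has made maxHops hops.
    static Message process(int workerCount, int maxHops) {
        BlockingQueue<Message> queue = new LinkedBlockingQueue<>();
        BlockingQueue<Message> done = new LinkedBlockingQueue<>();
        Thread[] workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(() -> {
                try {
                    while (true) {
                        Message m = queue.take();   // sees all writes made before the put
                        m.setHops(m.getHops() + 1);
                        if (m.getHops() >= maxHops) {
                            m.setStatus("DONE");
                            done.put(m);
                        } else {
                            queue.put(m);           // hand off; do not touch m afterwards
                        }
                    }
                } catch (InterruptedException e) {
                    // interrupted by process() once the message is finished: stop
                }
            });
            workers[i].start();
        }
        try {
            queue.put(new Message());
            return done.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            for (Thread t : workers) {
                t.interrupt();
            }
        }
    }

    public static void main(String[] args) {
        Message m = process(3, 10);
        System.out.println(m.getHops() + " " + m.getStatus());
    }
}
```

Every put/take pair forms a happens-before edge, so each thread that takes the message sees all changes made by the previous holder, whichever pool thread that was.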
You don't need to do synchronization yourself, because the queue does it for you already.
Visibility is also guaranteed.
If you're sure that only one thread at a time will access your object, then you don't need synchronisation.
However, you can ensure that by using the synchronized keyword: each time you want to access this object and be sure that no other thread is using the same instance, wrap your code in a synchronized block:
Message myMessage = // ...
synchronized (myMessage) {
// You're the only one to have access to this instance, do what you want
}
The synchronized block will acquire an implicit lock on the myMessage object. So, no other synchronized block will have access to the same instance until you leave this block.
It sounds like you could leave the synchronized off the methods. The synchronized keyword simply locks the object to allow only a single thread to access it. You've already handled that with the blocking queue.
Volatile would be good to use, as that would ensure that each thread has the latest version, instead of a thread local cache value.
