Is Compact Language Detector 2's detect method thread safe?

We are using the Java Wrapper implementation of Compact Language Detector 2.
Is the detect() function thread-safe?
From what I understand, it invokes this library function.

No, it is not thread safe if the native code was compiled with CLD2_DYNAMIC_MODE set, which you could test using the function isDataDynamic().
The native function manipulates the static class variable kScoringtables. If CLD2_DYNAMIC_MODE is defined at compilation, this variable is initialized to a set of null tables (NULL_TABLES) and can later be loaded with dynamic data, or unloaded, potentially by other threads.
It would be possible for the kScoringtables.quadgram_obj to be non-null at the line 1762 null check and then the kScoringtables address altered before it is added to the cross-thread ScoringContext object on line 1777. In this case, the wrong pointer would be passed to ApplyHints on line 1785, potentially causing bad things to happen at line 1606.
This would be a very rare race condition, but it is possible nonetheless, and the method is not thread safe for the same reason the standard "lazy getter" is not thread safe.
To make this thread-safe, you would have to either verify that isDataDynamic() returns false, or ensure that the loadDataFromFile, loadDataFromRawAddress, and unloadData functions cannot be called by a different thread while you are executing this method (or at least until you are past line 1777).
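One way to enforce that second option from the Java side is to route every call through a read-write lock. The following is only a minimal sketch: the Detector interface and its method signatures are hypothetical stand-ins for whatever the wrapper actually exposes, and it only helps if no code bypasses the guard and calls the wrapper directly.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class GuardedDetector {
    /** Hypothetical stand-in for the wrapper; the real signatures may differ. */
    public interface Detector {
        String detect(String text);
        void loadDataFromFile(String path);
        void unloadData();
    }

    private final Detector delegate;
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public GuardedDetector(Detector delegate) {
        this.delegate = delegate;
    }

    // Many threads may detect at once; they only share the read lock.
    public String detect(String text) {
        lock.readLock().lock();
        try {
            return delegate.detect(text);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Loading takes the write lock, so it waits for in-flight detect()
    // calls to finish and blocks new ones until the tables are swapped.
    public void loadDataFromFile(String path) {
        lock.writeLock().lock();
        try {
            delegate.loadDataFromFile(path);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Unloading is guarded the same way as loading.
    public void unloadData() {
        lock.writeLock().lock();
        try {
            delegate.unloadData();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

A loadDataFromRawAddress method would be guarded with the write lock in the same way. If isDataDynamic() returns false, none of this should be needed.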

Related

Explain how JIT reordering works

I have been reading a lot about synchronization in Java and all the problems that can occur. However, what I'm still slightly confused about is how the JIT can reorder a write.
For instance, a simple double check lock makes sense to me:
class Foo {
    private volatile Helper helper = null; // 1
    public Helper getHelper() { // 2
        if (helper == null) { // 3
            synchronized(this) { // 4
                if (helper == null) // 5
                    helper = new Helper(); // 6
            }
        }
        return helper;
    }
}
We use volatile on line 1 to enforce a happens-before relationship. Without it, it is entirely possible for the JIT to reorder our code. For example:
Thread 1 is at line 6 and memory is allocated to helper; however, the constructor has not yet run because the JIT could have reordered our code.
Thread 2 comes in at line 2 and gets an object that is not fully created yet.
I understand this, but I don't fully understand the limitations that the JIT has on reordering.
For instance, say I have a method that creates and puts a MyObject into a HashMap<String, MyObject> (I know that a HashMap is not thread safe and should not be used in a multithreaded environment, but bear with me). Thread 1 calls createNewObject:
public class MyObject {
    private Double value = null;
    public MyObject(Double value) {
        this.value = value;
    }
}
Map<String, MyObject> map = new HashMap<String, MyObject>();
public void createNewObject(String key, Double val){
    map.put(key, new MyObject( val ));
}
At the same time thread 2 calls a get from the Map.
public MyObject getObject(String key){
    return map.get(key);
}
Is it possible for thread 2 to receive an object from getObject(String key) that is not fully constructed? Something like:
Thread 1: Allocate memory for new MyObject( val )
Thread 1: Place object in map
Thread 2: call getObject(String key)
Thread 1: Finish constructing the new MyObject.
Or will map.put(key, new MyObject( val )) not put an object into the map until it's fully constructed?
I'd imagine that the answer is, it wouldn't put an object into the Map until it is fully constructed (because that sounds awful). So how can the JIT reorder?
In a nutshell can it only reorder when creating a new Object and assigning it to a reference variable, like the double checked lock? A complete rundown on the JIT might be much for a SO answer, but what I'm really curious about is how it can reorder a write (like line 6 on the double checked lock) and what stops it from putting an object into a Map that is not fully constructed.

WARNING: WALL OF TEXT
The answer to your question is before the horizontal line. In the second portion of my answer I explain the fundamental problem in more depth (it is not related to the JIT, so you can stop there if you are only interested in the JIT). The answer to the second part of your question is at the bottom because it relies on what I describe along the way.
TL;DR The JIT will do whatever it wants and the JMM will do whatever it wants, and both are valid as long as you let them by writing thread-unsafe code.
NOTE: "initialization" refers to what happens in the constructor, which excludes anything else such as calling a static init method after constructing etc...
"If the reordering produces results consistent with a legal execution, it is not illegal." (JLS 17.4.5-200)
If the result of a set of actions conforms to a valid chain of execution as per the JMM, then the result is allowed regardless of whether the author intended the code to produce that result or not.
"The memory model describes possible behaviors of a program. An implementation is free to produce any code it likes, as long as all resulting executions of a program produce a result that can be predicted by the memory model.
This provides a great deal of freedom for the implementor to perform a myriad of code transformations, including the reordering of actions and removal of unnecessary synchronization" (JLS 17.4).
The JIT will reorder whatever it sees fit unless we do not allow it using the JMM (in a multithreaded environment).
The details of what the JIT can or will do are nondeterministic. Looking at millions of sample runs will not produce a meaningful pattern because reorderings depend on very specific details such as CPU arch, timings, heuristics, graph size, JVM vendor, bytecode size, etc... We only know that the JIT will assume that the code runs in a single threaded environment when it does not need to conform to the JMM. In the end, the JIT matters very little to your multithreaded code. If you want to dig deeper, see this SO answer and do a little research on such topics as IR Graphs, the JDK HotSpot source, and compiler articles such as this one. But again, remember that the JIT has very little to do with your multithreaded code transforms.
In practice, the "object that is not fully created yet" is not a side effect of the JIT but rather of the memory model (JMM). In summary, the JMM is a specification that puts forth guarantees of what can and cannot be the result of a certain set of actions, where actions are operations that involve shared state. The JMM is more easily understood through higher level concepts such as atomicity, memory visibility, and ordering, all three of which are components of a thread-safe program.
To demonstrate this, it is highly unlikely for your first sample of code (the DCL pattern) to be modified by the JIT in a way that would produce "an object that is not fully created yet." In fact, I believe that it is not possible to do this because it would not follow the order of execution of a single-threaded program.
So what exactly is the problem here?
The problem is that if the actions aren't ordered by a synchronization order, a happens-before order, etc... (described again by JLS 17.4-17.5) then threads are not guaranteed to see the side effects of performing such actions. Threads might not flush their caches to update the field, or they might observe the write out of order. Specific to this example, threads are allowed to see the object in an inconsistent state because it is not properly published. I'm sure that you have heard of safe publishing before if you have ever worked even the tiniest bit with multithreading.
You might ask, well if single-threaded execution cannot be modified by the JIT, why can the multithreaded version be?
Put simply, it's because the thread is allowed to think ("perceive" as usually written in textbooks) that the initialization is out of order due to the lack of proper synchronization.
"If Helper is an immutable object, such that all of the fields of Helper are final, then double-checked locking will work without having to use volatile fields. The idea is that a reference to an immutable object (such as a String or an Integer) should behave in much the same way as an int or float; reading and writing references to immutable objects are atomic" (The "Double-Checked Locking is Broken" Declaration).
Making the object immutable ensures that the state is fully initialized when the constructor exits.
Remember that object construction is always unsynchronized. An object that is being initialized is ONLY visible and safe with respect to the thread that constructed it. In order for other threads to see the initialization, you must publish it safely. Here are those ways:
"There are a few trivial ways to achieve safe publication:
Exchange the reference through a properly locked field (JLS 17.4.5)
Use static initializer to do the initializing stores (JLS 12.4)
Exchange the reference via a volatile field (JLS 17.4.5), or as the consequence of this rule, via the AtomicX classes
Initialize the value into a final field (JLS 17.5)."
(Safe Publication and Safe Initialization in Java)
Safe publication ensures that other threads will see the fully initialized object once publication has finished.
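To make the four idioms from that list concrete, here is a rough sketch of what each one can look like in code. The class and member names (SafePublication, Helper, publish, and so on) are made up for illustration; only the language features themselves come from the list.

```java
import java.util.concurrent.atomic.AtomicReference;

public class SafePublication {
    /** Hypothetical payload used by every idiom below. */
    public static final class Helper {
        public final int value;   // 4. final field (JLS 17.5)

        public Helper(int value) {
            this.value = value;
        }
    }

    // 1. Properly locked field: every read and write holds the same lock.
    private Helper locked;

    public synchronized void setLocked(Helper h) { locked = h; }
    public synchronized Helper getLocked() { return locked; }

    // 2. Static initialization: the store happens during class
    //    initialization, which the JVM performs safely (JLS 12.4).
    public static final Helper FROM_STATIC_INIT = new Helper(1);

    // 3. Volatile field, or an AtomicX class that gives the same guarantee.
    private volatile Helper viaVolatile;
    private final AtomicReference<Helper> viaAtomic = new AtomicReference<>();

    public void publish(int v) {
        setLocked(new Helper(v));
        viaVolatile = new Helper(v);
        viaAtomic.set(new Helper(v));
    }

    public Helper readVolatile() { return viaVolatile; }
    public Helper readAtomic() { return viaAtomic.get(); }
}
```

A reader that obtains a Helper through any of these paths sees value as the constructor set it. A plain non-final, non-volatile field read without the lock gives no such guarantee.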
Revisiting our idea that threads are only guaranteed to see side effects if they are in order, the reason that you need volatile is so that your write to the helper in thread 1 is ordered with respect to the read in thread 2. Thread 2 is not allowed to perceive the initialization after the read because it occurs before the write to helper. It piggybacks on the volatile write such that the read must happen after the initialization AND THEN the write to the volatile field (transitive property).
To conclude, the initialization only appears to occur after the object is created because another thread THINKS that is the order. The initialization will never actually occur after construction due to a JIT optimisation. You can fix this by ensuring proper publication through a volatile field or by making your helper immutable.
Now that I've described the general concepts behind how publication works in the JMM, hopefully understanding how your second example won't work will be easy.
I'd imagine that the answer is, it wouldn't put an object into the Map until it is fully constructed (because that sounds awful). So how can the JIT reorder?
To the constructing thread, it will put it into the map after initialization.
The reader thread can see whatever the hell it wants (an improperly constructed object in a HashMap? That is definitely within the realm of possibility).
What you described with your 4 steps is completely legal. There is no order between assigning value and adding the object to the map, thus thread 2 can perceive the initialization out of order since MyObject was published unsafely.
You can actually fix this problem by just converting to ConcurrentHashMap, and getObject() will be completely thread safe: once you put the object in the map, the initialization occurs before the put, and both occur before the get, as a result of ConcurrentHashMap being thread safe. However, once you modify the object, it becomes a management nightmare because you need to ensure that updating the state is visible and atomic. What if a thread retrieves an object and another thread updates the object before the first thread could finish modifying it and putting it back in the map?
T1 -> get() MyObject=30 ------> +1 --------------> put(MyObject=31)
T2 -------> get() MyObject=30 -------> +1 -------> put(MyObject=31)
Alternatively you could also make MyObject immutable, but you still need to make the map a ConcurrentHashMap in order for other threads to see the put: without it, a thread might keep reusing a stale cached copy of the map's state. ConcurrentHashMap ensures that its writes are visible to readers and ensures thread-safety. Recalling our 3 prerequisites for thread-safety, we get visibility from using a thread-safe data structure, atomicity by using an immutable object, and finally ordering by piggybacking on ConcurrentHashMap's thread safety.
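Putting those pieces together, a minimal sketch might look like the following. It assumes MyObject is changed to hold a final double, and the Counters class and its increment method are hypothetical additions used to show how the get / +1 / put race from the diagram can be avoided with ConcurrentHashMap.compute.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class Counters {
    /** Immutable variant of MyObject: the field is final and never reassigned. */
    public static final class MyObject {
        private final double value;

        public MyObject(double value) {
            this.value = value;
        }

        public double getValue() {
            return value;
        }
    }

    private final ConcurrentMap<String, MyObject> map = new ConcurrentHashMap<>();

    public void createNewObject(String key, double val) {
        map.put(key, new MyObject(val));
    }

    public MyObject getObject(String key) {
        return map.get(key);
    }

    // The read, the +1 and the write-back run atomically for this key,
    // so two threads cannot both start from the same old value.
    public void increment(String key) {
        map.compute(key, (k, old) ->
                new MyObject(old == null ? 1.0 : old.getValue() + 1.0));
    }
}
```

With this shape, two threads that both call increment on a MyObject holding 30 end up with 32 rather than 31, because each update replaces the whole immutable object inside compute instead of doing a separate get and put.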
To wrap up this entire answer, I will say that multithreading is a very difficult discipline to master, one that I myself most definitely have not. By understanding the concepts that make a program thread-safe and thinking about what the JMM allows and guarantees, you can ensure that your code will do what you want it to do. Bugs in multithreaded code often occur as a result of the JMM allowing a counterintuitive result that is within its parameters, not the JIT doing performance optimisations. Hopefully you will have learned a little bit more about multithreading if you read everything. Thread safety should be achieved by building a repertoire of thread-safe paradigms rather than using little inconveniences of the spec (Lea or Bloch, not even sure who said this).

How to ensure thread safety of utility static method?

Are there any general rules by which we can ensure the thread safety of static methods, specifically those used in the utility classes of an application? Here I specifically want to ask about thread safety in web applications.
It is well known that static methods with immutable objects as parameters are thread safe and those with mutable objects are not.
If I have a utility method for some manipulation of java.util.Date and that method accepts an instance of java.util.Date, then this method would not be thread safe. How can I make it thread safe without changing the way the parameter is passed?
public class DateUtils {
    public static Date getNormalizeDate(Date date) {
        // some operations
    }
}
Also is the class javax.faces.context.FacesContext mutable? Is it thread safe to pass an instance of this class to such static utility method?
The list of classes whose instances can or cannot be passed as parameters could be long, so what points should we keep in mind while writing the code of such utility classes?

It is well known that static methods with immutable objects as parameters are thread safe and mutable objects are not.
I would contest this. Arguments passed to a method are stored on a stack, which is a per-thread idiom.
If your parameter is a mutable object such as a Date then you need to ensure other threads are not modifying it at the same time elsewhere. But that's a different matter unrelated to the thread-safety of your method.
The method you posted is thread-safe. It maintains no state and operates only on its arguments.
I would strongly recommend you read Java Concurrency in Practice, or a similar book dedicated to thread safety in Java. It's a complex subject that cannot be addressed appropriately through a few StackOverflow answers.

Since your class does not hold any member variables, your method is stateless (it only uses local variables and the argument) and therefore is thread safe.
The code that calls it might not be thread safe but that's another discussion. For example, Date not being thread safe, if the calling code reads a Date that has been written by another thread, you must use proper synchronization in the Date writing and reading code.

Given the structure of the JVM, local variables, method parameters, and return values are inherently "thread-safe." But instance variables and class variables will only be thread-safe if you design your class appropriately.

I see a lot of answers but none really pointing out the reason.
So this can be thought of like this:
Whenever a thread is created, it is created with its own stack (the default size is platform dependent, commonly around 1 MB on 64-bit JVMs, and can be set with -Xss). So any execution that happens actually happens within the context of this thread's stack.
Any object that is created lives on the heap, but a local reference to it lives on the stack; the exception is static variables, which do not live on the thread stack.
Any method call you make pushes a frame onto the thread stack, be it static or non-static. Since the method's local variables live in that frame, any variable created there lives on the stack (again with the exception of static variables) and is accessible to only one thread.
So all methods are thread safe until they change the state of some static variable (or of any other object that is shared between threads).

I would recommend creating a copy of that (mutable) object as soon as the method starts and use the copy instead of original parameter.
Something like this
public static Date getNormalizeDate(Date date) {
    Date input = new Date(date.getTime());
    // ...
}
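For a version that actually compiles and runs, here is a sketch of the same defensive-copy idea. The question never says what "normalize" does, so truncating to midnight UTC is purely an assumption made for the example.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

public class DateUtils {
    // Assumption: "normalize" means "truncate to midnight UTC".
    public static Date getNormalizeDate(Date date) {
        // Snapshot the caller's Date once; later changes to it are not seen.
        Date input = new Date(date.getTime());

        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        cal.setTime(input);
        cal.set(Calendar.HOUR_OF_DAY, 0);
        cal.set(Calendar.MINUTE, 0);
        cal.set(Calendar.SECOND, 0);
        cal.set(Calendar.MILLISECOND, 0);

        // Return a fresh Date; the caller's instance is never modified.
        return cal.getTime();
    }
}
```

The copy narrows the window in which another thread's changes to the argument can matter, but it is not a substitute for the caller synchronizing access to a Date it shares between threads.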

Here's how I think of it: imagine a CampSite (that's a static method). As a camper, I can bring in a bunch of objects in my rucksack (that's arguments passed in on the stack). The CampSite provides me with a place to put my tent and my campstove, etc, but if the only thing the CampSite does is allow me to modify my own objects then it's threadsafe. The CampSite can even create things out of thin air (FirePit firepit = new FirePit();), which also get created on the stack.
At any time I can disappear with all the objects in my rucksack and any one of the other campers can appear, doing exactly what they were doing the last time they disappeared. Different threads in this CampSite will not have access to objects created on the stack by the CampSite in other threads.
Say there's only one campStove (a single object of CampStove, not separate instantiations). If by some stretch of the imagination I am sharing a CampStove object then there are multi-threading considerations. I don't want to turn on my campStove, disappear and then reappear after some other camper has turned it off - I would forever be checking if my hot dog was done and it never would be. You would have to put some synchronization somewhere... in the CampStove class, in the method that was calling the CampSite, or in the CampSite itself... but like Duncan Jones says, "that's a different matter".
Note that even if we were camping in separate instantiations of non-static CampSite objects, sharing a campStove would have the same multi-threading considerations.

We will take some examples to see whether a static method is thread-safe or not.
Example 1:
public static String concat(String str1, String str2) {
    return str1 + str2;
}
The above method is thread-safe.
Now we will see another example which is not thread-safe.
Example 2:
public static void concat(StringBuilder result, StringBuilder sb, StringBuilder sb1) {
    result.append(sb);
    result.append(sb1);
}
Both methods are very primitive, yet one is thread-safe and the other is not. Why? What is the difference between them?
Are static methods in utility classes prone to being non-thread-safe? Lots of questions, right?
Everything depends on how you implement the method and which types of objects you use in it. Are you using thread-safe objects? Are these objects or classes mutable?
In Example 1 the arguments of the concat method are of type String, which is immutable, and the references are passed by value, so this method is completely thread-safe.
In Example 2 the arguments are of type StringBuilder, which is mutable, so another thread can change the value of a StringBuilder, which makes this method potentially non-thread-safe.
Again, this is not completely true. If you call this utility method with local variables, you never have any problem related to thread safety, because each thread uses its own copy of its local variables. But that is beyond the scope of the static method itself; it depends on the calling function or program.
Static methods in utility classes are normal practice, so how can we avoid the problem? In Example 2 I am modifying the first parameter. If you want to make this method really thread-safe, there is one simple thing you can do: either use immutable objects, or do not modify any method parameters.
In Example 2 we already used StringBuilder, which is mutable, so you can change the implementation to make the static method thread-safe as follows:
public static String concat1(StringBuilder sb, StringBuilder sb1) {
    StringBuilder result = new StringBuilder();
    result.append(sb);
    result.append(sb1);
    return result.toString();
}
Going back to basics, always remember that if you are using immutable objects and local variables, you are miles away from thread safety issues.
From the article (https://nikhilsidhaye.wordpress.com/2016/07/29/is-static-method-in-util-class-threadsafe/). Thank you Nikhil Sidhaye for this simple article.

Can an object's reference be set before its constructor finishes?

The JMM (Java Memory Model) is free to reorder statements.
Of course, this is especially tricky when dealing with a multithreaded environment.
The JMM rules specify that volatile and final variables are always fully initialized before the constructor finishes, if and only if the reference hasn't "escaped" from within the constructor.
It implies that "normal" variables (non-final and non-volatile) aren't expected to be seen up-to-date by any concurrent threads.
My question might seem stupid at first glance, but it really isn't:
Are object references always set AFTER the constructor completes (where "completes" does not mean that all variables have been initialized, but simply that the end of the constructor has been reached)? Is there a rule in any JSR asserting it?
Or might there be an exceptional case where a reference could be handed back to the client BEFORE the constructor completes?
Indeed, if statement reordering is so free, it may also imply that handing out the object's reference 'happens-before' the constructor completes. Then we would come across the same case as the "this escape" we try to avoid.
In a nutshell, is the reference ALWAYS handed out after the constructor completes?
After searching the JLS, the only place where the returning of an object's reference is mentioned is this (excerpt from JLS 12.5):
Just before a reference to the newly created object is returned as the
result, the indicated constructor is processed to initialize the new
object using the following procedure:
No relation to JMM ... therefore it can be ensured that constructor completion always happens-before passing reference whatever the case.

Within the context of a thread the reference will be set. However, the JMM allows shared variables to be set in one thread and not yet synchronised to the other thread.
Volatile and final guarantee this by guaranteeing inter-thread synchronisation of reads and writes to the variable.

Immutable objects are thread safe, but why?

Let's say, for example, that a thread is creating an object of an immutable class and assigning it to a reference variable, and another thread kicks in before the first one completes and creates another object of the immutable class. Won't the usage of the immutable class be thread unsafe?
Creating an immutable object also means that all fields have to be marked as final.
it may be necessary to ensure correct behavior if a reference to
a newly created instance is passed from one thread to another without
synchronization
Are they trying to say that the other thread may re-point the reference variable to some other object of the immutable class and that way the threads will be pointing to different objects leaving the state inconsistent?

Actually, immutable objects are always thread-safe, but references to them may not be.
Confused? You shouldn't be:
Going back to basics:
Thread-safe simply means that two or more threads must work in coordination on the shared resource or object. They shouldn't overwrite the changes made by any other thread.
Now String is an immutable class: whenever a thread tries to change it, it simply ends up creating a new object. So even the same thread can't make any changes to the original object, let alone another thread. The catch is that we generally use the same old reference to point to the newly created object.
When we write code, we observe any change in an object only through the reference.
Statement 1:
String str = "123"; // initially string shared to two threads
Statement 2:
str = str+"FirstThread"; // to be executed by thread one
Statement 3:
str=str+"SecondThread"; // to be executed by thread two
Now since there are no synchronized, volatile or final keywords to tell the compiler to skip its optimizations (any reordering or caching), this code can run in the following manner.
Load Statement2, so str = "123"+"FirstThread"
Load Statement3, so str = "123"+"SecondThread"
Store Statement3, so str = "123SecondThread"
Store Statement2, so str = "123FirstThread"
and finally the value in the reference is str="123FirstThread". If we assume for a moment that our GC thread is luckily sleeping, our immutable objects still exist untouched on the heap.
So, Immutable objects are always thread-safe, but their references may not be. To make their references thread-safe, we may need to access them from synchronized blocks/methods.
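As an alternative to a synchronized block, the shared reference itself can be made thread-safe with an AtomicReference. This is just a small sketch; the SharedString class and its append method are hypothetical names, not something from the answer above.

```java
import java.util.concurrent.atomic.AtomicReference;

public class SharedString {
    private final AtomicReference<String> str = new AtomicReference<>("123");

    // updateAndGet re-reads and retries if another thread swapped the
    // reference in the meantime, so neither thread's append is lost.
    public String append(String suffix) {
        return str.updateAndGet(current -> current + suffix);
    }

    public String get() {
        return str.get();
    }
}
```

If thread one calls append("FirstThread") and thread two calls append("SecondThread"), the final string contains both suffixes in whichever order the updates were applied, instead of one overwriting the other.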

In addition to the other answers already posted: immutable objects, once created, cannot be modified further. Hence they are essentially read-only.
And as we all know, read-only things are always thread-safe. Even in databases, multiple queries can read the same rows simultaneously, but if you want to modify something, you need an exclusive lock for that.

Immutable objects are thread safe, but why?
An immutable object is an object that is no longer modified once it has been constructed. If, in addition, the immutable object is only made accessible to other threads after it has been constructed, and this is done using proper synchronization, all threads will see the same valid state of the object.
If one thread is creating populating the reference variable of the immutable class by creating its object and at the second time the other thread kicks in before the first thread completes and creates another object of the immutable class, won't the immutable class usage be thread unsafe?
No. What makes you think so? An object's thread safety is completely unaffected by what you do to other objects of the same class.
Are they trying to say that the other thread may re-point the reference variable to some other object of the immutable class and that way the threads will be pointing to different objects leaving the state inconsistent?
They are trying to say that whenever you pass something from one thread to another, even if it is just a reference to an immutable object, you need to synchronize the threads. (For instance, if you pass the reference from one thread to another by storing it in an object or a static field, that object or field is accessed by several threads, and must be thread-safe)

Thread safety is data-sharing safety. Because your code makes decisions based on the data your objects hold, the integrity and deterministic behaviour of that data is vital. For example:
Imagine we have a boolean instance variable shared across two threads that are about to execute a method with the following logic:
If flag is false, then I print "false" and then I set the flag back to true.
If flag is true, then I print "true" and then I set the flag back to false.
If you run this continuously in a single-threaded loop, you will have a deterministic output which will look like:
false - true - false - true - false - true - false ...
But if you run the same code with two threads, the output is not deterministic anymore. The reason is that thread A can wake up, read the flag, and see that it is false, but before it can do anything, thread B wakes up and reads the flag, which is also false! So both will print false. And this is only one problematic scenario I can think of. As you can see, this is bad.
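For comparison, here is a minimal sketch of the same flag logic with the read and the update made atomic. The Toggle class is a hypothetical name, and the method returns the value it would have printed so that the behaviour is easy to check.

```java
public class Toggle {
    private boolean flag = false;

    // Reading the flag, reporting it and flipping it happen as one step
    // under the object's lock, so two threads can never both observe
    // the same value before either of them flips it.
    public synchronized String printAndFlip() {
        String seen = String.valueOf(flag);
        flag = !flag;
        return seen;
    }
}
```

With the lock in place the returned values strictly alternate across all callers, so out of any even number of calls exactly half return "false", no matter how the threads interleave.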
If you take the updates out of the equation, the problem is gone, because you are eliminating all the risks associated with data synchronization. That's why we say that immutable objects are thread safe.
It is important to note, though, that immutable objects are not always the solution. You may have data that you need to share and update among different threads; in these cases there are many techniques that go beyond plain synchronization and that can make a whole lot of difference in the performance of your application, but this is a completely different subject.
Immutable objects are important to guarantee that the areas of the application that we are sure don't need to be updated are not updated, so we know for sure that we are not going to have multithreading issues there.
You probably might be interested in taking a look at a couple of books:
This is the most popular: http://www.amazon.co.uk/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601/ref=sr_1_1?ie=UTF8&qid=1329352696&sr=8-1
But I personally prefer this one: http://www.amazon.co.uk/Concurrency-State-Models-Java-Programs/dp/0470093552/ref=sr_1_3?ie=UTF8&qid=1329352696&sr=8-3
Be aware that multithreading is probably the trickiest aspect of any application!

Immutability doesn't imply thread safety, in the sense that the reference to an immutable object can be altered even after the object is created.
// No setters provided
class ImmutableValue {
    // Not initialized here: a final field can only be assigned once,
    // and the constructor does it.
    private final int value;

    public ImmutableValue(int value) {
        this.value = value;
    }

    public int getValue() {
        return value;
    }
}
public class ImmutableValueUser {
    // The currentValue reference can be changed even after the underlying
    // ImmutableValue object it refers to has been constructed.
    private ImmutableValue currentValue = null;

    public ImmutableValue getValue() {
        return currentValue;
    }

    public void setValue(ImmutableValue newValue) {
        this.currentValue = newValue;
    }
}

Two threads will not be creating the same object, so no problem there.
With regards to 'it may be necessary to ensure...', what they are saying is that if you DON'T make all fields final, you will have to ensure correct behavior yourself.

Exceptions in constructors

In C++, the lifetime of an object begins when the constructor finishes successfully. Inside the constructor, the object does not exist yet.
Q: What does emitting an exception from a constructor mean?
A: It means that construction has failed, the object never existed, its lifetime never began. [source]
My question is: Does the same hold true for Java? What happens, for example, if I hand this to another object, and then my constructor fails?
Foo()
{
    Bar.remember(this);
    throw new IllegalStateException();
}
Is this well-defined? Does Bar now have a reference to a non-object?

The object exists, but it's not been initialized properly.
This can happen whenever this leaks during construction (not just when you throw an exception).
It's a very problematic situation, because some commonly assumed guarantees don't hold true in this situation (for example final fields could seem to change their value during construction).
Therefore you should definitely avoid leaking this in the constructor.
This IBM developerWorks article describes the precautions to take when constructing objects and the reasoning behind those precautions. While the article discusses the subject in the light of multi-threading, you can have similar problems in a single-threaded environment when unknown/untrusted code gets a reference to this during construction.
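The effect is easy to reproduce even in a single thread. In the sketch below, LeakDemo and its REMEMBERED list are hypothetical stand-ins for Bar.remember from the question, and the size field is made up to show what a half-built object looks like.

```java
import java.util.ArrayList;
import java.util.List;

public class LeakDemo {
    /** Hypothetical registry playing the role of Bar in the question. */
    public static final List<Foo> REMEMBERED = new ArrayList<>();

    public static class Foo {
        public final int size;

        public Foo(int size) {
            REMEMBERED.add(this);   // "this" escapes before size is assigned
            if (size < 0) {
                throw new IllegalStateException("negative size");
            }
            this.size = size;       // never reached when the check fails
        }
    }
}
```

After new LeakDemo.Foo(-1) throws, the registry still holds a perfectly real Foo whose final field size reads as the default 0, a value the constructor never assigned. With a valid argument the registry would see size as 0 at the moment of the add call and as the real value afterwards, which is the "final fields seem to change" effect mentioned above.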

You should never open resources like a file writer in your constructor. Create an init method instead and do it from there. Then you're safe.

This code is not exception safe, and it would not be exception safe in C++ either. It's the same bug regardless of the language you use.
