Making a scheduled method Thread safe

I have a method that a scheduler invokes every minute to fetch a file from FTP, process it, and persist its records to a DB. I need to make it thread-safe so that it behaves correctly when it has to process multiple files at once.
public synchronized void processData(String data) {
    // do processing
}
Is this really going to be a thread-safe method that handles high volumes of load gracefully?

It's thread-safe as long as it doesn't use any stateful (mutable) fields of the enclosing object.
In other words, if processData(String data) reads or modifies a class-level field in order to keep track of what's going on, then it's not thread-safe on its own and needs synchronization.
An example might be a class-level field such as private Boolean hasConnection. If the method has to check this field to find out whether a connection exists, then it is not thread-safe without synchronization.
If you meet this requirement, then you don't even have to add the synchronized keyword to your method. It is thread-safe by default, and any number of threads may call it simultaneously.
If you do not meet this requirement, then you will need to post the whole class so that we can determine whether or not it is thread-safe.
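
The distinction drawn in this answer can be sketched as follows. This is only an illustration under assumptions: StatelessProcessor, StatefulProcessor, and ProcessorDemo are hypothetical names, not the asker's classes. The first class touches only its parameter and local variables, so it needs no locking; the second keeps a mutable field, so access to that field is guarded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the scheduled job; not the asker's real class.
class StatelessProcessor {
    // Uses only its parameter and local variables, so each calling thread
    // works on its own data and no synchronization is needed.
    int countRecords(String data) {
        int records = 0;
        for (String line : data.split("\n")) {
            if (!line.isEmpty()) {
                records++;
            }
        }
        return records;
    }
}

// Hypothetical class that keeps state in a field shared by all callers.
class StatefulProcessor {
    private int recordsSeen; // shared mutable state

    // The read-modify-write on recordsSeen is guarded by the object's lock.
    synchronized void addRecords(int n) {
        recordsSeen += n;
    }

    synchronized int recordsSeen() {
        return recordsSeen;
    }
}

class ProcessorDemo {
    // Runs both processors from several threads and returns the final total.
    static int runConcurrently(int threads, int perThread) {
        StatelessProcessor stateless = new StatelessProcessor();
        StatefulProcessor stateful = new StatefulProcessor();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread worker = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    stateful.addRecords(stateless.countRecords("a\nb\n\nc"));
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return stateful.recordsSeen();
    }
}
```

Removing synchronized from addRecords would make the total unreliable, while countRecords stays correct without it.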

Assuming that the mysterious "process the file" operation is self-contained, the biggest thing you should worry about is your DB connection: do not share one connection between threads. Obtain a connection for each invocation, preferably from a connection pool. Do not make your method synchronized unless you need to access shared state inside your class; a synchronized method cannot make progress concurrently on multiple threads.

Please describe which resources your method uses, and which of those resources are shared.
If you do not use any shared objects, there is no problem.
If you do use shared resources, you need to make sure these resources can be accessed in a thread-safe manner, or are not accessed by multiple threads.
Your question is also about performance. In general, processData looks like a method that takes some time to complete, since it uses a database. The time required to acquire a lock is minimal compared to a DB query, so the synchronized keyword itself adds no noticeable overhead. Keep in mind, though, that it makes calls on the same object run one at a time.

Related

What is the use of ThreadLocal?

What is the use of ThreadLocal when a thread normally works on a variable by keeping it in its local cache?
That would mean thread1 does not know the value of the same variable in thread2, even if no ThreadLocal is used.

With multiple threads, although you have to do work to make sure you read the "most recent" value of a variable, you expect there to be effectively one variable per instance (assuming we're talking about instance fields here). You might read an out of date value unless you're careful, but basically you've got one variable.
With ThreadLocal, you're explicitly wanting to have one value per thread that reads the variable. That's typically for the sake of context. For example, a web server with some authentication layer might set a thread-local variable early in request handling so that any code within the execution of that request can access the authentication details, without needing any explicit reference to a context object. So long as all the handling is done on the one thread, and that's the only thing that thread does, you're fine.
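
The request-context idea described above might look like the sketch below. RequestContext and RequestHandlerDemo are hypothetical names chosen for illustration, and the "request" is simulated by a plain thread rather than a real web server.

```java
// Hypothetical per-request context; the names are illustrative only.
class RequestContext {
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    static void setUser(String user) {
        CURRENT_USER.set(user);
    }

    static String getUser() {
        return CURRENT_USER.get();
    }

    // Clear the value when the request ends so a reused thread does not
    // carry the previous request's user.
    static void clear() {
        CURRENT_USER.remove();
    }
}

class RequestHandlerDemo {
    // Code deep in the call stack: no context object is passed in.
    static String greet() {
        return "hello " + RequestContext.getUser();
    }

    // Simulates one request handled entirely on its own thread.
    static String handleOnNewThread(String user) {
        String[] result = new String[1];
        Thread worker = new Thread(() -> {
            RequestContext.setUser(user);
            try {
                result[0] = greet();
            } finally {
                RequestContext.clear();
            }
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }
}
```

The calling thread never sets a user, so RequestContext.getUser() returns null there even after other threads have set theirs.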

A thread doesn't have to keep variables in its local cache -- it's just that it's allowed to, unless you tell it otherwise.
So:
If you want to force a thread to share its state with other threads, you have to use synchronization of some sort (including synchronized blocks, volatile variables, etc).
If you want to prevent a thread from sharing its state with other threads, you have to use ThreadLocal (assuming the object that holds the variable is known to multiple threads -- if it's not, then everything is thread-local anyway!).

It's kind of a global variable for the thread itself, so that any code running in the thread can access it directly. (A "really" global variable can be accessed by any code running in the "process"; we could call it ProcessLocal:)
Are global variables bad? Maybe; they should be avoided if we can. But sometimes we have no choice because we cannot pass the object through method parameters, and ThreadLocal proves to be useful in many designs without causing too much trouble.

ThreadLocal is useful when an object is not thread-safe but you want to avoid synchronizing access to it. Each thread stores its own copy of the data in its thread-local storage. By default, data is shared between threads.

private lock object and intrinsic lock

When should a private lock object be preferred over the intrinsic lock (this) to synchronize a block?
Please cite the upshots of both.
Private lock object:
    Object lock = new Object();
    synchronized (lock) {
    }
Intrinsic lock (this):
    synchronized (this) {
    }

Using explicit lock objects can allow different methods to synchronize on different locks and avoid unnecessary contention. It also makes the lock more explicit, and can make it easier to search the code for blocks that use the lock.
You probably don't want to do either, however! Find the appropriate class in java.util.concurrent and use that instead. :)
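
As one possible illustration of reaching for java.util.concurrent instead of hand-written locks, the sketch below counts events with AtomicLong and ConcurrentHashMap. HitCounter and HitCounterDemo are hypothetical names; the counting scenario is an assumption made for the example, not something from the question.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counter that needs no synchronized blocks of its own.
class HitCounter {
    private final AtomicLong total = new AtomicLong();
    private final ConcurrentHashMap<String, Long> perPage = new ConcurrentHashMap<>();

    void hit(String page) {
        total.incrementAndGet();
        // merge performs the per-key update atomically.
        perPage.merge(page, 1L, Long::sum);
    }

    long total() {
        return total.get();
    }

    long hitsFor(String page) {
        return perPage.getOrDefault(page, 0L);
    }
}

class HitCounterDemo {
    // Each thread records hitsPerThread hits on "page0" or "page1".
    static HitCounter run(int threads, int hitsPerThread) {
        HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            String page = "page" + (i % 2);
            workers[i] = new Thread(() -> {
                for (int j = 0; j < hitsPerThread; j++) {
                    counter.hit(page);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter;
    }
}
```

Note that the total and the per-page map are updated in two separate atomic steps, so a reader may briefly see them out of step; if both must change together, an explicit lock is still the simpler tool.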

A private lock can be useful if you are doing some kind of lock sharding, i.e., you need to only lock certain parts of your object while others can still be accessed by a different client.
One simple parallel to understand this concept is a table lock in a database: if you are modifying one table, you acquire the lock on that single table, not the whole database, so the rest of the tables can be modified by other clients. If you need to implement a similar logic but in a POJO you would use as many private locks as necessary.
One downside of this approach is that your class gets cluttered with a lot of lock objects. This might be an indication that you need to refactor it into a more granular set of classes with a simpler locking strategy, but it all depends on your design and implementation.

These are both using intrinsic locks. Your first example is using the intrinsic lock of lock, while the second is using the intrinsic lock of this. The question is whether or not this is really what you want to lock on, which it often isn't.
Consider the case where you use synchronized(this) inside one of your methods. You have two objects of this class, and both objects reference some shared resource. If you lock on this, you will not have mutual exclusion on that resource, because each object locks on itself. You need to lock on some object that everything that can access the resource has access to.
Lock on this ONLY if the important resource is part of the class itself. Even then, in some cases a lock object is better. Also, if there are several different resources in your class that need to be mutually exclusive individually rather than as a whole, then you need several lock objects.
The key is really just to know how synchronized works, and to be mindful of what your code is actually doing.

Actually, using either won't make any difference to correctness; it is more a matter of choice and style. API writers either lock on the object itself (through synchronized(this) or a synchronized method) or use an internal monitor, depending on how the lock is meant to be shared: you might not want API users to access your internal lock, or you might want to give API users the choice to share the object's intrinsic lock.
Either way, none of those choices is wrong; it is more about the intention behind the lock.
Read Java Concurrency in Practice. It will make you a master of concurrency and clarify many of these concepts, which sometimes have more to do with the choices you make than with correctness.

Each object has only one intrinsic lock.
With the synchronized keyword: if two different threads call two synchronized methods on the same object, you might expect one thread to run method one while the other runs method two, but that will not happen, because both methods share the same intrinsic lock (which belongs to the object). One thread has to wait for the other to finish before it can acquire the intrinsic lock and run the other method.
If you use multiple locks instead, you still make sure that only one thread can access method one at a time and that only one thread can access method two at a time. But method one and method two can each be executed by a different thread at the same time, which reduces the time required for the operation.
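
A minimal sketch of the two-lock arrangement described above, assuming hypothetical names (SplitLocks, SplitLocksDemo) and a Runnable hook that exists only so the demo can keep one thread inside methodOne while another calls methodTwo:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical class with one private lock per independent piece of state.
class SplitLocks {
    private final Object lockOne = new Object();
    private final Object lockTwo = new Object();
    private int one;
    private int two;

    // The Runnable is a demo hook: it runs while lockOne is still held.
    void methodOne(Runnable whileHoldingLock) {
        synchronized (lockOne) {
            one++;
            whileHoldingLock.run();
        }
    }

    void methodTwo() {
        synchronized (lockTwo) {
            two++;
        }
    }
}

class SplitLocksDemo {
    // Returns true if methodTwo finished while another thread was still
    // inside methodOne, which is only possible because the locks differ.
    static boolean methodTwoRunsDuringMethodOne() {
        SplitLocks shared = new SplitLocks();
        CountDownLatch insideOne = new CountDownLatch(1);
        CountDownLatch twoDone = new CountDownLatch(1);
        boolean[] overlapped = new boolean[1];

        Thread first = new Thread(() -> shared.methodOne(() -> {
            insideOne.countDown();
            overlapped[0] = await(twoDone); // still holding lockOne here
        }));
        Thread second = new Thread(() -> {
            await(insideOne);
            shared.methodTwo();
            twoDone.countDown();
        });
        first.start();
        second.start();
        join(first);
        join(second);
        return overlapped[0];
    }

    private static boolean await(CountDownLatch latch) {
        try {
            return latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    private static void join(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

If both methods synchronized on the same object, methodTwo could not complete until methodOne returned, and the wait inside methodOne would time out.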

Performance issue: use Singleton object in multi thread environment

I have a class "A" with a method "calculate()". Class A is a singleton (Scope=Singleton).
public class A {
    public void calculate() {
        // perform some calculation and update DB
    }
}
Now, I have a program that creates 20 threads. All threads need to access the method "calculate()".
I have a multicore system, so I want the threads to be processed in parallel.
In the above scenario, can I get a performance gain? Can all threads access the method calculate() at the same instant?
Or, since class A is a singleton, do the threads have to block and wait?
I have found similar questions on the web and on Stack Overflow, but I cannot get a clear answer.
Would you please help me?

Statements like "singletons need synchronization" or "singletons don't need synchronization" are overly simplistic, I'm afraid. No conclusions can be drawn only from the fact that you're dealing with the singleton pattern.
What really matters for purposes of multithreading is what is shared. If there is data that is shared by all threads performing the calculation, then you will probably need to synchronize access to it. If there are critical sections of code that cannot run simultaneously on different threads, then you will need to synchronize those.
The good news is that often times it will not be necessary to synchronize everything in the entire calculation. You might gain significant performance improvements from your multi-core system despite needing to synchronize part of the operation.
The bad news is that these things are very complex. Sorry. One possible reference:
http://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601/ref=sr_1_1?ie=UTF8&qid=1370838949&sr=8-1&keywords=java+concurrency+in+practice

That's the fundamental concept of Singleton: only one instance of the class is present in the system (JVM). Whether threads block depends on the implementation of calculate(). Is it a stateless utility method? If yes, you might not want to make it synchronized, and multiple threads will be able to execute it at the same instant. If calculate() is NOT stateless, i.e. it uses instance variables (and those instance variables are used by multiple threads), then be careful: you have to make calculate() thread-safe. You can synchronize the whole method, or at least use a synchronized block inside the method. But once you do so, only one thread will be able to execute it (the synchronized method, or the synchronized block inside the method) at any point in time.
public void calculate() {
    // Some code goes here which does not require thread safety.
    synchronized (someObj) {
        // Some code goes here which requires thread safety.
    }
    // Some code goes here which does not require thread safety.
}
If you want to use parallel processing (if that's the primary goal), then singleton is not the design pattern that you should use.

I have found similar questions in the web/Stackoverflow. But I cannot get clear answer.
There is a good reason for that!!
It is not possible to say whether a method on a singleton does, or does not, need to be synchronized by virtue of being singleton.
Synchronization and the need for synchronization is all about state that may be shared by different threads.
If different threads share state (even serially), then synchronization is required.
If not then no synchronization is required.
The only clues that you have provided us that would help us give you a yes / no answer are this enigmatic comment:
// perform some calculation and update DB
... and the fact that the calculate() method takes no arguments.
If we infer that the calculate() method gets its input from the state of the singleton itself, then at least the part of the method (or of the methods it calls) that retrieves that state must synchronize. However, that doesn't mean that the entire method call must be synchronized. The proportion of its time that the calculate method needs to hold a lock on the shared data will determine how much parallelism you can actually get ...
The updating of the database will also require some kind of synchronization. However, this should be taken care of by the JDBC connection object and the objects you get from it ... provided that you obey the rules and don't try to share a connection between multiple threads. (The database update will also present a concurrency bottleneck ... assuming that the updates apply to the same database table or tables.)

It depends on how you implement the singleton. If you use the synchronized keyword, the threads will wait; otherwise they will not.
Use a singleton with eager initialization.
Something like this:
public final class Universe {
    public static Universe getInstance() {
        return fINSTANCE;
    }

    // PRIVATE //

    /**
     * Single instance created upon class loading.
     */
    private static final Universe fINSTANCE = new Universe();

    /**
     * Private constructor prevents construction outside this class.
     */
    private Universe() {
        //..elided
    }
}
The above will perform very well in a multithreaded environment. Alternatively, you can go for the enum implementation of a singleton.
Check this link for various singleton implementation: http://javarevisited.blogspot.in/2012/07/why-enum-singleton-are-better-in-java.html
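
The enum alternative mentioned in this answer could be sketched like this. UniverseEnum and its visit() method are hypothetical names added for illustration; the only part that matters is the single INSTANCE constant, which the JVM creates once during class initialization.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical enum-based singleton; the JVM creates INSTANCE exactly once.
enum UniverseEnum {
    INSTANCE;

    // Any mutable state still has to be thread-safe on its own terms.
    private final AtomicInteger visits = new AtomicInteger();

    int visit() {
        return visits.incrementAndGet();
    }
}
```

Callers use UniverseEnum.INSTANCE.visit() directly; obtaining the instance involves no locking in user code, but the enum does nothing to protect the fields inside it, hence the AtomicInteger.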

Multiple threads can invoke calculate() at the same time.
Those invocations won't be queued (executed serially) within that JVM unless you perform some type of concurrency control (making the method synchronized is one option).
The fact that your object is a singleton may or may not affect performance, depending on how that object's attributes (if any) are used within calculate().
Also bear in mind that since you are "updating DB", table or row level locks may also limit concurrency.
If you are worried about performance, the best bet is to test it.
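
One way to check the first point for yourself is the sketch below, which assumes a hypothetical Calculator singleton standing in for class A. Its unsynchronized calculate() waits on a CyclicBarrier, and the barrier can only trip if every thread is inside the method at the same time.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for class A from the question.
class Calculator {
    static final Calculator INSTANCE = new Calculator();

    private Calculator() {
    }

    // Not synchronized. Returns true only if all parties of the barrier
    // were inside this method together before the timeout.
    boolean calculate(CyclicBarrier allInside) {
        try {
            allInside.await(5, TimeUnit.SECONDS);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (BrokenBarrierException | TimeoutException e) {
            return false;
        }
    }
}

class CalculatorDemo {
    // Returns how many threads saw all the others inside calculate().
    static int threadsInsideTogether(int threads) {
        CyclicBarrier barrier = new CyclicBarrier(threads);
        AtomicInteger succeeded = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                if (Calculator.INSTANCE.calculate(barrier)) {
                    succeeded.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return succeeded.get();
    }
}
```

Declaring calculate() as synchronized would let only one thread reach the barrier at a time, so the waits would time out instead of succeeding.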

Thread safe method and stack

The StringBuffer class has methods that are thread-safe. But I have a question: when a particular method is called, it is loaded onto the stack, and the stack is thread-safe, so why do we need thread-safe methods?

It's quite possible to share a given StringBuffer instance across different threads, in which case multiple threads will end up modifying (mutating) the StringBuffer's internal state, which lives on the heap rather than on any one thread's stack. This is why methods such as append are synchronized on a StringBuffer.
But you are right. If you don't plan on sharing stuff across thread boundaries (or, as it is called, "publishing" the instance), it is more logical to just create a StringBuilder instance (which is the non-synchronized sibling of StringBuffer) in a given method call and throw it away (or rather let the GC take care of it) after the method call ends.
There is another aspect which comes into play when you absolutely have to share instances across threads and at the same time feel that the cost of synchronizing each operation is way too much -- thread locals. Basically, the idea in this case is to make each thread have its own copy of a "mutable" entity. There is no locking required because the moment some other thread tries to access a thread local variable, you hand across a fresh/pre-configured instance. This is commonly used for stuff like sharing StringBuilder and DateFormat instances to boost performance.
If you want to compare between raw/unsafe sharing of a mutable object between threads versus using a thread local, take a look at the snippet I have hosted on Bitbucket.
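
The thread-local approach from this answer might be sketched as follows. This is not the Bitbucket snippet referred to above; Formatters is a hypothetical class, and the date pattern, UTC time zone, and Locale.ROOT are assumptions picked to keep the output predictable.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper: one SimpleDateFormat per thread, no locking.
class Formatters {
    // SimpleDateFormat is not thread-safe, so each thread lazily creates
    // and then reuses its own pre-configured instance.
    private static final ThreadLocal<SimpleDateFormat> ISO_DAY =
            ThreadLocal.withInitial(() -> {
                SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
                format.setTimeZone(TimeZone.getTimeZone("UTC"));
                return format;
            });

    static String formatDay(Date date) {
        return ISO_DAY.get().format(date);
    }

    // Returns true if another thread receives a different formatter
    // instance than the calling thread.
    static boolean otherThreadGetsOwnInstance() {
        SimpleDateFormat mine = ISO_DAY.get();
        SimpleDateFormat[] theirs = new SimpleDateFormat[1];
        Thread other = new Thread(() -> theirs[0] = ISO_DAY.get());
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return theirs[0] != null && theirs[0] != mine;
    }
}
```

Within a single thread, repeated calls to formatDay reuse the same formatter, which is where the saving over creating one per call comes from.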

Synchronization, When to or not to use?

I have started learning concurrency and threads in Java. I know the basics of synchronized (i.e. what it does). Conceptually, I understand that it provides mutually exclusive access to a shared resource when multiple threads are involved. But when faced with an example like the one below, I am confused about whether it is a good idea to make it synchronized. I know that critical sections of the code should be synchronized and that this keyword should not be overused, or it affects performance.
public static synchronized List<AClass> sortA(AClass[] aArray)
{
    List<AClass> aObj = getList(aArray);
    Collections.sort(aObj, new AComparator());
    return aObj;
}
public static synchronized List<AClass> getList(AClass[] anArray)
{
    // It converts an array to a list and returns it
}

Assuming each thread passes a different array then no synchronization is needed, because the rest of the variables are local.
If instead you fire off a few threads all calling sortA and passing a reference to the same array, you'd be in trouble without synchronized, because they would interfere with each other.
Beware: the example makes it seem that the getList method returns a new List from an array, so that even if the threads pass the same array, you get different List objects. This can be misleading. For example, Arrays.asList creates a List backed by the given array, and the javadoc clearly states that changes to the returned list "write through" to the array, so be careful about this.
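
The write-through behaviour is easy to see in a small sketch. AsListDemo is a hypothetical class, and Integer stands in for AClass; how the asker's getList is actually implemented is not known.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class AsListDemo {
    // Sorts a view backed by the array: the caller's array is reordered too.
    static List<Integer> sortView(Integer[] input) {
        List<Integer> view = Arrays.asList(input);
        Collections.sort(view);
        return view;
    }

    // Sorts an independent copy: the caller's array is left untouched.
    static List<Integer> sortCopy(Integer[] input) {
        List<Integer> copy = new ArrayList<>(Arrays.asList(input));
        Collections.sort(copy);
        return copy;
    }
}
```

If getList behaves like sortView's first line, two threads sorting the same array are mutating shared data; if it copies as sortCopy does, each call sorts its own list.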

Synchronization is usually needed when you are sharing data between multiple invocations and there is a possibility that the data will be modified, resulting in inconsistency. If the data is read-only, then you don't need to synchronize.
In the code snippet above, no data is being shared. The methods work on the input provided and return the output. If multiple threads invoke one of your methods, each invocation will have its own input and output. Hence, there is no chance of inconsistency anywhere, and the methods in the above snippet need not be synchronized.
Synchronization, if used unnecessarily, will degrade performance because of the overhead involved, and hence should be used cautiously and only when required.

Your static methods don't depend on any shared state, so need not be synchronized.

There is no fixed rule for when to use synchronized and when not to. When you are sure that your code will not be accessed by concurrent threads, you can avoid using synchronized.

Synchronization, as you have correctly figured, has an impact on the throughput of your application, and can also lead to thread starvation.
Reads (get operations) should generally be non-blocking, as they are in the collections of the java.util.concurrent package.
In your example, each calling thread passes its own copy of the array, so getList doesn't need to be synchronized, and neither does sortA, as all other variables are local.
Local variables live on the stack, and every thread has its own stack, so other threads cannot interfere with them.
You need synchronization when you change the state of an object that other threads should see in a consistent state. If your calls don't change the state of the object, you don't need synchronization.

I wouldn't use synchronized in single-threaded code, i.e. where there is no chance that an object will be accessed by multiple threads.
This may appear obvious, but ~99% of the StringBuffer instances used in the JDK can only ever be used by one thread and could be replaced with a StringBuilder (which is not synchronized).
