I am trying to parallelize a bit of code which makes use of static fields within a "Constants" class. The code at the moment essentially looks like this
public class myClass{
    // static nested class: before Java 16 a non-static inner class cannot declare a static field
    public static class Constants{
        public static int constant;
    }
    public static void main(String[] args){
        for(int i = 0 ; i<10 ; i++){
            Constants.constant = i;
            System.out.println(Constants.constant/2);
        }
    }
}
Obviously the code within the loop is much more heavily dependent on the Constants class, which itself is much more complex. What I'd like to do is create a thread for each iteration of the loop and do said computations separately, all the while controlling the number of threads (right now I'm using a simple semaphore).
Now obviously in the above code, the Constants class is shared between threads and thus cannot be updated by each thread without being updated for all of them.
So my question is: is there any way to give each thread its own instance of my Constants class, while still being able to access its fields in a static manner?
What you're describing is a thread-local: http://docs.oracle.com/javase/7/docs/api/java/lang/ThreadLocal.html . It's a good thing to use. However, as Affe points out, you can't use that with your code as is, because there's just one instance of the class and its static members (per classloader). If your Constants class is something that you can build several copies of in parallel and then merge together later, you should make Constants.constant an instance variable by removing "static". Then create a thread-local in myClass like so:
private final ThreadLocal<Constants> constants = new ThreadLocal<Constants>() {
    // Each thread gets its own Constants the first time it calls constants.get()
    @Override protected Constants initialValue() {
        return new Constants();
    }
};
Once your threads are done updating their local object, they can stick them into a shared ArrayBlockingQueue. Your main thread can dequeue them all and merge them as you desire.
Another thing to note is that you may want to use a thread pool executor instead of one thread per iteration of the loop, if you will have a variable number of iterations, possibly many, but you don't want that many threads. (Thread creation is costly and many concurrent threads eat memory and OS scheduling resources.)
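A minimal sketch of the thread pool suggestion, assuming the constant can become an instance field. Note that it gives each task its own Constants object directly instead of going through a ThreadLocal, and it collects results through Future objects rather than a queue. The class name PerTaskConstantsDemo, the pool size and the constant / 2 computation are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerTaskConstantsDemo {
    // Instance field instead of a static one, so every task owns its copy.
    static class Constants {
        final int constant;
        Constants(int constant) { this.constant = constant; }
    }

    // Runs one task per iteration on at most poolSize threads.
    static List<Integer> runAll(int iterations, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < iterations; i++) {
                final Constants constants = new Constants(i); // not shared between tasks
                futures.add(pool.submit(() -> constants.constant / 2));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // waits for that task to finish
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(10, 4));
    }
}
```

The pool replaces the semaphore: the number of live threads never exceeds poolSize, however many iterations are submitted.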
Related
Here is a question that has been asked many times, I have double-checked numerous issues that have been raised formerly but none gave me an answer element so I thought I would put it here.
The question is about making my code thread-safe in java knowing that there is only one shared variable but it can change anytime and actually I have the feeling that the code I am optimizing has not been thought for a multi-threading environment, so I might have to think it over...
Basically, I have one class which can be shared between, say, 5 threads. This class has a private property 'myProperty' which can take 5 different values (one for each thread). The problem is that, once it's instantiated by the constructor, that value should not be changed anymore for the rest of the thread's life.
I am pretty well aware of some techniques used to make most pieces of code "thread-safe", including locks, the "synchronized" keyword, volatile variables and atomic types, but I have the feeling that these won't help in the current situation as they do not prevent the variable from being modified.
Here is the code :
// The thread that calls for the class containing the shared variable //
public class myThread implements Runnable {
    @Autowired
    private Shared myProperty;
    //some code
}
// The class containing the shared variable //
public class Shared {
    private String operator;
    private Lock lock = new ReentrantLock();
    public void initiate(){
        this.lock.lock();
        try{
            this.operator.initiate(); // Gets a different value depending on the calling thread
        } finally {
            this.lock.unlock();
        }
    }
    // some code
}
As it happens, the above code only guarantees that two threads won't change the variable at the same time, but the latter will still change. A "naive" workaround would consist in creating a table (operatorList) for instance (or a list, a map, etc. ) associating an operator with its calling thread's ID, this way each thread would just have to access its operator using its id in the table but doing this would make us change all the thread classes which access the shared variable and there are many. Any idea as to how I could store the different operator string values in an exclusive manner for each calling thread with minimal changes (without using magic) ?
I'm not 100% sure I understood your question correctly, but I'll give it a shot anyway. Correct me if I'm wrong.
A "naive" workaround would consist in creating a table (operatorList)
for instance (or a list, a map, etc. ) associating an operator with
its calling thread's ID, this way each thread would just have to
access its operator using its id in the table but doing this would
make us change all the thread classes which access the shared variable
and there are many.
Java already has something similar: the ThreadLocal class.
You can create a thread-local copy of any object:
private static final ThreadLocal<MyObject> operator =
    new ThreadLocal<MyObject>() {
        @Override
        protected MyObject initialValue() {
            // return thread-local copy of the "MyObject"
        }
    };
Later in your code, when a specific thread needs to get its own local copy, all it needs to do is: operator.get(). In reality, the implementation of ThreadLocal is similar to what you've described - a Map of ThreadLocal values for each Thread. Only the Map is not static, and is actually tied to the specific thread. This way, when a thread dies, it takes its ThreadLocal variables with it.
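To make the per-thread behaviour concrete, here is a small sketch in which every thread reads the same static ThreadLocal and still sees its own value. It uses ThreadLocal.withInitial (Java 8 and later), which does the same job as overriding initialValue(). The class name, the thread names and the "operator-of-" string are made up for the illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ThreadLocalOperatorDemo {
    // Each thread lazily gets its own value, derived here from its name.
    private static final ThreadLocal<String> OPERATOR =
            ThreadLocal.withInitial(() -> "operator-of-" + Thread.currentThread().getName());

    // Starts one thread per name and records what each one sees.
    static ConcurrentMap<String, String> observe(String... threadNames) {
        ConcurrentMap<String, String> seen = new ConcurrentHashMap<>();
        Thread[] threads = new Thread[threadNames.length];
        for (int i = 0; i < threadNames.length; i++) {
            String name = threadNames[i];
            threads[i] = new Thread(() -> seen.put(name, OPERATOR.get()), name);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observe("t1", "t2", "t3"));
    }
}
```

The calling code only ever writes OPERATOR.get(), so classes that use the value do not need to know about thread IDs or lookup tables.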
I'm not sure if I totally understand the situation, but if you want to ensure that each thread uses a thread-specific instance for a variable, the solution is to use a variable of type ThreadLocal<T>.
I have several threads trying to increment a counter for a certain key in a custom data structure that is not thread-safe (you can imagine it to be similar to a HashMap). I was wondering what the right way to increment the counter in this case would be.
Is it sufficient to synchronize the increment function or do I also need to synchronize the get operation?
public class Example {
    private MyDataStructure<Key, Integer> datastructure = new CustomDataStructure<Key, Integer>();
    private class MyThread implements Runnable {
        private synchronized void incrementCnt(Key key) {
            // from the datastructure documentation: if a value already exists for the given key, the
            // previous value will be replaced by this value
            datastructure.put(key, getCnt(key)+1);
            // or can I do it without using the getCnt() function? like this:
            datastructure.put(key, datastructure.get(key)+1);
        }
        private synchronized int getCnt(Key key) {
            return datastructure.get(key);
        }
        // run method...
    }
}
If I have two threads t1, t2 for example, I would do something like:
t1.incrementCnt(key);
t2.incrementCnt(key);
Can this lead to any kind of deadlock? Is there a better way to solve this?
The main issue with this code is that it is likely to fail to provide synchronized access to datastructure, because the synchronized methods lock on this of the inner class MyThread. That is a different object for each MyThread instance, so no mutual exclusion happens.
A more correct way is to make datastructure a final field and then synchronize on it:
private final MyDataStructure<Key, Integer> datastructure = new CustomDataStructure<Key, Integer>();
private class MyThread implements Runnable {
    private void incrementCnt(Key key) {
        // lock on the shared structure, not on this MyThread instance
        synchronized (datastructure) {
            datastructure.put(key, datastructure.get(key)+1);
        }
    }
    // run method...
}
As long as all data access is done using synchronized (datastructure), code is thread-safe and it's safe to just use datastructure.get(...). There should be no dead-locks, since deadlocks can occur only when there's more than one lock to compete for.
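The following sketch shows the same idea end to end, with several Runnable instances updating one shared structure. A plain HashMap stands in for the custom data structure, and the class and method names are invented for the example. Unlike the snippet above, it also handles a key that has no value yet.

```java
import java.util.HashMap;
import java.util.Map;

class SharedLockDemo {
    // Stand-in for the non-thread-safe custom structure; final so the lock object never changes.
    private final Map<String, Integer> datastructure = new HashMap<>();

    private class Worker implements Runnable {
        private final String key;
        private final int times;
        Worker(String key, int times) { this.key = key; this.times = times; }

        @Override
        public void run() {
            for (int i = 0; i < times; i++) {
                // Lock on the shared map, not on this Worker instance.
                synchronized (datastructure) {
                    Integer old = datastructure.get(key);
                    datastructure.put(key, old == null ? 1 : old + 1);
                }
            }
        }
    }

    // Runs several Worker instances against one key and returns the final count.
    int count(String key, int workers, int timesEach) {
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(new Worker(key, timesEach));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        synchronized (datastructure) {
            return datastructure.get(key);
        }
    }

    public static void main(String[] args) {
        System.out.println(new SharedLockDemo().count("k", 4, 1000));
    }
}
```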
As the other answer told you, you should synchronize on your data structure, rather than on the thread/runnable object. It is a common mistake to try to use synchronized methods in the thread or runnable object. Synchronization locks are instance-based, not class-based (unless the method is static), and when you are running multiple threads, this means that there are actually multiple thread instances.
It's less clear-cut about Runnables: you could be using a single instance of your Runnable class with several threads. So in principle you could synchronize on it. But I still think it's bad form because in the future you may want to create more than one instance of it, and get a really nasty bug.
So the general best practice is to synchronize on the actual item that you are accessing.
Furthermore, the design conundrum of whether or not to use two methods should be solved by moving the whole thing into the data structure itself, if you can do so (if the class source is under your control). This is an operation that is confined to the data structure and applies only to it, and doing the increment outside of it is not good encapsulation. If your data structure exposes a synchronized incrementCnt method, then:
It synchronizes on itself, which is what you wanted.
It can use its own private fields directly, which means you don't actually need to call a getter and a setter.
It is free to have the implementation changed to one of the atomic structures in the future if it becomes possible, or add other implementation details (such as logging increment operations separately from setter access operations).
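As a rough sketch of that encapsulation, assuming the structure's source can be changed, the increment could live inside the class itself. CountingStructure and its HashMap backing are hypothetical stand-ins for the custom data structure.

```java
import java.util.HashMap;
import java.util.Map;

class CountingStructure<K> {
    private final Map<K, Integer> counts = new HashMap<>();

    // The whole read-modify-write runs under the structure's own lock.
    public synchronized int incrementCnt(K key) {
        int next = counts.getOrDefault(key, 0) + 1;
        counts.put(key, next);
        return next;
    }

    // Reads take the same lock, so they always see completed increments.
    public synchronized int getCnt(K key) {
        return counts.getOrDefault(key, 0);
    }
}
```

Callers then write structure.incrementCnt(key) and need no locking of their own.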
public class ThreadTest implements Runnable {
private int counter;
private Date mydate = new Date();
public void upCounter1() {
synchronized (mydate ) {
for (int i = 0; i < 5; i++) {
counter++;
System.out.println("1 " + counter);
}
}
}
public void upCounter2() {
synchronized (mydate ) {
for (int i = 0; i < 5; i++) {
counter++;
System.out.println("2 " + counter);
}
}
}
public void upCounter3() {
synchronized (mydate ) {
for (int i = 0; i < 5; i++) {
counter++;
System.out.println("3 " + counter);
}
}
}
@Override
public void run() {
upCounter1();
upCounter2();
upCounter3();
}
public static void main(String[] args) {
ThreadTest mtt = new ThreadTest();
Thread t1 = new Thread(mtt);
Thread t2 = new Thread(mtt);
Thread t3 = new Thread(mtt);
t1.start();
t2.start();
t3.start();
}
}
I tried this code with various synchronisation techniques and I'd like to make sure I get what's happening. I've read a bunch of articles on this, but none of them broke it down enough for me.
So here's what I observed:
synchronised (this): This works only, if I give the SAME instance of Threadtest to all threads, because if I give each thread its own instance, each will get that instance's intrinsic lock and can access the methods without interruption from the other threads.
However, if I give each thread its own instance, I can do: synchronised (getClass()), because then I get the intrinsic lock of the class
Alternatively, I could do: synchronised (mydate), where the same rules apply that apply to synchronised (this). But it has the advantage of not being public. I don't really understand this. What is the "danger" of using this?
Alternatively to synchronised (getClass()), I could also use a private static field.
However, I cannot do synchronised(Date.class).
I could synchronise the entire methods (same effect as with synchronised-block)
making counter volatile doesn't work, because incrementing isn't a truly atomic operation
If I want to make each method accessible individually, I would make three private fields and use them in the synchronised-blocks. I then am effectively using the intrinsic locks of those fields and not of my class or instance.
I also noted that when I use the class-lock, each method is viewed as separate and I have effectively 3 counters that go to 15. If I use the instance lock, the counter goes to 45. Is that the correct and expected behaviour?
Are my explanations and observations correct? (I basically want to make sure I draw the correct conclusions from the console output I got)
a-c; e-f are correct.
c) Alternatively, I could do: synchronised (mydate), where the same rules apply that apply to synchronised (this). But it has the advantage of not being public. I don't really understand this. What is the "danger" of using this?
The argument is that other code may also decide to use that object as a lock. Which could cause conflict; when you know that this can never be the case then it is not such an evil thing. It is also usually more of a problem when one uses wait/notify in their code.
d) Alternatively to synchronised (getClass()), I could also use a private static field. However, I cannot do synchronised(Date.class).
You can use Date.class, it would just be a bit weird and falls into the argument discussed in c above about not polluting other classes work spaces.
g) If I want to make each method accessible individually, I would make three private fields and use them in the synchronised-blocks. I then am effectively using the intrinsic locks of those fields and not of my class or instance.
Given that the three methods share the same state, then no, this would not be wise as it would lead to races between the threads.
h) I also noted that when I use the class-lock, each method is viewed as separate and I have effectively 3 counters that go to 15. If I use the instance lock, the counter goes to 45. Is that the correct and expected behaviour?
No, this sounds wrong but I may have misunderstood you. I would expect the total to be 45 in both cases when using either this or this.getClass() as the lock.
Your code is threadsafe as it stands, if slow (you are writing to the console while holding a lock) - but better correct and slow than wrong and fast!
a) synchronised (this): This works only, if I give the SAME instance of Threadtest to all threads, because if I give each thread its own instance, each will get that instance's intrinsic lock and can access the methods without interruption from the other threads.
Your code is threadsafe in either case - that is, it will give the exact same results every time. If you pass the same instance to three different threads the final line of output will be "3 45" (since there is only one counter variable) and if you give each thread its own instance there will be three lines reading "3 15". It sounds to me like you understand this.
b) However, if I give each thread its own instance, I can do: synchronised (getClass()), because then I get the intrinsic lock of the class
If you do this your code is still threadsafe, but you will get three lines reading "3 15" as above. Be aware that you will also be more prone to liveness and deadlock issues for the reason stated below.
c) Alternatively, I could do: synchronised (mydate), where the same rules apply that apply to synchronised (this). But it has the advantage of not being public. I don't really understand this. What is the "danger" of using this?
You should try to use private locks where you can. If you use a globally-visible object (e.g. this or getClass or a field with visibility other than private or an interned String or an object that you got from a factory) then you open up the possibility that some other code will also try to lock on the object that you are locking on. You may end up waiting longer than you expect to acquire the lock (liveness issue) or even in a deadlock situation.
For a detailed analysis of things that can go wrong, see the secure coding guidelines for Java - but note that this is not just a security issue.
d) Alternatively to synchronised (getClass()), I could also use a private static field. However, I cannot do synchronised(Date.class).
A private static field is preferable to either getClass() or Date.class for the reasons stated above.
e) I could synchronise the entire methods (same effect as with synchronised-block)
Pretty much (there are currently some insignificant byte code differences), but again you should prefer private locks.
f) making counter volatile doesn't work, because incrementing isn't a truly atomic operation
Yes, you may run into a race condition and your code is no longer threadsafe (although you don't have the visibility issue mentioned below)
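For illustration only: counter++ is a read, an add and a write, and volatile does not make those three steps indivisible. One alternative, not part of the original code, is java.util.concurrent.atomic.AtomicInteger, whose incrementAndGet() performs the whole update atomically. The class and method names below are invented for the sketch.

```java
import java.util.concurrent.atomic.AtomicInteger;

class AtomicCounterDemo {
    // Each of `threads` workers performs `perThread` atomic increments.
    static int countWithThreads(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.incrementAndGet(); // atomic read-modify-write
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(3, 15)); // prints 45
    }
}
```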
g) If I want to make each method accessible individually, I would make three private fields and use them in the synchronised-blocks. I then am effectively using the intrinsic locks of those fields and not of my class or instance.
You should not do this; you should always use the same lock to access a variable. Besides the race condition from multiple threads reading and writing the same variable at the same time, there is a subtler issue with inter-thread visibility. The Java Memory Model guarantees that writes done by one thread before a lock is released will be seen by another thread when that other thread acquires the same lock. With different locks there is no such guarantee, so thread 2 executing upCounter2 may or may not see the results of thread 1 executing upCounter1.
Rather than thinking of "which blocks of code do I need to execute?" you should think "which pieces of state do I need to access?".
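A sketch of that way of thinking, with one private lock guarding the one piece of shared state regardless of which method touches it. GuardedCounter and runThreeThreads are hypothetical names, and the printing from the question is left out to keep the example short.

```java
class GuardedCounter {
    // Private lock: no outside code can synchronize on it.
    private final Object lock = new Object();
    private int counter; // only read or written while holding `lock`

    void upCounter1() { addFive(); }
    void upCounter2() { addFive(); }
    void upCounter3() { addFive(); }

    private void addFive() {
        synchronized (lock) {
            for (int i = 0; i < 5; i++) {
                counter++;
            }
        }
    }

    int value() {
        synchronized (lock) {
            return counter;
        }
    }

    // Three threads share one instance, as in the question's main method.
    static int runThreeThreads() {
        GuardedCounter shared = new GuardedCounter();
        Runnable task = () -> {
            shared.upCounter1();
            shared.upCounter2();
            shared.upCounter3();
        };
        Thread[] threads = { new Thread(task), new Thread(task), new Thread(task) };
        for (Thread t : threads) {
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.value();
    }

    public static void main(String[] args) {
        System.out.println(runThreeThreads()); // prints 45
    }
}
```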
h) I also noted that when I use the class-lock, each method is viewed as separate and I have effectively 3 counters that go to 15. If I use the instance lock, the counter goes to 45. Is that the correct and expected behaviour?
Yes, but it has nothing to do with the object you are using for synchronisation, rather it's because you have created three different ThreadTest objects and hence have three different counters, as I explained in my answer to your first question.
Make sure that you understand the difference between three threads operating on one object and one thread operating on three different objects. Then you will be able to understand the behaviour you are observing with three threads operating on three different objects.
a) Correct
b) Correct
c) There could be some other bunch of code using your this or class in another part of your application where your class is accessible. This will mean that unrelated code will be waiting for each other to complete.
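d) You should not synchronise on Date.class, for the same reason as above: unrelated code elsewhere may lock on it too, so unrelated threads could end up waiting for each other unnecessarily.
e) Synchronising an instance method is the same as synchronised (this); only a static synchronised method takes the class lock.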
g) Correct
I have a utility class in Java which reads files from a big file system.
Some files are huge, so the Utility class is taking a lot of time to access these files and I am facing a performance issue here.
I plan to implement multithreading to improve performance, but I am a bit confused as to how I need to do that. Below is the structure of the Utility class.
public class Utility {
    public static void Method1(ArrayList values){
        //do some processing
        for(int i=0; i< values.size();i++){
            ArrayList<String> details= MethodAccessFileSystem();
            CreateFileInDir(details);
        }
    }
    // static, so that the static Method1 can call it
    public static ArrayList<String> MethodAccessFileSystem(){
        //Code to access the file system. This is taking hell lot of time.
    }
    public static void CreateFileInDir(ArrayList<String> values){
        //Do some processing here.
    }
}
I used to call this Utility class from a standalone class using the following syntax:
Utility.Method1(values); //values is an ArrayList.
Now I need to convert the above code into multithreaded code.
I know how to create a thread by extending the Thread class or implementing Runnable.
I have a basic idea about that.
But what I need to know is whether I should convert the whole Utility class to implement Runnable,
or whether parts of the Utility class need to be separated out and made into Runnable tasks.
My issue is with the for() loop, as these methods are called in a loop.
If I separate out MethodAccessFileSystem() and make it a task, will this work?
If MethodAccessFileSystem() takes a long time, will the JVM automatically start another thread if I use a ThreadPoolExecutor to schedule a fixed number of threads?
Do I need to suspend this method, or is that not required because the JVM will take care of it?
The main issue is with the for loop.
In the end, what I need is for the Utility class to be multithreaded while the call to the method stays the same as above:
Utility.Method1(values); //values is an ArrayList.
I am thinking about how I can implement that.
Can you please help me with this and provide your suggestions and feedback on the design changes that need to be made?
Thanks
Vikeng
In my opinion, the chunk of work in your class that lends itself to parallelism is the loop below.
// do some processing
for (int i = 0; i < values.size(); i++) {
    new Thread(new Runnable() {
        @Override
        public void run() {
            ArrayList<String> details = MethodAccessFileSystem();
            CreateFileInDir(details);
        }
    }).start(); // without start() the thread is created but never runs
}
Before you make the change make sure that multiple threads will help. Run the method and as best you can check CPU and disk i/o activity. Also check to see if there's any garbage collection going on.
If the CPU or the disk is already saturated, or the JVM is spending much of its time in garbage collection, then adding threads really won't help. You'll have to address that specific condition in order to get any throughput improvements.
Having said that the trick to making the code thread safe is to not have any instance variables on the class that are used to hold state during the method execution. For each existing instance variable, you need to decide whether to make it a local variable declared within the method or a method parameter.
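One possible shape for this, offered as a sketch rather than a drop-in replacement: the class holds no fields, every task works only on its parameter and local variables, and a fixed thread pool bounds the number of threads. The names ParallelUtility, accessFileSystem and createFileInDir, the extra poolSize parameter, and the placeholder bodies are all assumptions made for the example. The real file-system code is not shown in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelUtility {
    // No fields: all state lives in parameters and local variables.
    static List<String> method1(List<String> values, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String value : values) {
                tasks.add(() -> createFileInDir(accessFileSystem(value)));
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every task has completed.
            for (Future<String> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Placeholder for the slow file-system read (hypothetical body).
    private static List<String> accessFileSystem(String value) {
        List<String> details = new ArrayList<>();
        details.add(value + "-details");
        return details;
    }

    // Placeholder for the per-item processing (hypothetical body).
    private static String createFileInDir(List<String> details) {
        return "created:" + details.get(0);
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<>();
        values.add("a");
        values.add("b");
        System.out.println(method1(values, 2));
    }
}
```

The caller still makes a single blocking call and the threading stays hidden inside the method, which is what the question asks for.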
I am struggling to understand how Java threads work, so excuse this rather simple question.
Let's assume I have a program with N threads. Each thread executes the same instructions on a different part of an array of Strings. We invoke the thread through a class that implements the Runnable interface. For the purposes of this example, let's say it is something like this:
public void run() {
    // parentheses matter: assign first, then compare the result with null
    while ((startStop = loopGetRange()) != null) {
        countLetters(startStop.start, startStop.stop);
        /* start is the beginning cell in the array where the process starts
           and stop is the ending cell in the array where the process stops */
    }
}
Finally, countLetters is just a simple method, as follows:
private void countLetters(int start, int stop) {
    for (int y = start; y <= stop; y++) {
        String theWord = globalArray[y];
        int z = theWord.length();
        System.out.println("For word "+theWord+" there are "+z+" characters");
    }
}
Here is my question: are variables like "theWord" and "Z" local to the thread, or are they shared across threads and thus subject to possible thread collisions? If the latter, how best to protect these variables?
Thanks for helping a confused person.
Elliott
Local variables are allocated on the stack, and are local to the thread. Only member fields are shared across threads. So, theWord and Z are not shared across threads and you don't need to worry about clashing.
Given that a String is immutable, the only concern about thread safety we would have in method countLetters() is access to the globalArray.
Now, if the construction of this array "happened-before" the access to globalArray, the code is safe as long as no thread "writes" to the globalArray.
"happened-before" relationships can be enforced in a number of ways (by using the synchronized keyword, final keyword, volatile keyword, using java.util.concurrent libraries etc.).
The thread has no impact on what variables are visible. It would be just like you created the class and ran the method without starting a thread. If multiple threads will be accessing the same objects, then you have to look at using locks to make sure they don't step on each other.
Like JOTN says, if the threads are accessing the same objects, then there might be thread collisions.
If the globalArray variable in this case is shared across the threads, and especially if it or its elements are modified, then it might be wise to use locks/synchronization.
Aside from visibility of variables and lock/synchronization issues for shared variables...
Are variables like "theWord" and "Z" local to the thread
Those variables you ask about are local to the loop, not part of the class or instance, and exist on a per-thread basis, so there won't be any collisions.
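A small sketch of the point, under the assumption that the array is filled before the threads start and only read afterwards. Two threads run the same method on different ranges, and each call gets its own theWord, z and total on its own stack. The class name, the sample words and the totals array are invented for the example.

```java
class LocalVariablesDemo {
    // Shared between threads, but only read after construction.
    private static final String[] WORDS = {"alpha", "be", "gamma", "pi"};

    // Sums the word lengths in [start, stop]; theWord, z and total are locals.
    static int countLetters(int start, int stop) {
        int total = 0;
        for (int y = start; y <= stop; y++) {
            String theWord = WORDS[y];
            int z = theWord.length();
            total += z;
        }
        return total;
    }

    // Two threads work on disjoint ranges; each writes only its own slot.
    static int[] countInTwoThreads() {
        int[] totals = new int[2];
        Thread first = new Thread(() -> totals[0] = countLetters(0, 1));
        Thread second = new Thread(() -> totals[1] = countLetters(2, 3));
        first.start();
        second.start();
        try {
            first.join();  // join() makes the thread's writes visible here
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return totals;
    }

    public static void main(String[] args) {
        int[] totals = countInTwoThreads();
        System.out.println(totals[0] + " " + totals[1]); // prints 7 7
    }
}
```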