I have a utility class in Java which accesses a large file system to read files.
Some files are huge, so the Utility class takes a lot of time to access them, and I am facing a performance issue here.
I plan to implement multithreading to improve performance, but I am a bit confused as to how to do that. Below is the structure of the Utility class.
import java.util.ArrayList;

public class Utility {
    public static void Method1(ArrayList values) {
        // do some processing
        for (int i = 0; i < values.size(); i++) {
            ArrayList<String> details = MethodAccessFileSystem();
            CreateFileInDir(details);
        }
    }

    // static so that the static Method1 can call it
    public static ArrayList<String> MethodAccessFileSystem() {
        // Code to access the file system. This takes a very long time.
        return new ArrayList<String>(); // placeholder result
    }

    public static void CreateFileInDir(ArrayList<String> values) {
        // Do some processing here.
    }
}
I used to call this Utility class from a standalone class using the following syntax:
Utility.Method1(values); //values is an ArrayList.
Now I need to convert the above code into multithreaded code.
I know how to create a thread by extending the Thread class or implementing Runnable.
I have a basic idea about that.
But what I need to know is whether I should convert this whole Utility class to implement Runnable,
or whether parts of the Utility class need to be separated out and made into Runnable tasks.
My issue is with the for() loop, as these methods are called in a loop.
If I separate out MethodAccessFileSystem() and make it a task, will this work?
If MethodAccessFileSystem() is taking a long time, will the JVM automatically start another thread if I use a ThreadPoolExecutor to schedule a fixed number of threads?
Do I need to suspend this method, or is that not required because the JVM will take care of it?
The main issue is with the for loop.
In the end, what I need is for the Utility class to be multithreaded while the call to the method stays the same as above:
Utility.Method1(values); //values is an ArrayList.
I am thinking about how I can implement that.
Can you please help me with this and provide your suggestions and feedback on the design changes that need to be made?
Thanks
Vikeng
In your class, the chunk of work that lends itself to parallelism is, in my opinion, the loop below.
// do some processing
for (int i = 0; i < values.size(); i++) {
    new Thread(new Runnable() {
        @Override
        public void run() {
            ArrayList<String> details = MethodAccessFileSystem();
            CreateFileInDir(details);
        }
    }).start(); // without start() the thread is created but never runs
}
Before you make the change make sure that multiple threads will help. Run the method and as best you can check CPU and disk i/o activity. Also check to see if there's any garbage collection going on.
If the CPU or the disk is already saturated, or the JVM is spending much of its time collecting garbage, then adding threads really won't help. You'll have to address that specific condition in order to get any throughput improvements.
Having said that the trick to making the code thread safe is to not have any instance variables on the class that are used to hold state during the method execution. For each existing instance variable, you need to decide whether to make it a local variable declared within the method or a method parameter.
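As a rough illustration of how the loop could be handed to a fixed-size thread pool while the caller still writes Utility.Method1(values), here is a minimal sketch. The pool size, the placeholder method bodies and the filesCreated counter are assumptions made for the example, not part of the original class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class Utility {
    // Hypothetical pool size; tune it after measuring disk and CPU load.
    private static final int POOL_SIZE = 4;

    // Hypothetical counter so the sketch has an observable result.
    static final AtomicInteger filesCreated = new AtomicInteger();

    public static void Method1(List<?> values) {
        ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
        for (int i = 0; i < values.size(); i++) {
            // Each loop iteration becomes one independent task on the pool.
            pool.execute(() -> {
                ArrayList<String> details = MethodAccessFileSystem();
                CreateFileInDir(details);
            });
        }
        pool.shutdown(); // accept no new tasks, finish the queued ones
        try {
            pool.awaitTermination(1, TimeUnit.HOURS); // wait for the work to complete
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Placeholder for the slow file-system read.
    static ArrayList<String> MethodAccessFileSystem() {
        ArrayList<String> details = new ArrayList<>();
        details.add("placeholder");
        return details;
    }

    // Placeholder for the file creation; it touches only its argument and an atomic counter.
    static void CreateFileInDir(ArrayList<String> values) {
        filesCreated.incrementAndGet();
    }
}
```

Because the helpers keep their working data in local variables and parameters, the tasks do not interfere with each other. The pool never starts more than POOL_SIZE threads; extra tasks simply wait in its queue.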
Related
My program/class is getting a list of classes (e.g. C-1() through C-100()) that need to be run in parallel threads. Each one is its own class and has its own executable, so I don't need to compile, just run. While each class takes a parameter, the logic inside each can be very different. So there is no hope of launching one class with a parameter multiple times.
The list of classes is variable. There may be one class (C-3()) or multiple (C-1(),C-2(),C-4(),C-3()) and they may or may not be in any order.
I have used the bulk method with a loop and a switch statement, but coding 100 of those seems unnecessarily complex and frankly just looks bad. It works and, worst case, will do the job. But it bothers me.
case ("C-1")
{
new C-1("parm").start();
}
etc .... x 100
Lambda functions might get me there, but they're outside my experience.
I didn't want to shell it out. That seems both inefficient and potentially a performance killer.
In a perfect world, I would dynamically pull the item from the list and launch it. But I can't figure out how to replace the object name dynamically. I don't want to slow it down with any clever linking. My expertise isn't enough to tackle that one.
It would also have been nice to add something so that if the list is less than 10, it would run it in the same thread and only go massively parallel if it was above that. But that's also outside my expertise.
In a perfect world, I would dynamically pull the item from the list
and launch it. But I cant figure out how to replace the objectname
dynamically.
The Java subsystem and technique for this kind of dynamic operation is called "reflection". The java.lang.Class class plays a central role here, with most of the rest of the key classes coming from package java.lang.reflect. Reflection permits you to obtain the Class object for a class you identify by name, to create instances of that class, and to invoke methods on those instances.
If your C-* classes all have a common superclass or interface that defines the start() method (Thread?) then you could even perform normal method invocation instead of reflective.
Provided that all the classes you want to dynamically instantiate provide constructors that accept the same parameter type and to which you want to pass the same argument value, you can use it to save writing 100-way conditionals, or a hundred different adapter classes, or similar, for your case. Schematically, it would work along these lines:
obtain or create a fully-qualified class name for the wanted class, let's say className.
obtain the corresponding Class
Class<?> theClass = Class.forName(className);
Obtain a Constructor representing the constructor you want to use. In your example, the constructor takes a single parameter of a type compatible with String. If the declared parameter type is in fact String itself (as opposed to Object or Serializable, or ...) then that would be done like so:
Constructor<?> constructor = theClass.getConstructor(String.class);
Having that in hand, you can instantiate the class:
Object theInstance = constructor.newInstance("parm");
Your path from there depends on whether there is a common supertype, as mentioned above. If there is, then you can
Cast the instance and invoke the method on it normally:
((MySupertype) theInstance).start();
Otherwise, you'll need to invoke the method reflectively, too. This is somewhat simplified by the fact that the method of interest does not take any parameters:
Obtain a Method instance.
Method startMethod = theClass.getMethod("start");
Invoke the method on your object
startMethod.invoke(theInstance);
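Put together, the steps above might look like the following sketch. It assumes the common-supertype case, with Runnable standing in for that supertype. The C1 class and the create helper are hypothetical names invented for the example (a real class could not be called C-1, since a hyphen is not allowed in a Java identifier).

```java
import java.lang.reflect.Constructor;

class ReflectiveLauncher {
    // Hypothetical stand-in for one of the C-* classes.
    public static class C1 implements Runnable {
        private final String parm;

        public C1(String parm) {
            this.parm = parm;
        }

        @Override
        public void run() {
            System.out.println("C1 ran with " + parm);
        }
    }

    // Looks the class up by name and instantiates it through its (String) constructor.
    static Runnable create(String className, String parm) {
        try {
            Class<?> theClass = Class.forName(className);
            Constructor<?> constructor = theClass.getConstructor(String.class);
            return (Runnable) constructor.newInstance(parm);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }
}
```

A caller would then loop over the list of names, call create for each one, and either run the result directly or hand it to a thread.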
You also mention,
It would also have been nice to add something so that if the list is
less than 10, it would run it in the same thread and only go massively
parallel if it was above that.
None of the above has anything directly to do with starting new threads in which to run your code. If that's something that the start() methods will do themselves (for example, if the classes involved have java.lang.Thread as a superclass) then the only alternative for avoiding each object running on its own thread is to use a different method.
On the other hand, if you're starting from everything running in one thread and looking to parallelize, then using a thread pool as described in @PaulProgrammer's answer is a great way to go. Note well that if the tasks are independent of each other, as seems the case from your description, then there's not much point in trying to ensure that they all run concurrently. More threads than you have cores to run them on does not really help you, and a thread pool is useful for queueing up tasks for parallel execution. Of course it would be simple to check the size() of your list to decide whether to send tasks to a thread pool or to just run them directly.
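The size() check could be sketched roughly as follows, assuming the tasks are available as Runnable objects. The TaskDispatcher name, the threshold of 10 (taken from the question) and the choice of one worker per available core are assumptions for illustration.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class TaskDispatcher {
    // Below this many tasks, run them on the calling thread.
    static final int PARALLEL_THRESHOLD = 10;

    static void runAll(List<Runnable> tasks) {
        if (tasks.size() < PARALLEL_THRESHOLD) {
            for (Runnable task : tasks) {
                task.run(); // same thread, in list order
            }
            return;
        }
        // One worker per core; additional tasks wait in the pool's queue.
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (Runnable task : tasks) {
            pool.execute(task);
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```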
The accepted best way to approach this problem is to use a ThreadPool. The idea is that you will spawn a known number of threads, and use those worker threads to work through a queue of tasks. The threads themselves can be reused, preventing the overhead of thread creation.
https://howtodoinjava.com/java/multi-threading/java-thread-pool-executor-example/
package com.howtodoinjava.threads;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
public class ThreadPoolExample
{
public static void main(String[] args)
{
ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(2);
for (int i = 1; i <= 5; i++)
{
Task task = new Task("Task " + i);
System.out.println("Created : " + task.getName());
executor.execute(task);
}
executor.shutdown();
}
}
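The example above uses a Task class that is not shown here. A minimal sketch of what it might look like, assuming it only needs a name and a run() method (the printed message is invented for illustration):

```java
class Task implements Runnable {
    private final String name;

    Task(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    @Override
    public void run() {
        // Runs on whichever pool thread picks the task up.
        System.out.println("Executing : " + name + " on " + Thread.currentThread().getName());
    }
}
```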
I am trying to parallelize a bit of code which makes use of static fields within a "Constants" class. The code at the moment essentially looks like this
public class myClass{
public class Constants{
public static int constant;
}
public static void main(String[] args){
for(int i = 0 ; i<10 ; i++){
Constants.constant = i;
System.out.println(Constants.constant/2);
}
}
}
Obviously the code within the loop is much more heavily dependent on the Constant class, which itself is much more complex. What I'd like to do is create a thread for each iteration of the loop and do said computations separately, all the while controlling the number of threads (right now I'm using a simple semaphore).
Now obviously in the above code, the Constants class is shared between threads and thus cannot be updated by each thread without being updated for all of them.
So my question is : is there anyway to make my Constants class be able to have an instance for each thread, all the while being able to access its fields in a static manner ?
What you're describing is a thread-local: http://docs.oracle.com/javase/7/docs/api/java/lang/ThreadLocal.html . It's a good thing to use. However, as Affe points out, you can't use that with your code as is, because there's just one instance of the class and its static members (per classloader). If your Constants class is something that you can build several copies of in parallel and then merge together later, you should make Constants.constant an instance variable by removing "static". Then create a thread-local in myClass like so:
private static final ThreadLocal<Constants> constants = new ThreadLocal<Constants>() {
    @Override protected Constants initialValue() {
        return new Constants(); // one fresh instance per thread
    }
};
Once your threads are done updating their local object, they can stick them into a shared ArrayBlockingQueue. Your main thread can dequeue them all and merge them as you desire.
Another thing to note is that you may want to use a thread pool executor instead of one thread per iteration of the loop, if you will have a variable number of iterations, possibly many, but you don't want that many threads. (Thread creation is costly and many concurrent threads eat memory and OS scheduling resources.)
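To make the combination of a thread-local, a pool and a shared queue concrete, here is a small sketch. It assumes Constants has been changed to hold an instance field, as suggested above. The PerThreadConstants wrapper, the compute method and the sorting of the merged results are choices made for the example only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PerThreadConstants {
    // Instance field instead of a static one, so each thread can own a copy.
    static class Constants {
        int constant;
    }

    // One Constants object per worker thread, created lazily on first get().
    private static final ThreadLocal<Constants> CONSTANTS =
            ThreadLocal.withInitial(Constants::new);

    // Assumes iterations > 0, since the queue capacity must be positive.
    static List<Integer> compute(int iterations, int threads) {
        BlockingQueue<Integer> results = new ArrayBlockingQueue<>(iterations);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < iterations; i++) {
            final int value = i;
            pool.execute(() -> {
                Constants c = CONSTANTS.get(); // this thread's private instance
                c.constant = value;
                results.add(c.constant / 2);   // hand the result to the main thread
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        List<Integer> merged = new ArrayList<>();
        results.drainTo(merged);
        Collections.sort(merged); // completion order is not deterministic
        return merged;
    }
}
```

Calling PerThreadConstants.compute(10, 3) runs the ten iterations of the question's loop on three threads and collects the halved values.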
public class ObjectA {
private void foo() {
MutableObject mo = new MutableObject();
Runnable objectB = new ObjectB(mo);
new Thread(objectB).start();
}
}
public class ObjectB implements Runnable {
private MutableObject mo;
public ObjectB(MutableObject mo) {
this.mo = mo;
}
public void run() {
//read some field from mo
}
}
As you can see from the code sample above, I pass a mutable object to a class that implements Runnable and will use the mutable object in another thread. This is dangerous because ObjectA.foo() can still alter the mutable object's state after starting the new thread. What is the preferred way to ensure thread safety here? Should I make copy of the MutableObject when passing it to ObjectB? Should the mutable object ensure proper synchronization internally? I've come across this many times before, especially when trying to use SwingWorker in a number of GUI applications. I usually try to make sure that ONLY immutable object references are passed to a class that will use them in another thread, but sometimes this can be difficult.
This is a hard question, and the answer, unfortunately, is 'it depends'. You have three choices when it comes to thread-safety of your class:
1. Make it Immutable, then you don't have to worry. But this isn't what you're asking.
2. Make it thread-safe. That is, provide enough concurrency control internal to the class that client code doesn't have to worry about concurrent threads modifying the object.
3. Make it not-thread safe, and force client code to have some kind of external synchronization.
You're essentially asking whether you should use #2 or #3. You are worried about the case where another developer uses the class and doesn't know that it requires external synchronization. I like using the JCIP annotations @ThreadSafe, @Immutable and @NotThreadSafe as a way to document the concurrency intentions. This isn't bullet-proof, as developers still have to read the documentation, but if everyone on the team understands these annotations and consistently applies them, it does make things clearer.
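For your example, if you want to make the class not thread-safe, you could wrap it in an AtomicReference to make the shared access explicit. Note that AtomicReference only makes reads and writes of the reference itself atomic; it does not synchronize access to the fields of the MutableObject it points to.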
public class ObjectA {
private void foo() {
MutableObject mo = new MutableObject();
Runnable objectB = new ObjectB(new AtomicReference<>( mo ) );
new Thread(objectB).start();
}
}
public class ObjectB implements Runnable {
private AtomicReference<MutableObject> mo;
public ObjectB(AtomicReference<MutableObject> mo) {
this.mo = mo;
}
public void run() {
//read some field from mo
mo.get().readSomeField();
}
}
I think you are overcomplicating it. If it is as in the example (a local variable to which no reference is kept), you should trust that nobody will try to write to it. If it is more complicated (A.foo() has more LOC), then, if possible, create it only to pass it to the thread.
new Thread(new ObjectB(new MutableObject())).start();
If not (due to initializations), declare it in a block so it gets out of scope immediately, even maybe in a separate private method.
{
MutableObject mo = new MutableObject();
Runnable objectB = new ObjectB(mo);
new Thread(objectB).start();
}
....
Copy the object. You won't have any weird visibility problems because you pass the copy to a new Thread. Thread.start always happens before the new thread enters its run method. If you change this code to pass the object to an existing thread, you need proper synchronization. I recommend a blocking queue from java.util.concurrent.
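A minimal sketch of the copy-before-hand-off idea, assuming MutableObject can offer a copy constructor. The class body, the single int field and the readInOtherThread helper are hypothetical.

```java
class CopyBeforeHandOff {
    // Hypothetical mutable class with a copy constructor.
    static class MutableObject {
        private int value;

        MutableObject(int value) {
            this.value = value;
        }

        MutableObject(MutableObject other) {
            this.value = other.value;
        }

        int getValue() {
            return value;
        }

        void setValue(int value) {
            this.value = value;
        }
    }

    // Returns the value the reader thread saw.
    static int readInOtherThread(MutableObject original) {
        MutableObject snapshot = new MutableObject(original); // copy made before start()
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = snapshot.getValue());
        reader.start();
        original.setValue(-1); // later changes do not reach the snapshot
        try {
            reader.join(); // join makes the reader's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }
}
```

The reader only ever touches the snapshot, so the caller is free to keep mutating the original.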
Without knowing your exact situation, this question will be difficult to answer precisely. The answer totally depends on what the MutableObject represents, how many other threads may modify it simultaneously, and whether or not the threads that read the object care whether its state changes while they are reading it.
With respect to thread-safety, internally synchronizing all reads and writes to MutableObject is provably the "safest" thing to do, but it comes at the cost of performance. If contention is really high on reads and writes, then your program may suffer performance issues. You can get better performance by sacrificing some guarantees on mutual exclusion - whether those sacrifices are worth the performance increases totally depends on the specific problem you're trying to solve.
You can also play some games with how you go about "internally synchronizing" your MutableObject, if that's what you end up doing. If you haven't already, I'd recommend reading up on the differences between volatile and synchronized and understand how each can be used to ensure thread safety for different situations.
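As a small illustration of that difference, the sketch below uses volatile for a simple flag and synchronized for a counter whose update is a read-modify-write sequence. The class, its fields and the hammer helper are invented for the example.

```java
class VolatileVsSynchronized {
    // volatile: enough for a flag that is written once and read by other threads.
    private volatile boolean done;

    // Plain field guarded by synchronized methods, because count++ is not atomic.
    private int count;

    void finish() {
        done = true;
    }

    boolean isDone() {
        return done;
    }

    synchronized void increment() {
        count++;
    }

    synchronized int getCount() {
        return count;
    }

    // Starts `threads` writers that each increment `perThread` times, then returns the total.
    static int hammer(int threads, int perThread) {
        VolatileVsSynchronized shared = new VolatileVsSynchronized();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    shared.increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        shared.finish();
        return shared.getCount();
    }
}
```

Marking count as volatile instead would not be enough: visibility would be guaranteed, but two threads could still read the same old value and lose an update.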
First, here is a motivating example:
public class Algorithm
{
public static void compute(Data data)
{
List<Task> tasks = new LinkedList<Task>();
Client client = new Client();
int totalTasks = 10;
for(int i = 0; i < totalTasks; i++)
tasks.add(new Task(data));
client.submit(tasks);
}
}
// AbstractTask implements Serializable
public class Task extends AbstractTask
{
private final Data data;
public Task(Data data)
{
this.data = data;
}
public void run()
{
// Do some stuff with the data.
}
}
So, I am doing some parallel programming and have a method which creates a large number of tasks. The tasks share the data that they will operate on, but I am having problems giving each task a reference to the data. The problem is, when the tasks are serialized, a copy of the data is made for each task. Now, in this task class, I could make a static reference to the data so that it is only stored once, but doing this doesn't really make much sense in the context of the task class. My idea is to store the object as a static in another external class and have the tasks request the object from the class. This can be done before the tasks are sent, likely, in the compute method in the example posted above. Do you think that this is appropriate? Can anyone offer any alternative solutions or tips regarding the idea suggested? Thanks!
Can you explain more about this serialization situation you're in? How do the Tasks report a result, and where does it go -- do they modify the Data? Do they produce some output? Do all tasks need access to all the Data? Are any of the Tasks written to the same ObjectOutputStream?
Abstractly, I guess I can see two classes of solutions.
If the Tasks don't all need access to all the Data, I would try to give each Task only the data that it needs.
If they do all need all of it, then instead of having the Task contain the Data itself, I would have it contain an ID of some kind that it can use to get the data. How to get just one copy of the Data transferred to each place a Task could run, and give the Task access to it, I'm not sure, without better understanding the overall situation. But I would suggest trying to manage the Data separately.
I'm not sure I fully understand the question, but it sounds to me as though Tasks are actually serialized for later execution.
If this is the case, an important question would be whether all of the Task objects are written to the same ObjectOutputStream. If so, the Data will only be serialized the first time it is encountered. Later "copies" will just reference the same object handle from the stream.
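The shared-handle behaviour can be checked with a short experiment. The sketch below uses its own small Data and Task classes (stand-ins, not the ones from the question) and an in-memory byte array in place of whatever transport is really used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SharedStreamDemo {
    static class Data implements Serializable {
        private static final long serialVersionUID = 1L;
        final int[] payload = new int[1000];
    }

    static class Task implements Serializable {
        private static final long serialVersionUID = 1L;
        final Data data;

        Task(Data data) {
            this.data = data;
        }
    }

    // Writes two tasks to one stream, reads them back, and reports
    // whether both deserialized tasks still point at a single Data object.
    static boolean roundTripSharesData() {
        try {
            Data data = new Data();
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new Task(data));
                out.writeObject(new Task(data)); // Data is written as a back-reference here
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                Task first = (Task) in.readObject();
                Task second = (Task) in.readObject();
                return first.data == second.data;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If each Task were written to its own ObjectOutputStream instead, every stream would carry a full copy of the Data.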
Perhaps one could take advantage of that to avoid static references to the data (which can cause a number of problems in OO design).
Edit: The answer below is not actually relevant, due to a misunderstanding about what was being asked. Leaving it here pending more details from the question's author.
This is precisely why the transient keyword was invented.
Declares that an instance field is not
part of the default serialized form of
an object. When an object is
serialized, only the values of its
non-transient instance fields are
included in the default serial
representation. When an object is
deserialized, transient fields are
initialized only to their default
value.
public class Task extends AbstractTask {
private final transient Data data;
public Task(Data data) {
this.data = data;
}
public void run() {
// Do some stuff with the data.
}
}
Have you considered making a singleton instead of making it static?
My idea is to store the object as a
static in another external class and
have the tasks request the object from
the class.
Forget about this idea. When the tasks are serialized and sent over the network, that object will not be sent; static data is not (and cannot be) shared in any way between JVMs.
Basically, if your Tasks are serialized separately, the only way to share the data is to send it separately, or send it only in one task and somehow have the others acquire it on the receiving machine. This could happen via a static field that the one task that has the data sets and the others query, but of course that requires that one task to be run first. And it could lead to synchronization problems.
But actually, it sounds like you are using some sort of processing queue that assumes tasks to be self-contained. By trying to have them share data, you are going against that concept. How big is your data anyway? Is it really absolutely necessary to share the data?
Is it legal for a thread to call this.start() inside its own constructor? If so, what potential issues can this cause? I understand that the object won't be fully initialized until the constructor has run to completion, but aside from this, are there any other issues?
For memory-safety reasons, you shouldn't expose a reference to an object or that object's fields to another thread from within its constructor. Assuming that your custom thread has instance variables, by starting it from within the constructor, you are guaranteed to violate the Java Memory Model guidelines. See Brian Goetz's Safe Construction Techniques for more info.
You will also see weird issues if the Thread class is ever further subclassed. In that case, you'll end up with the thread already running once super() exits, and anything the subclass might do in its constructor could be invalid.
@bill barksdale
If the thread is already running, calling start again gets you an IllegalThreadStateException, you don't get 2 threads.
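This is easy to confirm with a few lines. The DoubleStart class and its method name are made up for the demonstration.

```java
class DoubleStart {
    // Returns true if the second call to start() is rejected.
    static boolean secondStartThrows() {
        Thread t = new Thread(() -> { });
        t.start();
        try {
            t.start(); // a Thread object may be started only once
            return false;
        } catch (IllegalThreadStateException expected) {
            return true;
        }
    }
}
```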
I assume that you want to do this to make your code less verbose; instead of saying
Thread t = new CustomThread();
t.start();
activeThreads.add(t);
you can just say
activeThreads.add( new CustomThread() );
I also like having less verbosity, but I agree with the other respondents that you shouldn't do this. Specifically, it breaks the convention; anyone familiar with Java who reads the second example will assume that the thread has not been started. Worse yet, if they write their own threading code which interacts in some way with yours, then some threads will need to call start and others won't.
This may not seem compelling when you're working by yourself, but eventually you'll have to work with other people, and it's good to develop good coding habits so that you'll have an easy time working with others and code written with the standard conventions.
However, if you don't care about the conventions and hate the extra verbosity, then go ahead; calling start a second time by mistake will not create a second thread, it will just throw an IllegalThreadStateException.
By the way, if one wants lower verbosity and still keep the constructor with its "standard" semantics, one could create a factory method:
activeThreads.add( CustomThread.newStartedThread() );
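Such a factory method might be sketched like this. The greeting field and the body of run() are placeholders; the point is only that the constructor finishes before start() is called.

```java
class CustomThread extends Thread {
    private final String greeting; // hypothetical instance state

    private CustomThread() {
        // The constructor only initialises fields; it does not start the thread.
        this.greeting = "hello from " + getName();
    }

    // Construction completes before start(), so run() never sees a half-built object.
    static CustomThread newStartedThread() {
        CustomThread t = new CustomThread();
        t.start();
        return t;
    }

    @Override
    public void run() {
        System.out.println(greeting);
    }
}
```

Keeping the constructor private means callers cannot obtain an unstarted instance by accident, so every CustomThread they see has already been started.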
It's legal, but not wise. The Thread part of the instance will be completely initialised, but your constructor may not. There is very little reason to extend Thread, and to pull tricks like this isn't going to help your code.
It is "legal", but I think the most important issue is this:
A class should do one thing and do it well.
If your class uses a thread internally, then the existence of that thread should not be visible in the public API. This allows improvement without affecting the public API. Solution: implement Runnable instead of extending Thread.
If your class provides general functionality which, in this case, happens to run in a thread, then you don't want to limit yourself to always creating a thread. Same solution here: implement Runnable instead of extending Thread.
For less verbosity I second the suggestion to use a factory method (e.g. Foo.createAndRunInThread()).
Legal ... yes (with caveats as mentioned elsewhere). Advisable ... no.
It's just a smell that you can all too easily avoid. If you want your thread to auto-start, just do it like Heinz Kabutz.
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadCreationTest {
    public static void main(String[] args) throws InterruptedException {
        final AtomicInteger threads_created = new AtomicInteger(0);
        while (true) {
            final CountDownLatch latch = new CountDownLatch(1);
            new Thread() {
                { start(); } // <--- Like this ... sweet and simple.
                public void run() {
                    latch.countDown();
                    synchronized (this) {
                        System.out.println("threads created: " +
                                threads_created.incrementAndGet());
                        try {
                            wait();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    }
                }
            };
            latch.await();
        }
    }
}