Sending work to a pool of workers that need initialization - java

I have a problem that seems close to what Executors and Thread pools do, however I can't seem to make it exactly fit. Basically I have workers which take a bit of time to initialize and that I'd like to pool, and once they are ready I use them to do work. I need to do this in a Thread :
worker = buildExpensiveWorker();
worker.doWork(work1);
worker.doWork(work2);
worker.doWork(work3);
...
While an Executor only allows me to do this :
doWork(work1);
doWork(work2);
doWork(work3);
...
Will I need to write my own Thread pool ? It feels like a shame to rewrite what is already well done. Or will I need to use ThreadLocal to hold my workers, and manage them from inside the Runnable's run() method ?

If you're talking about actually initializing the Thread objects prior to them being available for use, take a look at ThreadPoolExecutor.setThreadFactory:
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ThreadPoolExecutor.html#setThreadFactory(java.util.concurrent.ThreadFactory)
You can provide your own implementation to create threads in any manner you want, including custom Thread subclasses and/or custom initialization.
I would say however, that if your initialization is expensive, you should probably try to configure the executor to keep the actual threads alive as long as possible (ie, set the keep alive time rather high, and try to spin up all the threads you need right away, and take the hit up front).
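A minimal sketch of that "take the hit up front" idea, assuming a fixed-size pool is acceptable. The class name WarmPool and the one-hour keep-alive are placeholders of mine, not something from the question:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: builds a pool whose threads all exist before any work arrives.
class WarmPool {
    static ThreadPoolExecutor create(int threads) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                threads, threads,      // core size == max size, i.e. a fixed pool
                1, TimeUnit.HOURS,     // keep-alive; by default it only affects non-core threads
                new LinkedBlockingQueue<>());
        executor.prestartAllCoreThreads(); // create every core thread right away
        return executor;
    }
}
```

Core threads are not timed out unless allowCoreThreadTimeOut(true) is called, so with core size equal to max size the threads (and whatever they initialized) stay around until the pool is shut down.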
EDIT: Using a thread local (as mentioned in the comments):
public class ThreadData {
public static final ThreadLocal<String> data = new ThreadLocal<String>();
}
public class InitializingThread extends Thread {
public InitializingThread(Runnable r) {
super(r);
}
public void run() {
ThreadData.data.set("foo");
super.run();
}
}
public class InitializingThreadFactory implements ThreadFactory {
public Thread newThread(Runnable r) {
return new InitializingThread(r);
}
}
ThreadPoolExecutor executor = ...;
executor.setThreadFactory(new InitializingThreadFactory());
executor.execute(...);
And then in your Runnable:
public void run() {
String s = ThreadData.data.get();
}
Also, this approach (as opposed to using Thread.currentThread() and casting) has the advantage that it can be used with any Thread implementation (including the default), or without a thread at all (by directly calling the .run() method after setting the value in the ThreadLocal). You could also easily change "ThreadLocal" to "InheritableThreadLocal", and set it once before submitting anything to the thread pool. All child threads would inherit the value from their parent thread (the one which created the pool).
It's important to note that the "run" method of a given thread will only ever be executed once, even in a thread pool, so this guarantees that your initialization routine happens once per thread.
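A variation on the same idea that needs no Thread subclass, as a rough sketch: let the ThreadLocal build the worker lazily the first time each pool thread asks for it. ExpensiveWorker, its doWork method, and the BUILT counter are stand-ins I made up for the question's buildExpensiveWorker():

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the worker that is slow to construct.
class ExpensiveWorker {
    static final AtomicInteger BUILT = new AtomicInteger(); // counts constructions

    ExpensiveWorker() {
        BUILT.incrementAndGet(); // the expensive initialization would go here
    }

    String doWork(String work) {
        return "done:" + work;
    }
}

class WorkerPoolDemo {
    // Each pool thread builds its own worker on first use and reuses it afterwards.
    static final ThreadLocal<ExpensiveWorker> WORKER =
            ThreadLocal.withInitial(ExpensiveWorker::new);

    static List<String> runAll(List<String> work, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String w : work) {
                Callable<String> task = () -> WORKER.get().doWork(w);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // results come back in submission order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape, the number of workers built is bounded by the number of pool threads rather than by the number of tasks.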

Related

Is there any problem when ExecutorService gives every time a new thread pool?

I have following code, which is executed each time for different number of threads:
class Worker<T> {
public void process() {
System.out.println("Test");
}
}
class Processor {
private void doStuff(final Collection<Worker<V>> col) {
final int size = col.size();
if (size > 0) {
final ExecutorService threads = Executors.newFixedThreadPool(col.size());
for (Worker<V> w : col) {
threads.submit(() -> w.process());
}
threads.shutdown();
}
}
}
Which prints a new pool identifier every time:
(pool-66-thread-1) Test
(pool-66-thread-2) Test
(pool-67-thread-1) Test
(pool-68-thread-1) Test
(pool-68-thread-3) Test
(pool-68-thread-2) Test
I wonder if this is the common behavior, or whether at some point there is a memory leak and it will explode. Shouldn't it reuse previous pools?
As I see it, those previous pools are already shut down thanks to the threads.shutdown() call.
I wonder if this is the common behaviour, or whether at some point there is a memory leak and it will explode. Shouldn't it reuse previous pools?
Well, you are explicitly creating new pools all the time.
// this creates a new pool
final ExecutorService threads = Executors.newFixedThreadPool(col.size());
As for memory leaks, since you are shutting down your pools, that should be fine (but you should do it in a finally block to be safe in case there are exceptions).
If you want to re-use the pool (which makes total sense), you should make threads an instance variable of your Processor (and make sure that Processor implements AutoCloseable and you shutdown the threads in the close method).
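Roughly what that could look like, as a sketch only. The class name ReusableProcessor, the poolSize parameter, and the choice to take plain Runnables are my assumptions, not the asker's API:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ReusableProcessor implements AutoCloseable {
    // One pool for the lifetime of the processor, shared by every doStuff call.
    private final ExecutorService threads;

    ReusableProcessor(int poolSize) {
        this.threads = Executors.newFixedThreadPool(poolSize);
    }

    // Submits every task to the shared pool, waits, and returns how many completed normally.
    int doStuff(Collection<Runnable> tasks) {
        List<Future<?>> futures = new ArrayList<>();
        for (Runnable task : tasks) {
            futures.add(threads.submit(task));
        }
        int completed = 0;
        for (Future<?> f : futures) {
            try {
                f.get();
                completed++;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            } catch (ExecutionException e) {
                // the task threw; it is simply not counted here
            }
        }
        return completed;
    }

    @Override
    public void close() {
        threads.shutdown(); // lets queued tasks finish, accepts no new ones
    }
}
```

Used in a try-with-resources block, the pool is shut down even if something throws, and repeated doStuff calls keep printing the same pool-N prefix instead of a new one each time.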
You are calling Executors.newFixedThreadPool() in your function, which creates a new thread pool.
This is not wrong as such, but it is uncommon, and goes against a lot of the motivation behind using thread pools. A more standard approach would be to create a single thread pool in your Processor class, and dispatch jobs to it, as such:
class Processor {
private final ExecutorService service = Executors.newFixedThreadPool(count);
private void doStuff() {
...
...
service.submit(() -> w.process());
}
}
The idea of a thread pool is to restrict the number of threads you create and to reuse the threads that already exist, which avoids the congestion of running tons of threads at once and makes processing more efficient. The issue with your code is that it neither restricts the number of threads nor reuses them. Since you are only printing, you may not see congestion, but if you add more processing to your worker and keep invoking it in parallel, you will see a drastic increase in latency.

Good thread design: "Method in Thread" or "Thread in Method"

This is just a general question on actual thread design. I'm using Java on android specifically but general design would be the better focus of this question.
It's simple enough: which is better, method in thread or thread in method?
Example:
Let's say we have 3 methods/functions/whatever.
public void readMail()
{
//Logic...
}
public void postQuestion()
{
//Logic...
}
public void answerQuestion()
{
//Logic...
}
Is it better to have
A: Thread within Method
public void readMail()
{
new Thread(new Runnable()
{
public void run()
{
//Logic
}
}).start();
}
And then call your method as you normally would in any OO situation. Say
Email.readMail();
B: Method within Thread
//note this could be inside a method or a class that extends runnable
new Thread(new Runnable()
{
public void run()
{
readMail();
postQuestion();
answerQuestion();
}
}).start();
Method within Thread
[+] If your methods do not need to ensure the property of concurrent execution, or they have deterministic runtime behavior (time and performance), this approach can be a high-level management for concurrency of the application; i.e. concurrency remains at the level of objects rather than methods.
[-] Since the concurrency remains at the level of threads/objects, the application may lose the notion of responsiveness. A user may be "posting a question" while another is "fetching an answer", and both could be dealt with concurrently.
Thread within Method
[+] More fine-grained concurrency control: each method becomes a unit of execution at the OS level. That's why, as @LouisWasserman mentioned, taking advantage of the Executor framework may make more sense.
[-] Threads are generally resource-intensive and expensive, so you will have performance issues in a high-frequency/high-load application with numerous calls to one method, especially if there are inter-method data/logic dependencies. In this regard, synchronization also becomes a concern, which is why using Actor models can help more.
I'd suggest reading more about Actor models and their available implementations.
The second option is more amenable to being rewritten to use Executors and the like, so I'd prefer that version.
I prefer:
C: One Thread One Object
public class Test {
public static class MailReader implements Runnable {
public void readMail() {
//Logic...
}
@Override
public void run() {
while (!Thread.currentThread().isInterrupted()) {
readMail();
}
}
}
public static class QuestionPoster implements Runnable {
public void postQuestion() {
//Logic...
}
@Override
public void run() {
while (!Thread.currentThread().isInterrupted()) {
postQuestion();
}
}
}
public static class QuestionAnswerer implements Runnable {
public void answerQuestion() {
//Logic...
}
@Override
public void run() {
while (!Thread.currentThread().isInterrupted()) {
answerQuestion();
}
}
}
public static void main(String[] args) {
new Thread(new QuestionAnswerer()).start();
new Thread(new QuestionPoster()).start();
new Thread(new MailReader()).start();
}
}
This allows the full gamut of possibilities without any additional grok effort. If you want more mail read than questions posted, make more MailReaders.
If you see
for ( int i = 0; i < 10; i++ ) {
new Thread(new MailReader()).start();
}
you know exactly what is intended and you know that will work.
In the first design (A), every method in fact runs in a SEPARATE THREAD, while in the second design (B) you have ONLY ONE THREAD.
It depends heavily on your application logic and on the operation each method performs:
If you need to run your methods in parallel, then A is the correct answer, but if you need to execute all methods sequentially in one thread, then B would be your choice.
If you are building a utility for other programmers to use, note that the client programmer may not care about threads at all and may just want to write a single-threaded program. Unless there is a very good reason to do so, you shouldn't force them to drag threading issues into a program which would otherwise work fine single-threaded. Does this mean your library can't use threads internally? No! But to a caller, your methods should appear single-threaded (except that they return faster than they would if they were implemented without threads).
How can you do this? When someone calls into one of your methods, block the calling thread, and pass the task off to a pool of worker threads, who can perform it in parallel. After the worker threads finish the task, unblock the calling thread and let it return a value to the caller.
This way you can get the performance benefits of parallelism, without forcing callers to deal with threading issues.
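To make that concrete, here is a small sketch of a method that looks single-threaded to the caller but fans the work out internally. ParallelSummer, sumOfSquares, and the pool size of 4 are all made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSummer {
    // Internal worker pool; daemon threads so an idle pool does not keep the JVM alive.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    // The caller just calls this and gets a value back; the threading is hidden.
    static long sumOfSquares(int[] values) {
        int chunk = Math.max(1, (values.length + 3) / 4);
        List<Callable<Long>> parts = new ArrayList<>();
        for (int start = 0; start < values.length; start += chunk) {
            final int from = start;
            final int to = Math.min(values.length, start + chunk);
            parts.add(() -> {
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += (long) values[i] * values[i];
                }
                return sum;
            });
        }
        try {
            long total = 0;
            // invokeAll blocks the calling thread until every part has finished.
            for (Future<Long> f : POOL.invokeAll(parts)) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("worker failed", e.getCause());
        }
    }
}
```

The signature gives no hint that threads are involved, which is the point: the caller cannot tell, and does not need to.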
Now, on the other hand, even if you decide that your library doesn't need to use threads internally, you should still make it thread-safe, because client programmers may want to use threads.
In other words, there is no reason why the decisions of "thread in method?" and "method in thread?" need to be coupled. You can use "thread in method" if there are performance benefits to doing so, but that shouldn't affect the caller. (They should just be able to call the method and get the needed return value back, without worrying about whether you are using threads internally).
If your module is thread-safe, then it won't be affected either by whether the caller is using threads or not. So if the client programmer wants to use threads, they can also use "method in thread". In some situations, you may have both "method in thread" and "thread in method" -- your module may be using a worker thread pool + task queue internally, and you may have multiple caller threads pushing tasks onto the queue and waiting for results.
Now, while I am talking like you are building a library, in reality you are probably just building code for your own use. But regardless of that, the same principles apply. If you want to use threads for performance, it is better to encapsulate the use of threads behind an interface, and make it so the rest of the program doesn't have to know or care whether module XYZ is using threads or not. At the same time, it is best if you make each module thread-safe, so callers can decide whether to use threads or not.

Java Dynamic number thread Creation and Management

I want to create a dynamic number of threads that depends on the number of rows in a database table.
List list = session.createQuery("From Devices").list();
The number of threads depends on list.size().
I am creating the threads in a for loop:
new Thread(){public void run(){/* Task of each thread */}}.start();
Is this the right way to create a dynamic number of threads? If I use a shared variable, do I need to declare it synchronized? Is there another way to manage threads when the thread count is dynamic and depends on the user?
Another question: how can I define a private variable that is separate for each thread and not shared between them?
Thanks
If you just need a team of threads to execute something on every row I would instead use a thread pool:
ExecutorService exec = Executors.newFixedThreadPool(list.size());
It's much easier to submit work and manage the pool than using an array of threads.
After this, synchronization depends on the actual computation. If you are concurrently modifying some shared state (list, counter) you will need to synchronize access across threads.
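For the simple case of a shared counter, one option is an atomic variable instead of explicit locking. A minimal sketch, where SharedCounterDemo and the 10-second timeout are my own placeholders:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SharedCounterDemo {
    // Runs `tasks` increments on a pool of `threads` threads and returns the final count.
    static int countWithPool(int tasks, int threads) {
        AtomicInteger counter = new AtomicInteger(); // shared state, updated atomically
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            exec.execute(counter::incrementAndGet);
        }
        exec.shutdown(); // no new tasks; the queued ones still run
        try {
            if (!exec.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.get();
    }
}
```

With a plain int and count++ in place of the AtomicInteger, increments could be lost when two threads update at the same time.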
Finally, to define a thread task with private state:
class ThreadTask implements Runnable {
private int state; // example of private state
public ThreadTask(int state) {
this.state = state;
}
public void run() {
// task code
}
}
Is this the right way to create a dynamic number of threads?
Yes, but not very elegant. Consider creating a separate class:
public class DeviceHandler implements Runnable {
//...
}
and then:
List<Device> list = session.createQuery("From Devices").list();
for(Device device: list) {
Thread thread = new Thread(new DeviceHandler(device));
thread.start();
}
Note that I assume each thread is responsible for a device so I am eagerly passing Device instance to each thread. You'll need it, trust me.
If I use a shared variable, do I need to declare it synchronized?
If multiple threads are accessing the same variable, you have to use some sort of synchronization: synchronized, Lock, volatile, Atomic*... This is way too broad for this question.
Is there another way to manage threads when the thread count is dynamic and depends on the user?
Not sure what you mean. The thread count is dynamic in your current solution.
Another question: how can I define a private variable that is separate for each thread and not shared between them?
Well, in Java you can't really have variables local to a thread (except via the obscure ThreadLocal) because every object lives on the shared heap. However, if you encapsulate a variable and let only a single thread access it, you'll be safe:
public class DeviceHandler implements Runnable {
private int noGettersAndSettersForMe;
}
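For completeness, this is roughly what the ThreadLocal route looks like. A sketch only; PerThreadState and its run method are names I invented to show that each thread sees its own copy:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class PerThreadState {
    // Every thread that calls COUNTER.get() gets its own int[1], starting at 0.
    static final ThreadLocal<int[]> COUNTER = ThreadLocal.withInitial(() -> new int[1]);

    // Starts `threads` threads; each bumps its own counter `times` times.
    // Returns the set of distinct final values the threads observed.
    static Set<Integer> run(int threads, int times) {
        Set<Integer> finals = ConcurrentHashMap.newKeySet();
        Thread[] all = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            all[t] = new Thread(() -> {
                for (int i = 0; i < times; i++) {
                    COUNTER.get()[0]++; // touches only this thread's copy
                }
                finals.add(COUNTER.get()[0]);
            });
            all[t].start();
        }
        for (Thread th : all) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return finals;
    }
}
```

If the counter were truly shared, the threads would see different, larger values; since each has its own copy, every thread ends at exactly `times`.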

How to share the variable between two threads in java?

I have a loop that does this:
WorkTask wt = new WorkTask();
wt.count = count;
Thread a = new Thread(wt);
a.start();
When the WorkTask runs, it increments count (count++),
but the WorkTask doesn't seem to change the count, and the variable isn't shared between the two threads. What did I do wrong? Thanks.
Without seeing the code for WorkTask it's hard to pin down the problem, but most likely you are missing synchronization between the two threads.
Whenever you start a thread, there are no guarantees on whether the original thread or the newly created thread runs first, or how they are scheduled. The JVM/operating system could choose to run the original thread to completion and then start running the newly created thread, run the newly created thread to completion and then switch back to the original thread, or anything in between.
In order to control how the threads run, you have to synchronize them explicitly. There are several ways to control the interaction between threads - certainly too much to describe in a single answer. I would recommend the concurrency trail of the Java tutorials for a broad overview, but in your specific case the synchronization mechanisms to get you started will probably be Thread.join and the synchronized keyword (one specific use of this keyword is described in the Java tutorials).
Make the count variable static (it looks like each thread has its own version of the variable right now) and use a mutex to make it thread safe (i.e. use the synchronized keyword).
From your description I came up with the following to demonstrate what I perceive as your issue. This code should output 42, but it typically outputs 41.
public class Test {
static class WorkTask implements Runnable {
static int count;
@Override
public void run() {
count++;
}
}
public static void main(String... args) throws Exception {
WorkTask wt = new WorkTask();
wt.count = 41;
Thread a = new Thread(wt);
a.start();
System.out.println(wt.count);
}
}
The problem is that the print statement runs before the new thread has had a chance to run.
To make the current thread (the one that is going to read the variable count) wait until the other thread finishes, add the following after starting the thread.
a.join();
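Putting it together, a compact version with the join in place might look like this. JoinDemo and runAndWait are just names for the sketch, and count is an instance field here rather than static:

```java
class JoinDemo {
    static class WorkTask implements Runnable {
        int count;

        @Override
        public void run() {
            count++;
        }
    }

    // Starts the task on a new thread, waits for it, then reads the result.
    static int runAndWait(int initial) {
        WorkTask wt = new WorkTask();
        wt.count = initial;
        Thread a = new Thread(wt);
        a.start();
        try {
            a.join(); // wait for the thread; its write to count is visible afterwards
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return wt.count;
    }
}
```

Calling JoinDemo.runAndWait(41) returns 42, because the read happens only after the worker thread has finished.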
If you wish to get a result back from a thread, I would recommend using the Callable
interface and an ExecutorService to submit it, e.g.:
Future<Integer> future = Executors.newCachedThreadPool().submit
(new Callable<Integer>()
{
int count = 1000;
@Override public Integer call() throws Exception
{
//here goes the operations you want to be executed concurrently.
return count + 1; //Or whatever the result is.
}
});
//Here goes the operations you need before the other thread is done.
System.out.println(future.get()); //Here you will retrieve the result from
//the other thread. if the result is not ready yet, the main thread
//(current thread) will wait for it to finish.
This way you don't have to deal with the synchronization problems yourself.
You can read more about this in the Java documentation:
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/package-summary.html

Java synchronized methods question

I have a class with 2 synchronized methods:
class Service {
public synchronized void calc1() { /* ... */ }
public synchronized void calc2() { /* ... */ }
}
Both take considerable time to execute. The question is whether executing these methods would block each other, i.e. can both methods be executed in parallel in different threads?
No they can't be executed in parallel on the same service - both methods share the same monitor (i.e. this), and so if thread A is executing calc1, thread B won't be able to obtain the monitor and so won't be able to run calc2. (Note that thread B could call either method on a different instance of Service though, as it will be trying to acquire a different, unheld monitor, since the this in question would be different.)
The simplest solution (assuming you want them to run independently) would be to do something like the following using explicit monitors:
class Service {
private final Object calc1Lock = new Object();
private final Object calc2Lock = new Object();
public void calc1() {
synchronized(calc1Lock) {
// ... method body
}
}
public void calc2() {
synchronized(calc2Lock) {
// ... method body
}
}
}
The "locks" in question don't need to have any special abilities other than being Objects, and thus having a specific monitor. If you have more complex requirements that might involve trying to lock and falling back immediately, or querying who holds a lock, you can use the actual Lock objects, but for the basic case these simple Object locks are fine.
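For the "try to lock and fall back immediately" case, a sketch using java.util.concurrent.locks.ReentrantLock. The class TryLockService and its method names are hypothetical, chosen only to mirror the Service example:

```java
import java.util.concurrent.locks.ReentrantLock;

class TryLockService {
    private final ReentrantLock calc1Lock = new ReentrantLock();

    // Runs the body only if the lock is free right now; returns false instead of waiting.
    boolean tryCalc1(Runnable body) {
        if (!calc1Lock.tryLock()) {
            return false; // another thread is currently inside calc1
        }
        try {
            body.run();
            return true;
        } finally {
            calc1Lock.unlock(); // always release, even if the body throws
        }
    }

    // Reports whether any thread currently holds the calc1 lock.
    boolean isCalc1Running() {
        return calc1Lock.isLocked();
    }
}
```

A caller that gets false back can do something else and retry later, which a synchronized block cannot offer since it always waits.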
Yes, you can execute them in two different threads without messing up your class internals, but no, they won't run in parallel: only one of them will be executing at a time.
No, they cannot be. In this case you might use a synchronized block instead of synchronizing the whole method. Don't forget to synchronize on different objects.
