Hello everyone, and thanks in advance for your time!
I have a problem I'm not sure how to approach in Java. Let's say I have a user interface that creates events that have to be "executed" at a specific time in the future, anywhere from a couple of minutes to several days ahead.
I have thought of creating a class (let's say EventHandler) that implements Runnable, and then a concurrent linked list that stores those instances ordered by the time they should be executed, soonest first. After that, a thread would peek at the queue and, if the system time is past the expected execution time, start the process.
The problem, aside from the concurrency issues associated with the list, is that the peeking thread consumes CPU time. So I was wondering if there is a more elegant solution, considering there may be hundreds of events scheduled within a single second. Also, I'm using Hibernate with MongoDB to store data, if that matters at all.
Thank you!
EDIT: I'm open to any other solution you can think of, as long as it solves "queue events and execute them at the time they are set to".
A straightforward approach is to use ScheduledThreadPoolExecutor.
int corePoolSize = 1; // for sequential execution of tasks
// for parallel execution, use Runtime.getRuntime().availableProcessors();
ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(corePoolSize);
...
Runnable command1 = ...
executor.schedule(command1, 2, TimeUnit.MINUTES); // execute in a couple of minutes
Runnable command2 = ...
executor.schedule(command2, 7, TimeUnit.DAYS); // execute in 7 days
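A minimal runnable sketch of the ScheduledThreadPoolExecutor approach above (the class name ScheduleDemo and the millisecond delays are only for illustration). It submits tasks out of order and shows that the executor runs them by due time, so no hand-written polling thread is needed:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ScheduleDemo {
    // Schedules three labelled tasks out of order and returns the order in which they ran.
    static List<String> runDemo() {
        List<String> order = new CopyOnWriteArrayList<>();
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        // The executor keeps tasks in a delay queue ordered by due time,
        // so its worker waits for the next due task instead of busy-polling.
        executor.schedule(() -> { order.add("late"); }, 300, TimeUnit.MILLISECONDS);
        executor.schedule(() -> { order.add("early"); }, 100, TimeUnit.MILLISECONDS);
        executor.schedule(() -> { order.add("middle"); }, 200, TimeUnit.MILLISECONDS);
        // By default, delayed tasks that are already scheduled still run after shutdown().
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints [early, middle, late]
    }
}
```

The same schedule call works with TimeUnit.DAYS. Scheduled tasks only live in memory, though, so events that must survive a restart would have to be reloaded from the database (MongoDB in the question) and rescheduled at startup.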
I was reading the ScheduledThreadPoolExecutor JavaDoc and came across the following passage:
Delayed tasks execute no sooner than they are enabled, but without any real-time guarantees about when, after they are enabled, they will commence. Tasks scheduled for exactly the same execution time are enabled in first-in-first-out (FIFO) order of submission.
So, if I write something like this:
ScheduledExecutorService ses = Executors.newScheduledThreadPool(4); //uses ScheduledThreadPoolExecutor internally
Callable<Integer> c;
//initialize c
ses.schedule(c, 10, TimeUnit.SECONDS);
Is there really no guarantee that the callable will start executing 10 seconds after scheduling? As far as I understand, the specification even allows it to execute an hour after scheduling (since there are no real-time guarantees, as the documentation states).
How does it work in practice? Should I expect really long delays?
Your understanding is correct. The Executor is not claiming to be a real-time system with any sort of timing guarantees. The only thing it will guarantee is that it doesn't run tasks too early.
In practice, the timing of well-tuned Executors is very accurate. In my experience, tasks typically start within 10 ms of the scheduled time. The only time you will see scheduling pushed back significantly is when your Executor lacks the resources to run its workload, so this is more of a tuning issue.
Realistically, if you give your Executor enough resources to work with, the timing will be quite accurate.
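The JavaDoc only promises that tasks never start early, so to see how accurate the timing is on a given machine you could measure it. A small sketch (LatenessProbe and measureLatenessMillis are hypothetical names); the only property it relies on is that the result is never negative:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class LatenessProbe {
    // Schedules one task and returns how many milliseconds after its due time it started.
    static long measureLatenessMillis(long delayMillis) {
        ScheduledExecutorService ses = Executors.newScheduledThreadPool(1);
        try {
            // Taken before schedule(), so the due time is at least scheduledAt + delay.
            long scheduledAt = System.nanoTime();
            ScheduledFuture<Long> startedAt = ses.schedule(() -> {
                return System.nanoTime(); // when the task actually began
            }, delayMillis, TimeUnit.MILLISECONDS);
            long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(startedAt.get() - scheduledAt);
            return elapsedMillis - delayMillis;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            ses.shutdown();
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println("late by " + measureLatenessMillis(100) + " ms");
        }
    }
}
```

Running this on a loaded versus an idle machine is a quick way to check whether the Executor is short of resources.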
One thing you don't want to do with an Executor is use the scheduling as part of a rate-based calculation: for example, scheduling a task to run every second and using it to compute <somemetric> per second without factoring in the time at which the task actually runs.
Another thing to be mindful of is the cost of context switching. If you schedule multiple tasks to run every 1 ms, the Executor will not be able to keep up with running your tasks and context switching every 1 ms.
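A sketch of doing the rate calculation safely, assuming a hypothetical event counter: the sampler divides by the time that actually elapsed since the previous sample rather than by the nominal one-second period, so a late run does not skew the result:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class RateSampler {
    private final AtomicLong events = new AtomicLong(); // hypothetical metric being counted
    private long lastCount = 0;
    private long lastNanos = System.nanoTime();

    void recordEvent() {
        events.incrementAndGet();
    }

    // Rate over the interval that actually elapsed, not the nominal period.
    static double perSecond(long countDelta, long elapsedNanos) {
        return countDelta * 1_000_000_000.0 / elapsedNanos;
    }

    // Invoked by the scheduler; it may run late, so it measures elapsed time itself.
    synchronized double sample() {
        long now = System.nanoTime();
        long count = events.get();
        double rate = perSecond(count - lastCount, now - lastNanos);
        lastCount = count;
        lastNanos = now;
        return rate;
    }

    public static void main(String[] args) throws InterruptedException {
        RateSampler sampler = new RateSampler();
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(
                () -> System.out.printf("%.1f events/s%n", sampler.sample()), 1, 1, TimeUnit.SECONDS);
        for (int i = 0; i < 300; i++) { // simulate activity for roughly 3 seconds
            sampler.recordEvent();
            Thread.sleep(10);
        }
        ses.shutdown(); // periodic tasks are cancelled on shutdown by default
    }
}
```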
So, I have a loop where I create thousands of threads which process my data.
I checked, and storing the Threads slows down my app.
This is from my loop:
Record r = new Record(id, data, outPath, debug);
//r.start();
threads.add(r);
//id is 4 digits
//data is something like 500 chars long
It stalls my for loop for a while (one run takes a second or more, which is too much!).
Only init > duration: 0:00:06.369
With adding thread to ArrayList > duration: 0:00:07.348
Questions:
What is the best way of storing Threads?
How can I make Threads faster?
Should I create the Threads and run them with a special executor, for example 10 at a time, then the next 10, etc.? (If yes, how?)
Keep in mind that a very high number of threads is not very useful.
At most, you can execute as many threads at the same time as your CPU has cores.
The best is to reuse existing threads. To do that you can use the Executor framework.
For example, to create an Executor that internally uses at most 10 threads, you can do the following:
List<Record> records = ...;
ExecutorService executor = Executors.newFixedThreadPool(10);
for (Record r : records) {
executor.submit(r);
}
// At the end stop the executor
executor.shutdown();
With code similar to this, you can also submit many thousands of commands (Runnable implementations), but no more than 10 threads will be created.
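A runnable sketch of that claim (PoolDemo and run are illustrative names): it submits thousands of tiny tasks to a pool of 10 and counts how many distinct worker threads actually did the work:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PoolDemo {
    // Runs taskCount tiny tasks on a fixed pool; returns {tasksCompleted, distinctThreadsUsed}.
    static int[] run(int taskCount, int poolSize) {
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            executor.submit(() -> {
                threadNames.add(Thread.currentThread().getName());
                completed.incrementAndGet();
            });
        }
        executor.shutdown(); // accept no new tasks, but finish the queued ones
        try {
            executor.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {completed.get(), threadNames.size()};
    }

    public static void main(String[] args) {
        int[] result = run(5000, 10);
        System.out.println(result[0] + " tasks ran on " + result[1] + " threads");
    }
}
```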
I'm guessing that it is not the .add method that is really slowing you down. My guess is that the hundreds of Threads running in parallel are the real problem. Even a simple call like add has to compete with all those threads for CPU time, so it can take long to complete even though the operation itself is fast. It is also possible that your data structure has an add method that is O(n).
Possible solutions for this:
* Find a truly wait-free solution, e.g. prioritizing threads.
* Add them all to your data structure before executing them.
While it is possible to work like this, it is strongly discouraged to create more than a handful of Threads for this kind of work. You should use a thread pool Executor, as David Lorenzo already pointed out.
I have a loop where I create thousands of threads...
That's a bad sign right there. Creating threads is expensive.
Presumably your program creates thousands of threads because it has thousands of tasks to perform. The trick is to decouple the threads from the tasks: create just a few threads, and reuse them.
That's what a thread pool does for you.
Learn about the java.util.concurrent.ThreadPoolExecutor class and related classes (e.g., Future). It implements a thread pool, and it very likely provides all the features you need.
If your needs are simple enough, you can use one of the static methods in java.util.concurrent.Executors to create and configure a thread pool. (E.g., Executors.newFixedThreadPool(N) will create a new thread pool with exactly N threads.)
If your tasks are all compute bound, then there's no reason to have any more threads than the number of CPUs in the machine. If your tasks spend time waiting for something (e.g., waiting for commands from a network client), then the decision of how many threads to create becomes more complicated: It depends on how much of what resources those threads use. You may need to experiment to find the right number.
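For the compute-bound case, a sketch that sizes the pool with Runtime.getRuntime().availableProcessors() and splits a hypothetical workload (summing squares) into Callables, collecting the results through the Futures returned by invokeAll:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CpuBoundPool {
    // Sums i*i for i in 1..n, split into chunks that run on one thread per core.
    static long sumOfSquares(int n, int chunks) {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            int chunkSize = (n + chunks - 1) / chunks; // ceiling division
            for (int start = 1; start <= n; start += chunkSize) {
                final int from = start;
                final int to = Math.min(n, start + chunkSize - 1);
                tasks.add(() -> {
                    long sum = 0;
                    for (long i = from; i <= to; i++) {
                        sum += i * i;
                    }
                    return sum;
                });
            }
            long total = 0;
            for (Future<Long> f : pool.invokeAll(tasks)) { // blocks until every task is done
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1000, 16)); // prints 333833500
    }
}
```

For tasks that mostly wait (network, disk), the thread count would need to be found by experiment, as described above.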
I apologize in advance if this is a basic question, but I'm new to the material. I've got a piece of software that is kicked off by users submitting jobs through a website. Because the software itself is designed to take advantage of parallel processing, all I want to do is queue up these jobs so they run one after the other. To do this, I've tried to use the Executor framework built into Java. The code I've developed is:
public JobManager()
{
    mcpExecutor = Executors.newSingleThreadExecutor();
}
public Future<MyProcessResults> startProcess(inputs)
{
    MyProcess myProcess = new MyProcess(inputs);
    Future<MyProcessResults> future = mcpExecutor.submit(myProcess);
    Long newKey = System.currentTimeMillis();
    futures.putIfAbsent(newKey, future);
    return future;
}
Here, startProcess runs every time the "submit" button is pressed. Now, the description of newSingleThreadExecutor reads:
Creates an Executor that uses a single worker thread operating off an unbounded queue. (Note however that if this single thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks.) Tasks are guaranteed to execute sequentially, and no more than one task will be active at any given time. Unlike the otherwise equivalent newFixedThreadPool(1) the returned executor is guaranteed not to be reconfigurable to use additional threads.
This led me to think that it would take multiple tasks, queue them, and run only one instance of the software at a time. As you might suspect, I'm writing because it doesn't do that: it starts as many tasks in parallel as I submit (I know, the opposite problem of what most people want). Any help on this issue is much appreciated, and thank you in advance.
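A sketch showing that newSingleThreadExecutor does run submitted tasks one at a time (SequentialJobs and its methods are hypothetical names). One possible explanation for the behaviour above, and it is only an assumption since MyProcess isn't shown, is that the task merely launches the external program (for example with ProcessBuilder.start(), which returns as soon as the process is launched) and then returns; the executor then considers the job finished and starts the next one. Waiting for the process with Process.waitFor() inside the task would keep the jobs sequential:

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SequentialJobs {
    // Hypothetical helper for use inside a task: runs an external program and
    // blocks until it exits. ProcessBuilder.start() alone does not wait.
    static int runBlocking(ProcessBuilder pb) throws IOException, InterruptedException {
        Process process = pb.start();
        return process.waitFor();
    }

    // Submits jobs that each take about 20 ms and returns the most that ever ran at once.
    static int maxConcurrentJobs(int jobs) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicInteger running = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        for (int i = 0; i < jobs; i++) {
            executor.submit(() -> {
                maxSeen.accumulateAndGet(running.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(20); // stand-in for the real job
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                running.decrementAndGet();
            });
        }
        executor.shutdown();
        try {
            executor.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return maxSeen.get();
    }

    public static void main(String[] args) {
        System.out.println("max concurrent jobs: " + maxConcurrentJobs(5)); // prints "max concurrent jobs: 1"
    }
}
```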
I am trying to tune a thread that does the following:
A thread pool with just 1 thread [corePoolSize = 0, maxPoolSize = 1]
The queue used is an ArrayBlockingQueue
Queue size = 20
Background:
The thread tries to read a request and perform an operation on it.
HOWEVER, the number of requests eventually grew so much that the thread is always busy and consumes a full CPU, which makes it a resource hog.
What I want to do instead is sample the requests at intervals and process those. The other requests can safely be ignored.
What I would have to do is put a sleep in the "operation" function, so that for each task the thread sleeps for some time and releases the CPU.
Question:
However, I was wondering if there is a way to use a queue that itself sleeps for some time before it reads the next element. This would be ideal, since sleeping in the middle of a task's execution and leaving it incomplete doesn't sound like the best approach to me.
Please let me know if you have any other suggestions for this as well.
Thanks.
Edit:
I have added a follow-up question here
Corrected maxPoolSize to 1 (written in haste). Thanks, Tim, for pointing it out.
No, you can't make the thread sleep while it's in the pool. If there's a task in the queue, it will be executed.
Pausing within a queued task is the only way to force the thread to be idle in spite of queued tasks. Now, the "sleep" doesn't have to be in the same task as the "work": you could queue a separate rest task after each real task, which might make for a cleaner implementation. More importantly, if the work is a Callable that returns a result, separating it into two tasks will allow you to obtain the result as soon as possible.
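A sketch of the separate "rest task" idea, assuming a single-thread executor and a hypothetical unit of work (doubling an integer request). Each work Callable is followed by a sleeping Runnable, so the worker idles between requests, while each result is available as soon as its own work task finishes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class RestBetweenTasks {
    static List<Integer> process(List<Integer> requests, long restMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        List<Future<Integer>> results = new ArrayList<>();
        for (int request : requests) {
            // The "real" work (hypothetical: doubling the request) as a Callable.
            results.add(executor.submit(() -> request * 2));
            // A separate rest task keeps the single worker idle for a while.
            executor.submit(() -> {
                try {
                    Thread.sleep(restMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        executor.shutdown();
        List<Integer> out = new ArrayList<>();
        try {
            for (Future<Integer> f : results) {
                out.add(f.get()); // completes when its work task ends, before the following rest task
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(process(List.of(1, 2, 3), 100)); // prints [2, 4, 6]
    }
}
```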
As a refinement, rather than sleeping for a fixed interval between every task, you could "throttle" execution to a specified rate. This would allow you to avoid waiting unnecessarily between tasks, yet avoid executing too many tasks within a specified time interval. You can read another answer of mine for a simple way to implement this with a DelayQueue.
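A different technique, swapped in here rather than the DelayQueue throttle mentioned above, that matches the asker's "sample the requests and ignore the rest" goal: keep only the most recent request and let a scheduled task process it once per period. LatestRequestSampler and its methods are hypothetical names:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

class LatestRequestSampler<T> {
    private final AtomicReference<T> latest = new AtomicReference<>();

    // Producers call this; an older unprocessed request is overwritten, i.e. dropped.
    void offer(T request) {
        latest.set(request);
    }

    // Returns the newest request and clears the slot, or null if nothing arrived.
    T takeLatest() {
        return latest.getAndSet(null);
    }

    // Processes at most one request per period, so the worker is idle the rest of the time.
    ScheduledExecutorService start(Consumer<T> handler, long period, TimeUnit unit) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleWithFixedDelay(() -> {
            T request = takeLatest();
            if (request != null) {
                handler.accept(request);
            }
        }, 0, period, unit);
        return ses;
    }

    public static void main(String[] args) throws InterruptedException {
        LatestRequestSampler<String> sampler = new LatestRequestSampler<>();
        ScheduledExecutorService ses =
                sampler.start(r -> System.out.println("processing " + r), 500, TimeUnit.MILLISECONDS);
        for (int i = 0; i < 100; i++) {
            sampler.offer("request-" + i); // most of these are never processed
            Thread.sleep(20);
        }
        ses.shutdown();
    }
}
```

If the handler ever throws, scheduleWithFixedDelay suppresses all later runs, so a real handler would catch its own exceptions.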
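You could subclass ThreadPoolExecutor and override beforeExecute to sleep for some time:
@Override
protected void beforeExecute(Thread t, Runnable r) {
    super.beforeExecute(t, r);
    try {
        Thread.sleep(millis); // millis: a field of your subclass; sleeps the worker that is about to run r, see JavaDoc
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the interrupt flag instead of swallowing it
    }
}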
But see AngerClown's comment about artificially slowing down the queue probably not being a good idea.
This might not work for you, but you could try setting the executor's thread priority to low.
Essentially, create the ThreadPoolExecutor with a custom ThreadFactory whose newThread() method returns Threads with a priority of Thread.MIN_PRIORITY. The executor's threads should then get CPU time mainly when a core is free (thread priorities are only a hint to the OS scheduler, so the exact effect is platform-dependent).
The implication: on a system that strictly uses time slicing, you will only be given a time slice to execute if there is no other Thread in the entire program with a higher priority asking to be scheduled. Depending on how busy your application really is, you might get scheduled every once in a while, or you might not be scheduled at all.
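A sketch of that low-priority ThreadFactory, assuming the one-thread pool with a 20-slot ArrayBlockingQueue from the question (LowPriorityPool is a hypothetical name). It wraps Executors.defaultThreadFactory() so the threads keep the usual names:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class LowPriorityPool {
    // Wraps the default factory (keeps its naming) but lowers each thread's priority.
    static ThreadFactory lowPriorityFactory() {
        ThreadFactory defaults = Executors.defaultThreadFactory();
        return runnable -> {
            Thread thread = defaults.newThread(runnable);
            thread.setPriority(Thread.MIN_PRIORITY); // only a hint; effect depends on OS and JVM
            return thread;
        };
    }

    // One worker and a bounded queue of 20, as in the question (assumed configuration).
    static ThreadPoolExecutor newLowPriorityExecutor() {
        return new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(20), lowPriorityFactory());
    }

    public static void main(String[] args) {
        ThreadPoolExecutor executor = newLowPriorityExecutor();
        executor.execute(() -> System.out.println(
                "worker priority: " + Thread.currentThread().getPriority())); // prints "worker priority: 1"
        executor.shutdown();
    }
}
```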
The reason the thread is consuming 100% CPU is that it is given more work than it can process. Adding a delay between tasks is not going to fix this problem; it will just make things worse.
Instead, you should look at WHY your tasks consume so much CPU (e.g., with a profiler) and change them so that they consume less CPU, until your thread can keep up and no longer consumes 100% CPU.
I have several Callables that query some JMX beans, so each one may time out. I want to poll for values, let's say, every second. The most naive approach would be to start each in a separate thread, but I want to minimize the number of threads. What options do I have for doing this better?
My interpretation is that you have a bunch of Callable objects which need to be polled at some interval. The trouble if you use a thread pool is that the pool will become contaminated with the slowest members, and your faster ones will be starved.
It sounds like you have control over the scheduling, so you might consider an exponential backoff approach. That is, after Callable X has run (and perhaps timed out), you wait 2 seconds instead of 1 second before rescheduling it. If it still fails, go to 4 s, then 8 s, etc. A ScheduledThreadPoolExecutor doesn't do the backoff for you, but it has built-in support for running each execution after whatever delay you choose.
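A sketch of that backoff on top of a ScheduledExecutorService (BackoffPoller is hypothetical, and it assumes each query enforces its own timeout and throws when it times out or fails). Each run reschedules itself with a doubled delay after a failure, capped at a maximum, and goes back to the base delay after a success:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class BackoffPoller {
    private final ScheduledExecutorService scheduler;
    private final Callable<?> query; // e.g. one JMX query; assumed to throw on timeout or failure
    private final long baseMillis;
    private final long maxMillis;
    private long delayMillis;

    BackoffPoller(ScheduledExecutorService scheduler, Callable<?> query, long baseMillis, long maxMillis) {
        this.scheduler = scheduler;
        this.query = query;
        this.baseMillis = baseMillis;
        this.maxMillis = maxMillis;
        this.delayMillis = baseMillis;
    }

    // Double the delay after a failure (capped at max); go back to base after a success.
    static long nextDelay(long current, boolean succeeded, long base, long max) {
        return succeeded ? base : Math.min(current * 2, max);
    }

    void start() {
        scheduler.schedule(this::pollOnce, delayMillis, TimeUnit.MILLISECONDS);
    }

    // Runs one query, then schedules the next run, so only one run per poller is pending.
    private void pollOnce() {
        boolean succeeded;
        try {
            query.call();
            succeeded = true;
        } catch (Exception e) {
            succeeded = false;
        }
        delayMillis = nextDelay(delayMillis, succeeded, baseMillis, maxMillis);
        scheduler.schedule(this::pollOnce, delayMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        // Hypothetical query that always times out: runs at ~1 s, ~3 s, ~7 s, ...
        new BackoffPoller(scheduler, () -> {
            System.out.println("query timed out");
            throw new IllegalStateException("timeout");
        }, 1000, 8000).start();
        Thread.sleep(5000);
        scheduler.shutdownNow();
    }
}
```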
If you set a constant timeout, this strategy will reduce your pool's susceptibility to monopolization by the slow ones. It is very difficult to get rid of this problem completely. Using a separate thread per queried object is really the only way to make sure you don't get starvation, and that can be very resource-intensive, as you say.
Another strategy is to split your pool into a fast one and a slow one. If an object keeps timing out (say, more than N times), you move it to the slow pool. This keeps your fast pool fast, and while the slow ones all get in each other's way, at least they don't clog up the fast pool. If they show good statistics for a while, you can promote them back to the fast pool.
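A sketch of the fast/slow bucketing (PoolRouter is hypothetical; "more than N times" is interpreted here as N consecutive timeouts, and "good statistics for a while" as a run of consecutive successes, both assumptions):

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolRouter {
    private final ExecutorService fastPool;
    private final ExecutorService slowPool;
    private final int demoteAfter;  // consecutive timeouts before moving to the slow pool
    private final int promoteAfter; // consecutive successes before moving back
    private final Map<String, Integer> timeouts = new ConcurrentHashMap<>();
    private final Map<String, Integer> successes = new ConcurrentHashMap<>();
    private final Set<String> slow = ConcurrentHashMap.newKeySet();

    PoolRouter(ExecutorService fastPool, ExecutorService slowPool, int demoteAfter, int promoteAfter) {
        this.fastPool = fastPool;
        this.slowPool = slowPool;
        this.demoteAfter = demoteAfter;
        this.promoteAfter = promoteAfter;
    }

    boolean isSlow(String target) {
        return slow.contains(target);
    }

    // The pool to submit the next query for this target to.
    ExecutorService poolFor(String target) {
        return isSlow(target) ? slowPool : fastPool;
    }

    // Call after each query with whether it timed out.
    synchronized void recordResult(String target, boolean timedOut) {
        if (timedOut) {
            successes.remove(target);
            if (timeouts.merge(target, 1, Integer::sum) >= demoteAfter) {
                slow.add(target);
            }
        } else {
            timeouts.remove(target);
            if (isSlow(target) && successes.merge(target, 1, Integer::sum) >= promoteAfter) {
                slow.remove(target);
                successes.remove(target);
            }
        }
    }

    public static void main(String[] args) {
        PoolRouter router = new PoolRouter(Executors.newFixedThreadPool(4), Executors.newFixedThreadPool(2), 3, 5);
        for (int i = 0; i < 3; i++) {
            router.recordResult("slowBean", true);
        }
        System.out.println(router.isSlow("slowBean")); // prints true
    }
}
```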
As soon as you submit a Callable you receive a Future - a handle to the future result. You can decide to wait for its completion for a given amount of time:
Future<String> future = executorService.submit(callable);
try {
future.get(1, TimeUnit.SECONDS);
} catch ( TimeoutException e ) {
future.cancel(true);
} catch ...
Calling get with a timeout throws a TimeoutException if the task has not completed in time. This does not distinguish between tasks that have not started and tasks that have started but not completed. On the other hand, cancel takes a boolean parameter, mayInterruptIfRunning, so you can choose whether a task that is already running gets interrupted.
I agree with robbotic... implementing a cachedThreadPool (Executors.newCachedThreadPool()) can help: it reuses idle threads instead of creating a new one per task, and idle threads time out after 60 seconds, which frees unused resources. Note that it does not cap the number of threads, so a steady stream of slow queries can still create many of them.
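A self-contained version of the snippet above (TimeoutDemo and callWithTimeout are hypothetical names) that also handles the other two checked exceptions Future.get declares:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimeoutDemo {
    // Returns the callable's result, or null if it did not finish within the timeout.
    static String callWithTimeout(ExecutorService executor, Callable<String> callable, long timeout, TimeUnit unit) {
        Future<String> future = executor.submit(callable);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // true: interrupt the worker if the task is already running
            return null;
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException("query failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        System.out.println(callWithTimeout(executor, () -> "fast", 1, TimeUnit.SECONDS)); // prints fast
        System.out.println(callWithTimeout(executor, () -> {
            Thread.sleep(5_000); // simulates a hung JMX call; interrupted by cancel(true)
            return "slow";
        }, 100, TimeUnit.MILLISECONDS)); // prints null
        executor.shutdown();
    }
}
```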