I had a question about mapreduce.job.speculative.slowtaskthreshold.
The docs say:
The number of standard deviations by which a task's ave progress-rates must be lower than the average of all running tasks' for the task to be considered too slow.
I'm curious what happens when a process is considered "too slow". Does it kill and restart it? Just kill? I'm curious because I think I've possibly encountered a race condition and would like to tune the parameters to minimize the issue.
Source: http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
With speculative execution, when the framework decides that a task is "too slow", it will speculatively execute the same task on a different node. Once one of the two tasks finishes successfully, the other task is killed.
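If the speculative restarts are aggravating your race condition, the usual first step is to tune or disable speculation via configuration. A sketch of the relevant mapred-site.xml properties (property names are from the mapred-default.xml page linked above; the 2.0 threshold is an illustrative choice, not a recommendation):

```xml
<!-- Disable speculative execution for map and reduce tasks entirely -->
<property>
  <name>mapreduce.map.speculative</name>
  <value>false</value>
</property>
<property>
  <name>mapreduce.reduce.speculative</name>
  <value>false</value>
</property>

<!-- Or keep speculation on, but require a task to be further behind the
     average before it is considered "too slow" (default is 1.0) -->
<property>
  <name>mapreduce.job.speculative.slowtaskthreshold</name>
  <value>2.0</value>
</property>
```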
Related
How do things like scheduleAtFixedRate work? How does it work behind the scenes, and is there a penalty to using it?
More specifically, I have a task that I want to run periodically, say every 12 hours. The period is not strict at all, so my first instinct was to check on every request (Tomcat server) whether it's been more than 12 hours since the task last executed and, if so, execute it and reset the timer. The downside of this is that I have to do a small time check on every request, make sure the task is run only once (using a semaphore or something similar), and the task might not execute for a long time if there are no requests.
scheduleAtFixedRate makes it easier to schedule a recurring task, but since I don't know how it does it, I don't know what the performance impact is. Is there a thread continually checking if the task is due to run? etc.
edit:
In Timer.java, there's a mainLoop function which, in my understanding, is something like this (overly simplified):
while (true) {
    currentTime = System.currentTimeMillis();
    if (myTask.nextExecutionTime <= currentTime) myTask.run();
}
Won't this loop try to run as fast as possible and use a ton of CPU (I know, obviously not, but why)? There's no Thread.sleep in there to slow things down.
You can read the code if you wish to work out how it works.
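The key detail the simplified loop above leaves out is that Timer's mainLoop blocks instead of spinning: it computes how long until the next task is due and calls Object.wait(delay) on the task queue, so the thread sleeps until either the deadline arrives or a schedule() call notifies the lock. A heavily simplified, self-contained sketch of that mechanism (this is an illustration of the idea, not the actual JDK code):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class MiniTimerSketch {
    public static void main(String[] args) throws InterruptedException {
        final Object queueLock = new Object();
        final AtomicBoolean taskRan = new AtomicBoolean(false);
        final long nextExecutionTime = System.currentTimeMillis() + 100; // due in 100 ms

        Thread mainLoop = new Thread(() -> {
            synchronized (queueLock) {
                while (!taskRan.get()) {
                    long delay = nextExecutionTime - System.currentTimeMillis();
                    if (delay <= 0) {
                        taskRan.set(true);              // "run" the task
                        System.out.println("task ran");
                    } else {
                        try {
                            // Blocks without burning CPU. In the real Timer, a
                            // schedule() call would notify() this lock so the
                            // loop wakes up early and re-computes the delay.
                            queueLock.wait(delay);
                        } catch (InterruptedException e) {
                            return;
                        }
                    }
                }
            }
        });
        mainLoop.start();
        mainLoop.join();
    }
}
```

This is why there is no Thread.sleep and yet no busy-waiting: wait(delay) parks the thread until the next deadline.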
There is overhead in using ScheduledExecutorService in terms of CPU and memory; however, on the scale of hours, minutes, seconds, even milliseconds, it is probably not worth worrying about. If you have a task running in the range of microseconds, I would consider something more lightweight.
In short, the overhead is probably too small for you to notice. The benefit it gives you is ease of use, and it is likely to be worth it.
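For the 12-hour job in the question, a minimal sketch of the request-free approach looks like this. The interval is shortened to milliseconds here only so the behavior is observable; in production you would pass 12 and TimeUnit.HOURS:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class PeriodicTaskDemo {
    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        Runnable task = () -> System.out.println("maintenance task ran");

        // Real application: scheduler.scheduleAtFixedRate(task, 12, 12, TimeUnit.HOURS);
        ScheduledFuture<?> handle =
                scheduler.scheduleAtFixedRate(task, 0, 50, TimeUnit.MILLISECONDS);

        Thread.sleep(180);            // let it fire a few times
        handle.cancel(false);
        scheduler.shutdown();
        scheduler.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```

Unlike the check-on-every-request approach, this runs even when there is no traffic, and there is no per-request time check.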
I'm new to Hazelcast, so I have a question about best failure-handling practices during parallel processing.
Mastering Hazelcast, section 6.6, p. 96:
Work-queue has no high availability: Each member will create one or more local ThreadPoolExecutors with ordinary work-queues that do the real work. When a task is submitted, it will be put on the work-queue of that ThreadPoolExecutor and will not be backed up by Hazelcast. If something would happen with that member, all unprocessed work will be lost.
Task:
Suppose I've got 1 master node and 2 slaves. I launch a time-consuming task with
executor.submitToAllMembers (new TimeConsumingTask())
So each node is processing something. And while they are all processing something, one of the slaves fails.
Questions:
It's not possible to rerun the failed member's work on another node, right?
Is there any other (preferably better) approach than rerunning the whole job set across the whole cluster? (In case TimeConsumingTask is a Runnable.)
Is there any other (preferably better) approach than rerunning the whole job set across the whole cluster? (In case TimeConsumingTask is a Callable and I want to get a Future as the cluster computation result.)
I'm assuming by 'failure handling' you're talking about the scenario where a node in the cluster goes down.
Question 1 Not automatically. You are right in assuming that Hazelcast's execution tasks are not fault tolerant. However, if you were able to handle the failure of a task, I can't see a reason why you couldn't resubmit the work to another member in the cluster.
Question 2 It's difficult to know what your TimeConsumingTask is actually doing - as with any distributed execution engine, it's generally better to compose a long-running task as a series of smaller tasks. If you can't compose your task of smaller elements, then no - there's not really a better approach than resubmitting the whole job.
Question 3 The same thing applies to this question as question 2. Returning a Future from a task submission is not going to help you massively if a node fails. Futures provide you with the ability to wait (optionally for a specified timeout period) on the result and provide the possibility of cancelling the task.
Generally, for handling a node failing, I would take a look to see whether an ExecutionCallback would help - you get notified on a failure, and I am currently assuming that a node failure falls under this. When your callback is notified of the failure, you could resubmit the job.
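Hazelcast aside, the resubmit-on-failure idea behind such a callback can be sketched with plain java.util.concurrent. This is a generic illustration of the pattern, not Hazelcast API; the flaky task simulates a member going down on the first two attempts:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

public class ResubmitOnFailure {
    // Submit the task; if it completes exceptionally, resubmit it
    // up to maxRetries more times.
    static <T> CompletableFuture<T> submitWithRetry(
            ExecutorService pool, Supplier<T> task, int maxRetries) {
        CompletableFuture<T> result = CompletableFuture.supplyAsync(task, pool);
        for (int i = 0; i < maxRetries; i++) {
            result = result.exceptionally(ex -> task.get());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        final int[] attempts = {0};
        // Fails twice (simulating a lost member), succeeds on the third attempt.
        Supplier<Integer> flaky = () -> {
            if (++attempts[0] < 3) throw new IllegalStateException("member down");
            return 42;
        };
        System.out.println(submitWithRetry(pool, flaky, 3).get());  // prints 42
        pool.shutdown();
    }
}
```

With Hazelcast you would do the resubmission inside the callback's failure handler instead, but the retry structure is the same.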
You might also want to look at some other approaches that exist outside of the core Hazelcast API. Hazeltask is a project on GitHub that promises failover handling and task resubmission, so that might be worth a look.
What is both faster and "better practice": a polling system or an event-based timer?
I'm currently having a discussion with a more senior coworker regarding how to implement some mission critical logic. Here is the situation:
A message giving an execution time is received.
When that execution time is reached, some logic must be executed.
Now multiple messages can be received giving different execution times, and the logic must be executed each time.
I think that the best way to implement the logic would be to create a timer that would trigger the logic at the time specified in the message, but my coworker believes I would be better off polling a list of the messages to see if the execution time has been reached.
His argument is that the polling system is safer, as it is less complicated and thus less likely to be screwed up by the programmer. My argument is that by implementing it my way, we reduce the computational load and are thus more likely to execute the logic when we actually want it to execute. How should I implement it, and why?
Requested Information
My logic would almost certainly be executed at times of the highest load.
The requirements do not specify how reliable the connection will be but everyone I've talked to has stated that they have never heard of a message being dropped
The scheduling is based on an absolute system. So the message will have an execution time specifying when an algorithm should be executed. Since there is time synchronization, I have been instructed to assume that the time will be uniform among all machines.
The algorithm that gets executed uses some inputs which initially are volatile but soon stabilize. By postponing the processing, I hope to use the most stable information available.
The java.util.Timer effectively does what your colleague suggests (truth be told, in the end, there really aren't that many ways to do this).
It maintains a collection of TimerTasks, and it waits for new activity on it, or until the time has come to execute the next task. It doesn't poll the collection; it "knows" that the next task will fire in N seconds and waits until that happens or anything else does (such as a TimerTask being added or deleted). This is better overall than polling, since it spends most of its time sleeping.
So, in the end, you're both right -- you should use a Timer for this, because it basically does what your coworker wants to do.
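For the message scenario specifically, java.util.Timer can schedule each piece of logic at the absolute time carried in the message. A sketch, with the delays shortened to milliseconds so it finishes quickly (a real message would supply the Date):

```java
import java.util.Date;
import java.util.Timer;
import java.util.TimerTask;

public class MessageScheduler {
    public static void main(String[] args) throws InterruptedException {
        Timer timer = new Timer("message-timer", /* isDaemon = */ false);

        // Each incoming message carries an absolute execution time;
        // here we fake two messages due 50 ms and 120 ms from now.
        Date first  = new Date(System.currentTimeMillis() + 50);
        Date second = new Date(System.currentTimeMillis() + 120);

        timer.schedule(new TimerTask() {
            @Override public void run() { System.out.println("logic for message 1"); }
        }, first);
        timer.schedule(new TimerTask() {
            @Override public void run() { System.out.println("logic for message 2"); }
        }, second);

        Thread.sleep(300);   // let both fire
        timer.cancel();
    }
}
```

Adding a new message is just another schedule() call; the Timer thread wakes up, re-computes the next deadline, and goes back to waiting.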
Situation
I have a web application.
I have a class which does complicated mathematical computations.
These computations run from time to time, depending on the request.
Sometimes many threads start this computation simultaneously.
When too many computations are started, the machine hangs (completely freezes at ~99% CPU usage).
My goal is
My goal is to avoid hanging/freezing.
My guess is that it could be done by limiting the number of simultaneous computations (probably to NUMBER_OF_CPU_CORES - 1).
Question is
What is the best way to reach this goal?
I know that there is java.util.concurrent.Semaphore, but maybe there is better approach?
Take a look at the Java ThreadPoolExecutor. This should help with what you are trying to do.
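A sketch of the fixed-size-pool approach: route every computation through a pool with NUMBER_OF_CPU_CORES - 1 threads, so at most that many computations run at once no matter how many requests arrive; the rest queue up instead of competing for the CPU. The counters here exist only to demonstrate the bound:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedComputePool {
    public static void main(String[] args) throws Exception {
        int poolSize = Math.max(1, Runtime.getRuntime().availableProcessors() - 1);
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);

        AtomicInteger running = new AtomicInteger();
        AtomicInteger maxObserved = new AtomicInteger();

        // Submit far more tasks than threads; the excess waits in the
        // pool's queue rather than all hammering the CPU at once.
        for (int i = 0; i < poolSize * 4; i++) {
            pool.submit(() -> {
                int now = running.incrementAndGet();
                maxObserved.accumulateAndGet(now, Math::max);
                try { Thread.sleep(20); }                // the "computation"
                catch (InterruptedException ignored) { }
                running.decrementAndGet();
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(maxObserved.get() <= poolSize);  // prints true
    }
}
```

The trade-off versus tryAcquire on a Semaphore is that here excess requests wait in the queue instead of getting an immediate "too busy" response.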
Hope this helps...
Semaphore looks like it is exactly what you want.
You'll probably want to put some logic in so that you use Semaphore.tryAcquire and return an error to the user if it cannot acquire a permit. If you use the blocking acquire method then you'll still wind up with a locked-up server.
You should probably configure your application container to be limited to the number of request threads that you desire.
Barring that, the Semaphore is the perfect tool. Use the tryAcquire() method, and be sure to put a corresponding release() in a finally block, like this:
if (permits.tryAcquire(7, TimeUnit.SECONDS)) {
    try {
        /* Do your computation. */
        compute();
    } finally {
        permits.release();
    }
} else {
    /* Respond with a "Too busy; try later" message. */
}
Reduce the priority of the threads calling your method. If the rest of the apps on your box are not CPU-intensive, this will hardly affect your computations but responses to keypresses etc. should still be good.
Actually, I'm surprised that the box would hang/freeze even with a CPU overload from multiple ready threads (unless their priority has been raised). Sluggish, maybe...
Because I'm executing a time-critical task every second, I compared several methods to find the best way to ensure that my task is really executed in fixed time steps. After calculating the standard deviation of the error for all methods, it seems like using scheduledExecutorService.scheduleAtFixedRate() leads to the best results, but I don't have a clue why that is.
Does anybody know how that method works internally? How does it, in comparison to a simple sleep() for example, ensure that the referenced task is really executed in fixed time steps?
A 'normal' Java VM cannot make any hard real-time guarantees about execution times (and, as a consequence, about scheduling times either). If you really need hard real-time guarantees, you should have a look at a real-time VM like Java RTS. Of course you need a real-time OS in that case too.
Regarding comparison to Thread.sleep(): the advantage of scheduledExecutorService.scheduleAtFixedRate() compared to a (naive) usage of Thread.sleep() is that it isn't affected by the execution time of the scheduled task. See ScheduledFutureTask.runPeriodic() on how this is implemented.
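The difference is easy to see on paper: a naive run-then-sleep loop starts iteration n at roughly n * (period + taskTime), while scheduleAtFixedRate targets initialStart + n * period, so the task's own execution time doesn't accumulate as drift (as long as the task is shorter than the period). A small arithmetic sketch, pure simulation with no actual timing involved:

```java
public class DriftSketch {
    public static void main(String[] args) {
        long periodMs = 1000;   // desired: run once per second
        long taskMs   = 100;    // the task itself takes 100 ms

        // Naive loop: run task, then Thread.sleep(periodMs).
        // Each iteration therefore starts taskMs later than intended.
        long naiveStartOf10th = 10 * (periodMs + taskMs);

        // scheduleAtFixedRate: the next start is computed from the
        // initial start time, so execution time does not accumulate.
        long fixedRateStartOf10th = 10 * periodMs;

        System.out.println("naive drift after 10 runs: "
                + (naiveStartOf10th - fixedRateStartOf10th) + " ms");
        // prints: naive drift after 10 runs: 1000 ms
    }
}
```

This accumulation is exactly the error your standard-deviation measurement would pick up with a sleep-based loop.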
You can have a look at the OpenJDK 7 implementation of ScheduledThreadPoolExecutor.java, which is the only JDK class implementing the ScheduledExecutorService interface.
However, as far as I know, there is no accuracy guarantee in the ScheduledExecutorService contract. So even if your measurements show it to be accurate, that may not be the case if you switch to a different platform, VM, or JDK.