Accuracy of ScheduledExecutorService on normal OS / JVM - java

I use ScheduledExecutorService.scheduleAtFixedRate to run a daily task, like this:
executor.scheduleAtFixedRate(task, d, 24L * 3600 * 1000, TimeUnit.MILLISECONDS);
(d is the initial delay in milliseconds).
The executor is created by Executors.newSingleThreadScheduledExecutor() and runs multiple tasks, but they are all scheduled a few hours apart, and take at most a few minutes.
I know that ScheduledExecutorService makes no guarantees about accuracy, and I'd need a real-time OS and JVM to get that. That is not a requirement for my task, though.
I noticed that on a Windows 2003 Server, using JDK 1.7.0_03, the task slips by almost 10 seconds per day. That makes about 5 minutes per month, which is acceptable for my application. I'll probably have to implement re-scheduling anyway, because I want the task to run at a specific local time, and so I'll have to take care of DST myself. The service runs for long periods of time - half a year without restart is not that unusual.
Still, I think that an inaccuracy of 10 sec/day is rather high for a mostly idle system, and I wonder if I should be prepared for even worse behavior.
So my question is about your experiences with scheduleAtFixedRate. Are the 10 sec/day normal? Will I get better or worse accuracy in other environments (our customers also use Linux and Solaris servers)? Or are the 10 seconds an indication that something is amiss in our environment?

For a very long-running task, that is not too surprising. Another problem is that the executor uses nanoTime(), which is not synchronized with NTP or the like, so it can drift relative to the wall clock.
One way to avoid this is to reschedule repeatedly, as you suggest. The repeating tasks actually reschedule themselves (which is why they must not throw an exception, see below). You can instead have a one-shot task that reschedules itself at the end of each run, using the wall-clock time and taking daylight saving time into account.
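A minimal sketch of that self-rescheduling approach, assuming a java.time-based calculation of the next run; the class name, target time, and field names are illustrative only, not anything from your code:

import java.time.*;
import java.util.concurrent.*;

class DailyTask implements Runnable {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();
    private final LocalTime runAt = LocalTime.of(3, 0);   // illustrative target local time
    private final ZoneId zone = ZoneId.systemDefault();

    void start() {
        executor.schedule(this, delayToNextRun(), TimeUnit.MILLISECONDS);
    }

    @Override
    public void run() {
        try {
            // ... do the real work here ...
        } catch (Throwable t) {
            t.printStackTrace();                 // never let the task die silently
        } finally {
            // reschedule from the wall clock, so NTP corrections and DST are picked up
            executor.schedule(this, delayToNextRun(), TimeUnit.MILLISECONDS);
        }
    }

    private long delayToNextRun() {
        ZonedDateTime now = ZonedDateTime.now(zone);
        ZonedDateTime next = now.with(runAt);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);             // ZonedDateTime handles DST transitions
        }
        return Duration.between(now, next).toMillis();
    }
}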
BTW: I would make sure you catch any exceptions, or even Throwable, thrown by the task. If you don't, your task will stop running, possibly silently (unless you are looking at the Future object returned).
What I do is cheat a little: I have a task which wakes every 1-10 seconds, checks whether it needs to run, and returns if not. The overhead is usually trivial if you don't have thousands of tasks, and it's much simpler to implement.
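A sketch of that polling "cheat", assuming the next due time is tracked in a field; the names and the 5-second wake-up interval are illustrative:

import java.util.concurrent.*;

class PollingTask implements Runnable {
    private volatile long nextRunMillis = computeNextRunMillis();

    void start(ScheduledExecutorService executor) {
        // wake up every 5 seconds; real work only happens when the task is due
        executor.scheduleWithFixedDelay(this, 5, 5, TimeUnit.SECONDS);
    }

    @Override
    public void run() {
        if (System.currentTimeMillis() < nextRunMillis) {
            return;                                  // not due yet, cheap no-op
        }
        try {
            // ... do the real work here ...
        } catch (Throwable t) {
            t.printStackTrace();
        } finally {
            nextRunMillis = computeNextRunMillis();  // based on the wall clock
        }
    }

    private static long computeNextRunMillis() {
        return System.currentTimeMillis() + 24L * 3600 * 1000;  // illustrative: next day
    }
}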

Related

Why does the Java Scheduler exhibit significant time drift on Windows?

I have a Java service running on Windows 7 that runs once per day on a SingleThreadScheduledExecutor. I've never given it much thought as it's non-critical, but I recently looked at the numbers and saw that the service was drifting approximately 15 minutes per day, which sounds way too much, so I dug into it.
// lastTimeStamp and seconds are fields of the enclosing class
Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
    long drift = (System.currentTimeMillis() - lastTimeStamp - seconds * 1000);
    lastTimeStamp = System.currentTimeMillis();
}, 0, 10, TimeUnit.SECONDS);
This method pretty consistently drifts by about +110ms every 10 seconds. If I run it on a 1-second interval, the drift averages +11ms.
Interestingly, if I do the same with a Timer, the values are pretty consistent, with an average drift of less than a full millisecond.
new Timer().schedule(new TimerTask() {
    @Override
    public void run() {
        long drift = (System.currentTimeMillis() - lastTimeStamp - seconds * 1000);
        lastTimeStamp = System.currentTimeMillis();
    }
}, 0, seconds * 1000);
Linux: doesn't drift (neither with Executor nor with Timer)
Windows: drifts like crazy with Executor, doesn't with Timer
Tested with Java 8 and Java 11.
Interestingly, if you assume a drift of 11ms per second you'll get 950,400ms of drift per day, which amounts to 15.84 minutes. So it's pretty consistent.
The question is: why?
Why would this happen with a SingleThreadScheduledExecutor but not with a Timer?
Update 1: following Slaw's comment, I tried this on multiple different machines. What I found is that the issue doesn't manifest on any of my personal hardware, only on the company machines. On company hardware it also manifests on Windows 10, though an order of magnitude less.
As pointed out in the comments, the ScheduledThreadPoolExecutor bases its calculations on System.nanoTime(). For better or worse, the old Timer API predates nanoTime(), and so uses System.currentTimeMillis() instead.
The difference here might seem subtle, but is more significant than one might expect. Contrary to popular belief, nanoTime() is not just a "more accurate version" of currentTimeMillis(). Millis is locked to system time, whereas nanos is not. Or as the docs put it:
This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time. [...] The values returned by this method become meaningful only when the difference between two such values, obtained within the same instance of a Java virtual machine, is computed.
In your example, you're not following this guidance for the values to be "meaningful" - understandably, because the ScheduledThreadPoolExecutor only uses nanoTime() as an implementation detail. But the end result is the same: you can't guarantee that the schedule will stay synchronised with the system clock.
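A minimal sketch to observe that divergence for yourself, assuming nothing beyond the standard library; on a machine where NTP is adjusting the system clock, the printed offset slowly changes over time (the class name is illustrative):

public class ClockDriftDemo {
    public static void main(String[] args) throws InterruptedException {
        // Record the initial offset between the two clocks, then watch it move.
        long initialOffset = System.currentTimeMillis() - System.nanoTime() / 1_000_000;
        while (true) {
            long offset = System.currentTimeMillis() - System.nanoTime() / 1_000_000;
            System.out.println("offset change since start: " + (offset - initialOffset) + " ms");
            Thread.sleep(10_000);
        }
    }
}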
But why not? Seconds are seconds, right, so the two should stay in sync from a certain, known point?
Well, in theory, yes. But in practice, probably not.
Taking a look at the relevant native code on Windows:
LARGE_INTEGER current_count;
QueryPerformanceCounter(&current_count);
double current = as_long(current_count);
double freq = performance_frequency;
jlong time = (jlong)((current/freq) * NANOSECS_PER_SEC);
return time;
We see that nanoTime() uses the QueryPerformanceCounter API, which works by reading the "ticks" of a counter whose frequency is defined by QueryPerformanceFrequency. That frequency will stay constant, but the timer it's based on, and the synchronisation algorithm Windows uses for it, vary by configuration, OS, and underlying hardware. Even ignoring the above, it's never going to be close to 100% accurate (it's based on a reasonably cheap crystal oscillator somewhere on the board, not a Caesium time standard!), so it's going to drift away from the system time as NTP keeps the latter in sync with reality.
In particular, this link gives some useful background and reinforces the above point:
When you need time stamps with a resolution of 1 microsecond or better and you don't need the time stamps to be synchronized to an external time reference, choose QueryPerformanceCounter.
(Bolding is mine.)
For your specific case of Windows 7 performing badly, note that in Windows 8+, the TSC synchronisation algorithm was improved, and QueryPerformanceCounter is always based on the TSC (as opposed to Windows 7, where it could be the TSC, the HPET or the ACPI PM timer - the latter of which is especially inaccurate). I suspect this is the most likely reason the situation improves tremendously on Windows 10.
That being said, the above factors still mean that you can't rely on the ScheduledThreadPoolExecutor to keep in time with "real" time - it will always drift. If that drift is an issue, then it's not a solution you can rely on in this context.
Side note: In Windows 8+, there is a GetSystemTimePreciseAsFileTime function which offers the high resolution of QueryPerformanceCounter combined with the accuracy of the system time. If Windows 7 was dropped as a supported platform, this could in theory be used to provide a System.getCurrentTimeNanos() method or similar, assuming other similar native functions exist for other supported platforms.
CronScheduler is a project of mine designed to be proof against the time drift problem, and at the same time it avoids some of the problems with the old Timer class described in this post.
Example usage:
Duration syncPeriod = Duration.ofMinutes(1);
CronScheduler cron = CronScheduler.create(syncPeriod);
cron.scheduleAtFixedRateSkippingToLatest(0, 1, TimeUnit.MINUTES, runTimeMillis -> {
    // Collect and send summary metrics to a remote monitoring system
});
Note: this project was actually inspired by this StackOverflow question.

Executors.newFixedThreadPool() - how expensive is this operation

I have a requirement where I need to process some tasks for the currently live shows.
This is a scheduled task and runs every minute.
At any given minute, there can be any number of live shows (though the number cannot be that large, approximately 10 at most). There are more than 20 functionalities that need to be done for all the live shows, or say there are 20 worker classes, all doing their job.
Let's say for the first functionality there are 5 shows, then after a few minutes the shows reduce to 2, then again after a few minutes they increase to 7.
Currently I am doing something like this:
int totalShowsCount = getCurrentShowsCount();
ExecutorService executor = Executors.newFixedThreadPool(showIds.size());
The above statements get executed every minute.
Problem Statement
1.) How expensive is the above operation? Creating a fixed thread pool every minute.
2.) What can I do to optimize my solution? Should I use a fixed thread pool of, say, 10 threads, with maybe 3, 5, 6 or any number of them being utilized at any given minute?
Can I create a fixed thread pool at the worker level, maintain it, and utilize that?
FYI, I'm using Java 8, in case a better approach is available.
How expensive is the above operation? Creating a fixed thread pool every minute.
Creating a thread pool is a relatively expensive operation which can take milliseconds. You don't want to be doing this many times per second.
A second is an eternity for a computer; if you have a 36-core machine it can execute as many as 100 billion instructions in that amount of time. A minute is a very, very long time, and if you only do something once a minute you could even restart your JVM every minute and still get reasonable throughput.
What can I do to optimize my solution? Should I use a fixed thread pool of, say, 10 threads, with maybe 3, 5, 6 or any number of them being utilized at any given minute?
Possibly, it depends on what you are doing. Without more analysis you can't say for sure. Note: if you can use parallelStream() (and you should see if you can), it uses the built-in ForkJoinPool.commonPool(), so you don't need to create another pool. But again, this depends on what you are doing.
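A minimal sketch of reusing one long-lived pool instead of creating a new one every minute; the class name, pool size, and the getCurrentShowIds/process helpers are illustrative assumptions, not your actual code:

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.*;

public class ShowProcessor {
    // Created once and reused for every scheduled run; 10 matches the expected maximum show count.
    private final ExecutorService workers = Executors.newFixedThreadPool(10);
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void start() {
        scheduler.scheduleAtFixedRate(this::processCurrentShows, 0, 1, TimeUnit.MINUTES);
    }

    private void processCurrentShows() {
        List<String> showIds = getCurrentShowIds();   // hypothetical lookup
        for (String showId : showIds) {
            workers.submit(() -> process(showId));    // idle pool threads cost almost nothing
        }
    }

    private List<String> getCurrentShowIds() { return Arrays.asList("show-1", "show-2"); } // placeholder
    private void process(String showId) { /* ... one of the 20 functionalities ... */ }
}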

Delay by 1 millisecond not working?

I am trying to generate a number using:
System.currentTimeMillis()
I have to generate these numbers sometimes 5 times in a row, which happens so fast that they are the same (but I don't want them to be the same as we are using them as part of a unique field)
I thought I could put a delay in between when each one is generated, which would prevent them from being the same, using:
TimeUnit.MILLISECONDS.sleep(1);
But this still generates the same number. It only seems to generate a new one if I increase it to about 60 and above. I am trying to understand why this is. Thanks
If you want to generate 5 numbers starting at the current time that aren't the same - which as far as I can tell is your only requirement - you can use
long t = System.currentTimeMillis();
long[] ts = { t, t + 1, t + 2, t + 3, t + 4 };
Thread.sleep is absurd here. The user (or whoever) should not experience a delay for something that can be computed now.
The Java docs for Thread.sleep() say:
Causes the currently executing thread to sleep (temporarily cease execution) for the specified number of milliseconds, subject to the precision and accuracy of system timers and schedulers.
That bit about the "precision and accuracy of system timers and schedulers" is pretty important. I'd say that's why you're not getting anything until you use Thread.sleep(60): those system timers just aren't very accurate.
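A quick sketch to see how accurate short sleeps actually are on your machine, using only the standard library (the class name is illustrative):

public class SleepAccuracyDemo {
    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 10; i++) {
            long before = System.nanoTime();
            Thread.sleep(1);                       // ask for 1 ms
            long actualMicros = (System.nanoTime() - before) / 1_000;
            System.out.println("requested 1000 us, slept " + actualMicros + " us");
        }
    }
}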
Now a better question is why are you trying to "generate numbers..."
The JavaDoc for System.currentTimeMillis() says:
"Note that while the unit of time of the return value is a millisecond, the granularity of the value depends on the underlying operating system and may be larger. For example, many operating systems measure time in units of tens of milliseconds."
From this, it is likely that your problem is not Java-related, but rather that your operating system does not keep time to the millisecond.
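A small sketch to measure that granularity yourself, assuming only the standard library; it spins until currentTimeMillis() changes and prints the step size (the class name is illustrative):

public class ClockGranularityDemo {
    public static void main(String[] args) {
        long previous = System.currentTimeMillis();
        for (int i = 0; i < 10; i++) {
            long now;
            // Busy-wait until the reported millisecond value actually changes.
            do {
                now = System.currentTimeMillis();
            } while (now == previous);
            System.out.println("clock advanced by " + (now - previous) + " ms");
            previous = now;
        }
    }
}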
Thread.sleep guarantees that the thread will sleep, but not for how long. It will sleep for at least 1 millisecond, but your OS may not give priority to the JVM, or the JVM may not give priority to the thread, immediately after that millisecond has elapsed.

Accurate spending time solution

Considering the following code snippets
class time implements Runnable {
    long t = 0L;
    public void run() {
        try {
            while (true) { Thread.sleep(1000); t++; /* show the time */ }
        } catch (Throwable e) {}
    }
}
////
long long t = 0L;
void* time(void* a) {      // pthread thread start
    sleep(1); t++;         // show the time
}
I read in some tutorial that in Java Thread.sleep(1000) is not exactly 1 second, and it might be more if the system is busy at the time, since the OS may switch back to the thread late.
Questions:
Is this true at all?
Is this scenario same for native (C/C++) codes?
What is the accurate way to count the seconds up in an application?
Others have answered about the accuracy of timing. Unfortunately, there is no GUARANTEED way to sleep for X amount of time, and wake up at exactly X.00000 seconds (or milliseconds, nanoseconds, etc).
For displaying time in seconds, you can just lower the time you are waiting to, say, half a second. Then you won't have the time jump by two seconds now and then, because half a second isn't going to be extended to more than a second (unless the OS and system you are running on are absolutely overloaded and nothing gets to run when it should - in which case you should fix that problem [get a faster processor, more memory, or whatever it takes], not fiddle with the timing of your application). This works well for "relatively long periods of time", such as one second or 1/10th of a second. For higher precision, it won't really work, since we're now entering the "scheduling jitter" zone.
If you want very accurate timing, then you will probably need to use a real-time OS, or at least an OS with "real-time extensions" enabled, which allows the OS to be stricter about time (at the cost of "ease of use" for the programmer, and possibly also the OS being less efficient in its handling of processes, because it "switches more often than it needs to" compared to a more "lazy" timing approach).
Note also that the "may take longer", on an idle system, is mainly the "rounding up of the timer" (if the system tick happens every 10ms or 1ms, the timer is set to 1000ms plus whatever is left of the current timer tick, so it may be 1009.999ms, or 1000.75ms, for example). The other overheads, which come from scheduling and general OS work, should be in the microsecond range if not nanoseconds on any modern system - after all, an OS can do quite a lot of work in a microsecond. A modern x86 CPU will execute about 3 instructions per clock cycle, and a clock cycle takes around 0.3ns; that's roughly 10 instructions per nanosecond [of course, cache misses and such will worsen this dramatically]. If the OS needs more than a few thousand instructions to go from one process to another (fewer still for threads), then something is quite wrong. A few thousand instructions at 10 instructions per nanosecond comes to some hundreds of nanoseconds - definitely less than a microsecond. Compare that to the 1ms or 10ms "jitter" of starting the timer just after the last tick happened.
Naturally, if the CPU is busy running other tasks, this is different - then the time "left to run" on other processes will also influence the time taken to wake up a process.
Of course, in a heavily loaded memory system, the "just woken up" process may not be "ready to run", it could be swapped out to disk, for example. In which case, tens if not hundreds of milliseconds are needed to load it back from the disk.
To answer the first two questions: yes, it's true, and yes.
First there is the time between when the timeout expires and when the OS notices it, then the time for the OS to reschedule your process, and lastly the time from when the process has been "woken up" until it is its turn to run. How long will all this take? There's no way of saying.
And as it's all done on the OS level, it doesn't really matter what language you program in.
As for a more accurate way? There is none. You can use more high-precision timers, but there is no way of avoiding the lag described above.
Yes, it's true that it is not accurate.
It's the same for simple sleep functions in C/C++ and pretty much everything else.
Depending on your system, there could be better functions accessible, but:
What is the accurate way
A really accurate way does not exist.
Unless you have some really expensive special computer with an atomic clock included (and no usual OS either - and even then, we could argue what "accurate" means).
If busy waiting (high CPU load) is acceptable, look at nanoTime, native usleep, HighPerformanceCounter, or whatever is applicable for your system.
The sleep call tells the system to stop the thread's execution for at least the time period specified as the argument. The system will then resume thread execution when it has a chance (it actually depends on many factors, such as hardware, thread priorities, etc.). To measure time more or less accurately, you can store the time at the beginning of execution and then calculate the time delta whenever it's needed.
The sleep function is not accurate, but if the intent is to display the total number of seconds elapsed, then you should store the current time at the beginning and then display the time difference every now and then.
This is true. Every sleep implementation in any language (C too) will fail to wait exactly 1 second. It has to deal with your OS scheduler, and the sleep duration is just a hint: the minimum sleep duration, to be precise. The actual difference depends on gigazillions of factors.
Trying to figure out the deviation is tricky if you want a very high resolution clock. In most cases, you'll have about 1~5 ms (roughly).
The thing is that the order of magnitude will be the same whatever the sleep duration. If you want something "accurate", you can divide your timing across the application and wait for a longer period. For example, when you benchmark, you will prefer this type of implementation because the delta time increases, decreasing uncertainty:
// get t0
// process n times
// get t1
// compute average time : (t1-t0)/n
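A small Java sketch of that averaging idea, assuming a hypothetical process() method as the workload to be timed:

public class AverageTimingDemo {
    public static void main(String[] args) {
        int n = 1_000;
        long t0 = System.nanoTime();      // get t0
        for (int i = 0; i < n; i++) {
            process();                    // process n times
        }
        long t1 = System.nanoTime();      // get t1
        // compute average time: (t1 - t0) / n
        System.out.println("average time per call: " + (t1 - t0) / n + " ns");
    }

    private static void process() {
        // hypothetical workload to be measured
        Math.sqrt(Math.random());
    }
}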

Quartz Performance

It seems there is a limit on the number of jobs that the Quartz scheduler can run per second. In our scenario we have about 20 jobs per second firing, 24x7, and Quartz worked well up to 10 jobs per second (with 100 Quartz threads and a 100-connection database pool for a JDBC-backed JobStore). However, when we increased it to 20 jobs per second, Quartz became very, very slow and its triggered jobs fire very late compared to their actual scheduled time, causing many, many misfires and eventually slowing down the overall performance of the system significantly. One interesting fact is that JobExecutionContext.getScheduledFireTime().getTime() for such delayed triggers comes out to be 10-20 or even more minutes after their scheduled time.
How many jobs can the Quartz scheduler run per second without affecting the scheduled time of the jobs, and what would be the optimum number of Quartz threads for such a load?
Or am I missing something here?
Details about what we want to achieve:
We have almost 10k items (categorized among 2 or more categories; in the current case we have 2 categories) on which we need to do some processing at a given frequency, e.g. 15, 30, 60... minutes, and these items should be processed within that frequency with a given throttle per minute. E.g. let's say, for a 60-minute frequency, 5k items for each category should be processed with a throttle of 500 items per minute. So, ideally these items should be processed within the first 10 (5000/500) minutes of each hour of the day, with each minute having 500 items to process, distributed evenly across each second of the minute, so we would have around 8-9 items per second for one category.
Now, to achieve this we have used Quartz as the scheduler, which triggers jobs for processing these items. However, we don't process each item within the Job.execute method, because that would take 5-50 seconds (averaging 30 seconds) per item, involving a web service call. We rather push a message for each item onto a JMS queue, and separate server machines process those jobs. I have observed that the time taken by the Job.execute method is no more than 30 milliseconds.
Server Details:
Solaris SPARC 64-bit server with an 8-core/16-thread CPU for the scheduler, with 16GB RAM, and we have two such machines in the scheduler cluster.
In a previous project, I was confronted with the same problem. In our case, Quartz performed well up to a granularity of a second. Sub-second scheduling was a stretch, and as you are observing, misfires happened often and the system became unreliable.
We solved this issue by creating two levels of scheduling: Quartz would schedule a job 'set' of n consecutive jobs. With a clustered Quartz, this means that a given server in the system would get this job 'set' to execute. The n tasks in the set are then taken over by a "micro-scheduler": basically a timing facility that used the native JDK API to further time the jobs down to 10ms granularity.
To handle the individual jobs, we used a master-worker design, where the master was taking care of the scheduled delivery (throttling) of the jobs to a multi-threaded pool of workers.
If I had to do this again today, I'd rely on a ScheduledThreadPoolExecutor to manage the 'micro-scheduling'. For your case, it would look something like this:
ScheduledThreadPoolExecutor scheduledExecutor;
...
scheduledExecutor = new ScheduledThreadPoolExecutor(THREAD_POOL_SIZE);
...
// Evenly spread the execution of a set of tasks over a period of time
public void schedule(Set<Task> taskSet, long timePeriod, TimeUnit timeUnit) {
    if (taskSet.isEmpty()) return; // or indicate some failure ...
    long period = TimeUnit.MILLISECONDS.convert(timePeriod, timeUnit);
    long delay = period / taskSet.size();
    long accumulativeDelay = 0;
    for (Task task : taskSet) {
        scheduledExecutor.schedule(task, accumulativeDelay, TimeUnit.MILLISECONDS);
        accumulativeDelay += delay;
    }
}
This gives you a general idea of how to use the JDK facility to micro-schedule tasks. (Disclaimer: you need to make this robust for a production environment, e.g. check for failing tasks, manage retries if supported, etc.)
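A hypothetical usage of that schedule method, assuming Task implements Runnable and wraps the JMS push for one item; Item, itemsForThisMinute, and the Task constructor are placeholders, not part of the original code:

Set<Task> tasks = new LinkedHashSet<>();
for (Item item : itemsForThisMinute) {      // e.g. the 500 items throttled for this minute
    tasks.add(new Task(item));              // Task is assumed to implement Runnable
}
schedule(tasks, 1, TimeUnit.MINUTES);       // 500 tasks over 60s => one fires roughly every 120ms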
With some testing + tuning, we found an optimal balance between the Quartz jobs and the amount of jobs in one scheduled set.
We experienced a 100X throughput improvement in this way. Network bandwidth was our actual limit.
First of all check How do I improve the performance of JDBC-JobStore? in Quartz documentation.
As you can probably guess, there is no absolute value or definitive metric. It all depends on your setup. However, here are a few hints:
20 jobs per second means around 100 database queries per second, including updates and locking. That's quite a lot!
Consider distributing your Quartz setup into a cluster. However, if the database is the bottleneck, it won't help you. Maybe TerracottaJobStore will come to the rescue?
With K cores in the system, anything fewer than K threads will underutilize your system. If your jobs are CPU-intensive, K threads is fine. If they are calling external web services, blocking or sleeping, consider much bigger values. However, more than 100-200 threads will significantly slow down your system due to context switching.
Have you tried profiling? What is your machine doing most of the time? Can you post thread dump? I suspect poor database performance rather than CPU, but it depends on your use case.
You should limit your number of threads to somewhere between n and n*3 where n is the number of processors available. Spinning up more threads is going to cause a lot of context switching, since most of them will be blocked most of the time.
As far as jobs per second, it really depends on how long the jobs run and how often they're blocked for operations like network and disk io.
Also, something to consider is that perhaps quartz isn't the tool you need. If you're sending off 1-2 million jobs a day, you might want to look into a custom solution. What are you even doing with 2 million jobs a day?!
Another option, which is a really bad way to approach the problem but sometimes works... what is the server it's running on? Is it an older server? It might be that bumping up the RAM or other specs will give you some extra 'umph'. Not the best solution, for sure, because it delays the problem rather than addressing it, but if you're in a crunch it might help.
In situations with a high number of jobs per second, make sure your SQL server uses row locking and not table locking. In MySQL this is done by using the InnoDB storage engine rather than the default MyISAM storage engine, which only supports table locking.
Fundamentally the approach of doing 1 item at a time is doomed and inefficient when you're dealing with such a large number of things to do within such a short time. You need to group things - the suggested approach of using a job set that then micro-schedules each individual job is a first step, but that still means doing a whole lot of almost nothing per job. Better would be to improve your webservice so you can tell it to process N items at a time, and then invoke it with sets of items to process. And even better is to avoid doing this sort of thing via webservices and process them all inside a database, as sets, which is what databases are good for. Any sort of job that processes one item at a time is fundamentally an unscalable design.
