I have a Java service running on Windows 7 that runs once per day on a SingleThreadScheduledExecutor. I've never given it much thought as it's non-critical, but recently I looked at the numbers and saw that the service was drifting by approximately 15 minutes per day, which sounds way too much, so I dug into it.
Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
    // lastTimeStamp is a field; seconds is the scheduling interval (10 here)
    long drift = (System.currentTimeMillis() - lastTimeStamp - seconds * 1000);
    lastTimeStamp = System.currentTimeMillis();
}, 0, 10, TimeUnit.SECONDS);
This method pretty consistently drifts by about +110 ms every 10 seconds. If I run it on a 1-second interval, the drift averages +11 ms.
Interestingly, if I do the same with a Timer(), the values are pretty consistent, with an average drift of less than a full millisecond.
new Timer().schedule(new TimerTask() {
    @Override
    public void run() {
        long drift = (System.currentTimeMillis() - lastTimeStamp - seconds * 1000);
        lastTimeStamp = System.currentTimeMillis();
    }
}, 0, seconds * 1000);
Linux: doesn't drift (neither with the Executor nor with the Timer)
Windows: drifts like crazy with the Executor, doesn't with the Timer
Tested with Java 8 and Java 11.
Interestingly, if you assume a drift of 11ms per second you'll get 950400ms drift per day which amounts to 15.84 minutes per day. So it's pretty consistent.
The question is: why?
Why would this happen with a SingleThreadExecutor but not with a Timer.
Update 1: following Slaw's comment, I tried it on multiple different machines. What I found is that this issue doesn't manifest on any of my personal hardware, only on the company hardware. On the company hardware it also manifests on Win10, though an order of magnitude less.
As pointed out in the comments, the ScheduledThreadPoolExecutor bases its calculations on System.nanoTime(). For better or worse, the old Timer API predates nanoTime(), and so uses System.currentTimeMillis() instead.
The difference here might seem subtle, but is more significant than one might expect. Contrary to popular belief, nanoTime() is not just a "more accurate version" of currentTimeMillis(). Millis is locked to system time, whereas nanos is not. Or as the docs put it:
This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time. [...] The values returned by this method become meaningful only when the difference between two such values, obtained within the same instance of a Java virtual machine, is computed.
In your example, you're not following this guidance for the values to be "meaningful" - understandably, because the ScheduledThreadPoolExecutor only uses nanoTime() as an implementation detail. But the end result is the same, that being that you can't guarantee that it will stay synchronised to the system clock.
But why not? Seconds are seconds, right, so the two should stay in sync from a certain, known point?
Well, in theory, yes. But in practice, probably not.
Taking a look at the relevant native code on windows:
LARGE_INTEGER current_count;
QueryPerformanceCounter(&current_count);
double current = as_long(current_count);
double freq = performance_frequency;
jlong time = (jlong)((current/freq) * NANOSECS_PER_SEC);
return time;
We see that nanoTime() uses the QueryPerformanceCounter API, which returns the number of "ticks" of a counter whose frequency is defined by QueryPerformanceFrequency. That frequency will stay constant, but the hardware timer it's based on, and the synchronisation algorithm that Windows uses, vary by configuration, OS, and underlying hardware. Even ignoring the above, it's never going to be close to 100% accurate (it's based on a reasonably cheap crystal oscillator somewhere on the board, not a Caesium time standard!), so it's going to drift away from the system time as NTP keeps the latter in sync with reality.
In particular, this link gives some useful background and reinforces the above point:
When you need time stamps with a resolution of 1 microsecond or better and you don't need the time stamps to be synchronized to an external time reference, choose QueryPerformanceCounter.
(Bolding is mine.)
For your specific case of Windows 7 performing badly, note that in Windows 8+ the TSC synchronisation algorithm was improved, and QueryPerformanceCounter is always based on a TSC (as opposed to Windows 7, where it could be a TSC, HPET or the ACPI PM timer, the latter of which is especially inaccurate). I suspect this is the most likely reason the situation improves tremendously on Windows 10.
That being said, the above factors still mean that you can't rely on the ScheduledThreadPoolExecutor to keep in time with "real" time - it will always drift. If that drift is an issue, then it's not a solution you can rely on in this context.
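If staying aligned with the wall clock matters more than exact fixed intervals, one workaround is to schedule each run as a one-shot task whose delay is recomputed from System.currentTimeMillis(), so any nanoTime() drift is corrected on every iteration. A minimal sketch of the idea (my own code; the rescheduleAtWallClock helper is hypothetical, not part of any library):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class WallClockScheduler {
        private final ScheduledExecutorService executor =
                Executors.newSingleThreadScheduledExecutor();

        // Run 'task' roughly every 'periodMillis', re-anchoring each run to the
        // wall clock instead of relying on scheduleAtFixedRate's nanoTime() maths.
        public void rescheduleAtWallClock(Runnable task, long periodMillis) {
            long now = System.currentTimeMillis();
            long nextRun = ((now / periodMillis) + 1) * periodMillis; // next wall-clock boundary
            executor.schedule(() -> {
                try {
                    task.run();
                } finally {
                    rescheduleAtWallClock(task, periodMillis); // recompute from the wall clock
                }
            }, nextRun - now, TimeUnit.MILLISECONDS);
        }
    }

Because every delay is derived from currentTimeMillis(), drift cannot accumulate beyond a single period; the trade-off is that individual intervals jitter slightly around the boundary.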
Side note: In Windows 8+, there is a GetSystemTimePreciseAsFileTime function which offers the high resolution of QueryPerformanceCounter combined with the accuracy of the system time. If Windows 7 was dropped as a supported platform, this could in theory be used to provide a System.getCurrentTimeNanos() method or similar, assuming other similar native functions exist for other supported platforms.
CronScheduler is a project of mine designed to be proof against the time drift problem, and at the same time it avoids some of the problems with the old Timer class described in this post.
Example usage:
Duration syncPeriod = Duration.ofMinutes(1);
CronScheduler cron = CronScheduler.create(syncPeriod);
cron.scheduleAtFixedRateSkippingToLatest(0, 1, TimeUnit.MINUTES, runTimeMillis -> {
// Collect and send summary metrics to a remote monitoring system
});
Note: this project was actually inspired by this StackOverflow question.
Related
Accuracy Vs. Precision
What I would like to know is whether I should use System.currentTimeMillis() or System.nanoTime() when updating my objects' positions in my game. Their change in movement is directly proportional to the elapsed time since the last call, and I want to be as precise as possible.
I've read that there are some serious time-resolution issues between different operating systems (namely that Mac / Linux have an almost 1 ms resolution while Windows has a 50 ms resolution??). I'm primarily running my apps on Windows, and a 50 ms resolution seems pretty inaccurate.
Are there better options than the two I listed?
Any suggestions / comments?
If you're just looking for extremely precise measurements of elapsed time, use System.nanoTime(). System.currentTimeMillis() will give you the most accurate possible elapsed time in milliseconds since the epoch, but System.nanoTime() gives you a nanosecond-precise time, relative to some arbitrary point.
From the Java Documentation:
public static long nanoTime()
Returns the current value of the most precise available system timer, in nanoseconds.
This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time. The value returned represents nanoseconds since some fixed but arbitrary origin time (perhaps in the future, so values may be negative). This method provides nanosecond precision, but not necessarily nanosecond accuracy. No guarantees are made about how frequently values change. Differences in successive calls that span greater than approximately 292 years (2^63 nanoseconds) will not accurately compute elapsed time due to numerical overflow.
For example, to measure how long some code takes to execute:
long startTime = System.nanoTime();
// ... the code being measured ...
long estimatedTime = System.nanoTime() - startTime;
See also: JavaDoc System.nanoTime() and JavaDoc System.currentTimeMillis() for more info.
Since no one else has mentioned this…
It is not safe to compare the results of System.nanoTime() calls between different JVMs; each JVM may have an independent 'origin' time.
System.currentTimeMillis() will return the (approximate) same value between JVMs, because it is tied to the system wall clock time.
If you want to compute the amount of time that has elapsed between two events, like a stopwatch, use nanoTime(); changes in the system wall-clock make currentTimeMillis() incorrect for this use case.
Update by Arkadiy: I've observed more correct behavior of System.currentTimeMillis() on Windows 7 in Oracle Java 8. The time was returned with 1 millisecond precision. The source code in OpenJDK has not changed, so I do not know what causes the better behavior.
David Holmes of Sun posted a blog article a couple years ago that has a very detailed look at the Java timing APIs (in particular System.currentTimeMillis() and System.nanoTime()), when you would want to use which, and how they work internally.
Inside the Hotspot VM: Clocks, Timers and Scheduling Events - Part I - Windows
One very interesting aspect of the timer used by Java on Windows for APIs that have a timed wait parameter is that the resolution of the timer can change depending on what other API calls may have been made - system wide (not just in the particular process). He shows an example where using Thread.sleep() will cause this resolution change.
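As a rough illustration of observing that resolution (a sketch of my own, not code from the article), you can watch how System.currentTimeMillis() jumps between visible ticks; on Windows, uncommenting the background thread that repeatedly calls Thread.sleep(1) has historically shrunk the observed tick size system-wide:

    public class TickGranularity {
        public static void main(String[] args) {
            // Uncomment to raise the Windows timer interrupt rate system-wide and
            // watch the reported tick size shrink:
            // new Thread(() -> {
            //     try { while (true) Thread.sleep(1); } catch (InterruptedException ignored) {}
            // }).start();

            long last = System.currentTimeMillis();
            for (int i = 0; i < 20; i++) {
                long now;
                while ((now = System.currentTimeMillis()) == last) {
                    // busy-wait until the clock visibly ticks forward
                }
                System.out.println("tick: " + (now - last) + " ms");
                last = now;
            }
        }
    }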
As others have said, currentTimeMillis is clock time, which can change due to users altering the time settings, leap seconds, and internet time sync (note that daylight saving time and time zones are unrelated to currentTimeMillis, since it is UTC-based). If your app depends on monotonically increasing elapsed time values, you might prefer nanoTime instead.
You might think that the players won't be fiddling with the time settings during game play, and maybe you'd be right. But don't underestimate the disruption due to internet time sync, or perhaps remote desktop users. The nanoTime API is immune to this kind of disruption.
If you want to use clock time, but avoid discontinuities due to internet time sync, you might consider an NTP client such as Meinberg, which "tunes" the clock rate to zero it in, instead of just resetting the clock periodically.
I speak from personal experience. In a weather application that I developed, I was getting randomly occurring wind speed spikes. It took a while for me to realize that my timebase was being disrupted by the behavior of clock time on a typical PC. All my problems disappeared when I started using nanoTime. Consistency (monotonicity) was more important to my application than raw precision or absolute accuracy.
System.nanoTime() isn't supported in older JVMs. If that is a concern, stick with currentTimeMillis().
Regarding accuracy, you are almost correct. On SOME Windows machines, currentTimeMillis() has a resolution of about 10ms (not 50ms). I'm not sure why, but some Windows machines are just as accurate as Linux machines.
I have used GAGETimer in the past with moderate success.
Yes, if such precision is required use System.nanoTime(), but be aware that you are then requiring a Java 5+ JVM.
On my XP systems, I see system time reported to a resolution of at least 278 nanoseconds (not the 100 microseconds I originally stated), using the following code:
private void test() {
    System.out.println("currentTimeMillis: " + System.currentTimeMillis());
    System.out.println("nanoTime         : " + System.nanoTime());
    System.out.println();

    testNano(false); // to sync with currentTimeMillis() timer tick
    for (int xa = 0; xa < 10; xa++) {
        testNano(true);
    }
}

private void testNano(boolean shw) {
    long strMS = System.currentTimeMillis();
    long strNS = System.nanoTime();
    long curMS;
    while ((curMS = System.currentTimeMillis()) == strMS) {
        if (shw) { System.out.println("Nano: " + (System.nanoTime() - strNS)); }
    }
    if (shw) { System.out.println("Nano: " + (System.nanoTime() - strNS) + ", Milli: " + (curMS - strMS)); }
}
For game graphics & smooth position updates, use System.nanoTime() rather than System.currentTimeMillis(). I switched from currentTimeMillis() to nanoTime() in a game and got a major visual improvement in smoothness of motion.
While one millisecond may seem as though it should already be precise, visually it is not. The factors nanoTime() can improve include:
accurate pixel positioning below wall-clock resolution
ability to anti-alias between pixels, if you want
Windows wall-clock inaccuracy
clock jitter (inconsistency of when wall-clock actually ticks forward)
As other answers suggest, nanoTime does have a performance cost if called repeatedly -- it would be best to call it just once per frame, and use the same value to calculate the entire frame.
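For instance, a minimal game-loop sketch of that advice (my own code, with placeholder update/render methods): sample nanoTime() once at the top of each frame and derive the whole frame's delta from that single value:

    public class GameLoop {
        public static void main(String[] args) throws InterruptedException {
            long previous = System.nanoTime();
            while (true) {
                long frameStart = System.nanoTime();              // sampled once per frame
                double dtSeconds = (frameStart - previous) / 1_000_000_000.0;
                previous = frameStart;

                update(dtSeconds);   // move everything using the same dt
                render();

                Thread.sleep(16);    // crude ~60 FPS pacing, just for the sketch
            }
        }

        private static void update(double dtSeconds) { /* positions += velocity * dtSeconds */ }
        private static void render() { /* draw the frame */ }
    }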
System.currentTimeMillis() is not safe for measuring elapsed time because this method is sensitive to changes of the system's real-time clock.
You should use System.nanoTime().
Please refer to the Java System documentation, about the nanoTime method:
... This method provides nanosecond precision, but not necessarily nanosecond resolution (that is, how frequently the value changes) - no guarantees are made except that the resolution is at least as good as that of currentTimeMillis() ...
If you use System.currentTimeMillis() your elapsed time can be negative (Back <-- to the future)
I've had good experience with nanotime. It provides wall-clock time as two longs (seconds since the epoch and nanoseconds within that second), using a JNI library. It's available with the JNI part precompiled for both Windows and Linux.
One thing to note here is the inconsistency of the nanoTime method: it does not give very consistent values for the same input. currentTimeMillis does much better in terms of performance and consistency, and also, though not as precise as nanoTime, it has a lower margin of error, and therefore more accuracy in its value. I would therefore suggest that you use currentTimeMillis.
Consider the following code snippets:
class time implements Runnable {
    long t = 0L;
    public void run() {
        try { while (true) { Thread.sleep(1000); t++; /* show the time */ } } catch (Throwable e) {}
    }
}
////
long long t = 0L;
void* time(void* a) {        // pthread thread start routine
    while (1) { sleep(1); t++; /* show the time */ }
    return NULL;
}
I read in some tutorial that in Java Thread.sleep(1000) is not exactly 1 second, and it might be more if the system is busy at the time and the OS switches back to the thread late.
Questions:
Is this case true at all or no?
Is this scenario same for native (C/C++) codes?
What is the accurate way to count the seconds up in an application?
Others have answered about the accuracy of timing. Unfortunately, there is no GUARANTEED way to sleep for X amount of time, and wake up at exactly X.00000 seconds (or milliseconds, nanoseconds, etc).
For displaying time in seconds, you can just lower the time you are waiting to, say, half a second. Then you won't have the time jump two seconds from time to time, because half a second isn't going to be extended to more than a second (unless the OS & system you are running on is absolutely overloaded and nothing gets to run when it should - in which case you should fix that problem [get a faster processor, more memory, or whatever it takes], not fiddle with the timing of your application). This works well for "relatively long periods of time", such as one second or 1/10th of a second. For higher precision, it won't really work, since we're now entering the "scheduling jitter" zone.
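A minimal sketch of that half-period idea (my own code; the displaySeconds callback is hypothetical): wake twice per second, but derive the displayed count from elapsed time, so a late wake-up never makes the counter skip a second:

    import java.util.function.LongConsumer;

    public class SecondsCounter {
        public static void countSeconds(LongConsumer displaySeconds) throws InterruptedException {
            long start = System.nanoTime();
            long lastShown = -1;
            while (true) {
                Thread.sleep(500); // half the display period, so oversleep can't skip a second
                long elapsedSeconds = (System.nanoTime() - start) / 1_000_000_000L;
                if (elapsedSeconds != lastShown) {
                    displaySeconds.accept(elapsedSeconds);
                    lastShown = elapsedSeconds;
                }
            }
        }

        public static void main(String[] args) throws InterruptedException {
            countSeconds(s -> System.out.println(s + " s"));
        }
    }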
If you want very accurate timing, then you will probably need to use a real-time OS, or at least an OS that has "real-time extensions enabled", which will allow the OS to be more strict about time (at the cost of "ease of use" for the programmer, and possibly also the OS being less efficient in its handling of processes, because it "switches more often than it needs to" compared to a more "lazy" timing approach).
Note also that the "may take longer", on an idle system, is mainly the "rounding up of the timer" (if the system tick happens every 10 ms or 1 ms, the timer is set to 1000 ms plus whatever is left of the current timer tick, so it may be 1009.999 ms or 1000.75 ms, for example). The other overhead, which comes from scheduling and general OS overheads, should be in the microsecond range if not nanoseconds on any modern system - after all, an OS can do quite a lot of work in a microsecond - a modern x86 CPU can execute around 3 instructions per clock cycle, and a cycle takes around 0.3 ns. That's about 10 instructions per nanosecond [of course, cache misses and such will worsen this dramatically]. If the OS needs more than a few thousand instructions to go from one process to another (fewer still for threads), then there's something quite wrong. A few thousand instructions at 10 instructions per nanosecond = some hundreds of nanoseconds. Definitely less than a microsecond. Compare that to the 1 ms or 10 ms "jitter" of starting the timer just after the timer ticked off last time.
Naturally, if the CPU is busy running other tasks, this is different - then the time "left to run" on other processes will also influence the time taken to wake up a process.
Of course, in a heavily loaded memory system, the "just woken up" process may not be "ready to run", it could be swapped out to disk, for example. In which case, tens if not hundreds of milliseconds are needed to load it back from the disk.
To answer the two first questions: Yes it's true, and yes.
First there is the time between the timeout expires and the time when the OS notices it, then there the time for the OS to reschedule your process, and lastly there's the time from the process has been "woken up" until it is its turn to run. How long will all this take? There's no way of saying.
And as it's all done on the OS level, it doesn't really matter what language you program in.
As for a more accurate way? There is none. You can use more high-precision timers, but there is no way of avoiding the lag described above.
Yes, it's true that it is not accurate.
It's the same for simple sleep functions in C/C++ and pretty much everything else.
Depending on your system, there could be better functions accessible, but:
What is the accurate way
A really accurate way does not exist, unless you have some really expensive special computer with an atomic clock included (and no usual OS either; and even then, we could argue what "accurate" means).
If busy waiting (high CPU load) is acceptable, look at nanoTime or native usleep, HighPerformanceCounter or whatever is applicable for your system
The sleep call tells the system to stop the thread execution for at least the time period specified as the argument. The system will then resume thread execution when it has a chance (it actually depends on many factors, such as hardware, thread priorities, etc.). To more or less accurately measure the time, you can store the time at the beginning of execution and then calculate the time delta each time it's needed.
The sleep function is not accurate, but if the intent is to display the total amount of seconds then you should store the current time at the beginning and then display the time difference every now and then.
This is true. Every sleep implementation in any language (C too) will fail to wait exactly 1 second. It has to deal with your OS scheduler; the sleep duration is just a hint: the minimum sleep duration, to be precise. The actual difference depends on gigazillions of factors.
Trying to figure out the deviation is tricky if you want a very high resolution clock. In most cases, you'll have about 1~5 ms (roughly).
The thing is that the order of magnitude of the error will be the same whatever the sleep duration. If you want something "accurate", you can amortize it by timing over a longer period. For example, when you benchmark, you will prefer this type of implementation because the delta-time increases, decreasing uncertainty (see the sketch after the outline below):
// get t0
// process n times
// get t1
// compute average time : (t1-t0)/n
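In Java, that averaging approach might look like this (a sketch of my own; doSomething() stands in for whatever operation is being benchmarked):

    public class AveragedTiming {
        public static void main(String[] args) {
            int n = 1_000_000;
            long t0 = System.nanoTime();                  // get t0
            for (int i = 0; i < n; i++) {
                doSomething();                            // process n times
            }
            long t1 = System.nanoTime();                  // get t1
            double averageNanos = (t1 - t0) / (double) n; // compute average time: (t1-t0)/n
            System.out.println("average: " + averageNanos + " ns/op");
        }

        private static void doSomething() {
            Math.sqrt(42.0); // placeholder for the operation being measured
        }
    }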
Today I did a quick little benchmark to test the speed of System.nanoTime() and System.currentTimeMillis():
long startTime = System.nanoTime();
for(int i = 0; i < 1000000; i++) {
long test = System.nanoTime();
}
long endTime = System.nanoTime();
System.out.println("Total time: "+(endTime-startTime));
These are the results:
System.currentTimeMillis(): average of 12.7836022 / function call
System.nanoTime(): average of 34.6395674 / function call
Why are the differences in running speed so big?
Benchmark system:
Java 1.7.0_25
Windows 8 64-bit
CPU: AMD FX-6100
From this Oracle blog:
System.currentTimeMillis() is implemented using the GetSystemTimeAsFileTime method, which essentially just reads the low resolution time-of-day value that Windows maintains. Reading this global variable is naturally very quick - around 6 cycles according to reported information.
System.nanoTime() is implemented using the QueryPerformanceCounter / QueryPerformanceFrequency API (if available, else it returns currentTimeMillis*10^6).
QueryPerformanceCounter (QPC) is implemented in different ways depending on the hardware it's running on. Typically it will use either the programmable-interval timer (PIT), or the ACPI power management timer (PMT), or the CPU-level timestamp counter (TSC). Accessing the PIT/PMT requires execution of slow I/O port instructions and as a result the execution time for QPC is on the order of microseconds. In contrast, reading the TSC is on the order of 100 clock cycles (to read the TSC from the chip and convert it to a time value based on the operating frequency).
Perhaps this answers the question. The two methods use a different number of clock cycles, which accounts for the slower speed of the latter.
Further in that blog in the conclusion section:
If you are interested in measuring/calculating elapsed time, then always use System.nanoTime(). On most systems it will give a resolution on the order of microseconds. Be aware though, this call can also take microseconds to execute on some platforms.
Most OSes (you didn't mention which one you are using) have an in-memory counter/clock which provides millisecond accuracy (or close to it). For nanosecond accuracy, most have to read a hardware counter. Communicating with hardware is slower than reading a value already in memory.
It may only be the case on Windows. See this answer to a similar question.
Basically, System.currentTimeMillis() just reads a global variable maintained by Windows (which is why it has low granularity), whereas System.nanoTime() actually has to do IO operations.
You are measuring that on Windows, aren't you? I went through this exercise in 2008. nanoTime IS slower on Windows than currentTimeMillis. As I recall, on Linux, nanoTime is faster than currentTimeMillis and is certainly faster than it is on Windows.
The important thing to note is that if you are trying to measure the aggregate of multiple sub-millisecond operations, you must use nanoTime: if an operation finishes in less than 1/1000th of a second, comparing currentTimeMillis values will show the operation as instantaneous, so 1,000 of these will still be instantaneous. What you might want to do is use nanoTime and then round to the nearest millisecond, so if an operation took 8000 nanoseconds it will be counted as 1 millisecond, not 0.
What you might want to do is use nanotime then round to the nearest millisecond, so if an operation took 8000 nanoseconds it will be counted as 1 millisecond, not 0.
Arithmetic note:
8000 nanoseconds is 8 microseconds is 0.008 milliseconds. Rounding will take that to 0 milliseconds.
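To make the arithmetic concrete (a small sketch of my own), both TimeUnit conversion and rounding take 8000 ns to 0 ms, so the safer pattern is to accumulate in nanoseconds and convert only once at the end:

    import java.util.concurrent.TimeUnit;

    public class RoundingNote {
        public static void main(String[] args) {
            long nanos = 8000; // 8 microseconds = 0.008 ms

            System.out.println(TimeUnit.NANOSECONDS.toMillis(nanos)); // 0 (truncates)
            System.out.println(Math.round(nanos / 1_000_000.0));      // 0 (0.008 rounds down)

            // Aggregate in nanoseconds and convert once at the end instead:
            long totalNanos = 0;
            for (int i = 0; i < 1000; i++) {
                totalNanos += nanos;                                   // 1000 operations of 8 µs each
            }
            System.out.println(totalNanos / 1_000_000.0 + " ms");      // 8.0 ms
        }
    }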
For diagnostic purposes, I want to be able to detect changes in the system time-of-day clock in a long-running server application. Since System.currentTimeMillis() is based on wall clock time and System.nanoTime() is based on a system timer that is independent(*) of wall clock time, I thought I could use changes in the difference between these values to detect system time changes.
I wrote up a quick test app to see how stable the difference between these values is, and to my surprise the values diverge immediately for me at the level of several milliseconds per second. A few times I saw much faster divergences. This is on a Win7 64-bit desktop with Java 6. I haven't tried this test program below under Linux (or Solaris or MacOS) to see how it performs. For some runs of this app, the divergence is positive, for some runs it is negative. It appears to depend on what else the desktop is doing, but it's hard to say.
public class TimeTest {
    private static final int ONE_MILLION  = 1000000;
    private static final int HALF_MILLION =  499999;

    public static void main(String[] args) {
        long start = System.nanoTime();
        long base = System.currentTimeMillis() - (start / ONE_MILLION);

        while (true) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                // Don't care if we're interrupted
            }
            long now = System.nanoTime();
            long drift = System.currentTimeMillis() - (now / ONE_MILLION) - base;
            long interval = (now - start + HALF_MILLION) / ONE_MILLION;
            System.out.println("Clock drift " + drift + " ms after " + interval
                    + " ms = " + (drift * 1000 / interval) + " ms/s");
        }
    }
}
Inaccuracies with the Thread.sleep() time, as well as interruptions, should be entirely irrelevant to timer drift.
Both of these Java "System" calls are intended for use as a measurement -- one to measure differences in wall clock time and the other to measure absolute intervals, so when the real-time-clock is not being changed, these values should change at very close to the same speed, right? Is this a bug or a weakness or a failure in Java? Is there something in the OS or hardware that prevents Java from being more accurate?
I fully expect some drift and jitter(**) between these independent measurements, but I expected well less than a minute per day of drift. 1 msec per second of drift, if monotonic, is almost 90 seconds! My worst-case observed drift was perhaps ten times that. Every time I run this program, I see drift on the very first measurement. So far, I have not run the program for more than about 30 minutes.
I expect to see some small randomness in the values printed, due to jitter, but in almost all runs of the program I see steady increase of the difference, often as much as 3 msec per second of increase and a couple times much more than that.
Does any version of Windows have a mechanism similar to Linux that adjusts the system clock speed to slowly bring the time-of-day clock into sync with the external clock source? Would such a thing influence both timers, or only the wall-clock timer?
(*) I understand that on some architectures, System.nanoTime() will of necessity use the same mechanism as System.currentTimeMillis(). I also believe it's fair to assume that any modern Windows server is not such a hardware architecture. Is this a bad assumption?
(**) Of course, System.currentTimeMillis() will usually have a much larger jitter than System.nanoTime() since its granularity is not 1 msec on most systems.
You might find this Sun/Oracle blog post about JVM timers to be of interest.
Here are a couple of the paragraphs from that article about JVM timers under Windows:
System.currentTimeMillis() is implemented using the GetSystemTimeAsFileTime method, which essentially just reads the low resolution time-of-day value that Windows maintains. Reading this global variable is naturally very quick - around 6 cycles according to reported information. This time-of-day value is updated at a constant rate regardless of how the timer interrupt has been programmed - depending on the platform this will either be 10ms or 15ms (this value seems tied to the default interrupt period).
System.nanoTime() is implemented using the QueryPerformanceCounter / QueryPerformanceFrequency API (if available, else it returns currentTimeMillis*10^6). QueryPerformanceCounter(QPC) is implemented in different ways depending on the hardware it's running on. Typically it will use either the programmable-interval-timer (PIT), or the ACPI power management timer (PMT), or the CPU-level timestamp-counter (TSC). Accessing the PIT/PMT requires execution of slow I/O port instructions and as a result the execution time for QPC is in the order of microseconds. In contrast reading the TSC is on the order of 100 clock cycles (to read the TSC from the chip and convert it to a time value based on the operating frequency). You can tell if your system uses the ACPI PMT by checking if QueryPerformanceFrequency returns the signature value of 3,579,545 (ie 3.57MHz). If you see a value around 1.19Mhz then your system is using the old 8245 PIT chip. Otherwise you should see a value approximately that of your CPU frequency (modulo any speed throttling or power-management that might be in effect.)
I am not sure how much this will actually help. But this is an area of active change in the Windows/Intel/AMD/Java world. The need for accurate and precise time measurement has been apparent for several (at least 10) years. Both Intel and AMD have responded by changing how TSC works. Both companies now have something called Invariant-TSC and/or Constant-TSC.
Check out rdtsc accuracy across CPU cores. Quoting from osgx (who refers to an Intel manual).
"16.11.1 Invariant TSC
The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor's support for invariant TSC is indicated by CPUID.80000007H:EDX[8].
The invariant TSC will run at a constant rate in all ACPI P-, C-, and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource."
See also http://www.citihub.com/requesting-timestamp-in-applications/. Quoting from the author
For AMD:
If CPUID 8000_0007.edx[8] = 1, then the TSC rate is ensured to be invariant across all P-States, C-States, and stop-grant transitions (such as STPCLK Throttling); therefore, the TSC is suitable for use as a source of time.
For Intel:
Processor's support for invariant TSC is indicated by CPUID.80000007H:EDX[8]. The invariant TSC will run at a constant rate in all ACPI P-, C-, and T-states. This is the architectural behaviour moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource.
Now the really important point is that the latest JVMs appear to exploit the newly reliable TSC mechanisms. There isn't much online to show this. However, do take a look at http://code.google.com/p/disruptor/wiki/PerformanceResults.
"To measure latency we take the three stage pipeline and generate events at less than saturation. This is achieved by waiting 1 microsecond after injecting an event before injecting the next and repeating 50 million times. To time at this level of precision it is necessary to use time stamp counters from the CPU. We choose CPUs with an invariant TSC because older processors suffer from changing frequency due to power saving and sleep states. Intel Nehalem and later processors use an invariant TSC which can be accessed by the latest Oracle JVMs running on Ubuntu 11.04. No CPU binding has been employed for this test"
Note that the authors of the "Disruptor" have close ties to the folks working on the Azul and other JVMs.
See also "Java Flight Records Behind the Scenes". This presentation mentions the new invariant TSC instructions.
"Returns the current value of the most precise available system timer, in nanoseconds.
"This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time. The value returned represents nanoseconds since some fixed but arbitrary time (perhaps in the future, so values may be negative). This method provides nanosecond precision, but not necessarily nanosecond accuracy. No guarantees are made about how frequently values change. Differences in successive calls that span greater than approximately 292 years (2**63 nanoseconds) will not accurately compute elapsed time due to numerical overflow."
Note that it says "precise", not "accurate".
It's not a "bug in Java" or a "bug" in anything. It's a definition. The JVM developers look around to find the fastest clock/timer in the system and use it. If that's in lock-step with the system clock then good, but if it's not, that's just the way the cookie crumbles. It's entirely plausible, say, that a computer system will have an accurate system clock but then have a higher-rate timer internally that's tied to the CPU clock rate or some such. Since clock rate is often varied to minimize power consumption, the increment rate of this internal timer would vary.
System.currentTimeMillis() and System.nanoTime() are not necessarily provided by the same hardware. System.currentTimeMillis(), backed by GetSystemTimeAsFileTime(), has 100 ns resolution elements. Its source is the system timer. System.nanoTime() is backed by the system's high performance counter. There is a whole variety of different hardware providing this counter. Therefore its resolution varies, depending on the underlying hardware.
In no case can it be assumed that these two sources are in phase. Measuring the two values against each other will disclose a different running speed. If the update of System.currentTimeMillis() is taken as the real progress in time, the output of System.nanoTime() may be sometimes slower, sometimes faster, and also varying.
A careful calibration has to be done in order to phase lock these two time sources.
A more detailed description of the relation between these two time sources can be found at the Windows Timestamp Project.
Does any version of Windows have a mechanism similar to Linux that adjusts the system clock speed to slowly bring the time-of-day clock into sync with the external clock source? Would such a thing influence both timers, or only the wall-clock timer?
The Windows Timestamp Project does what you are asking for. As far as I know it only affects the wall-clock timer.
In an Android application I am trying to do an exponential modulus operation and want to calculate the time taken for that process. So I have created 2 timestamps, one just before the operation and the other just after the operation:
Calendar calendar0 = Calendar.getInstance();
java.util.Date now0 = calendar0.getTime();
java.sql.Timestamp currentTimestamp0 = new java.sql.Timestamp(now0.getTime());
BigInteger en = big.modPow(e, n);
Calendar calendar1 = Calendar.getInstance();
java.util.Date now1 = calendar1.getTime();
java.sql.Timestamp currentTimestamp1 = new java.sql.Timestamp(now1.getTime());
The difference in time reported by these 2 timestamps varies over a large range for the same inputs when I run the application multiple times. It gives times in the range of [6ns-200ns].
Can someone point out the reason for such results, or something that I am doing wrong?
Well for one thing, you're going about your timing in a very convoluted way. Here's something which gives just as much accuracy, but rather simpler:
long start = System.currentTimeMillis();
BigInteger en = big.modPow(e, n);
long end = System.currentTimeMillis();
Note that java.util.Date only has accuracy to the nearest millisecond, and using that same value and putting it in a java.sql.Timestamp doesn't magically make it more accurate. So any result under a millisecond (6ns-200ns) is obviously spurious - basically anything under a millisecond is 0.
You may be able to get a more accurate reading using System.nanoTime - I don't know whether that's supported on Android. There may be alternative high-precision timers available, too.
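If it is available, the same measurement with System.nanoTime() drops the Calendar/Timestamp detour entirely. A minimal sketch (my own code; the 2048-bit values are placeholders for the question's big, e and n):

    import java.math.BigInteger;
    import java.util.Random;

    public class ModPowTiming {
        public static void main(String[] args) {
            Random rnd = new Random();
            BigInteger big = new BigInteger(2048, rnd);          // placeholder operand
            BigInteger e = BigInteger.valueOf(65537);            // placeholder exponent
            BigInteger n = BigInteger.probablePrime(2048, rnd);  // placeholder modulus

            long start = System.nanoTime();
            BigInteger en = big.modPow(e, n);
            long elapsedNanos = System.nanoTime() - start;

            System.out.println("modPow produced " + en.bitLength() + " bits in "
                    + (elapsedNanos / 1_000_000.0) + " ms");
        }
    }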
Now as for why operations may actually take very different amounts of recorded time:
The granularity of the system clock used above may well be considerably less than 1ms. For example, if it measures nothing smaller than 15ms, you could see some operations supposedly taking 0ms and some taking 15ms, even though they actually take the same amount of time.
There can easily be other factors involved, the most obvious being garbage collection
Do more calculations in the trial so that your expected time is several milliseconds.
Consider doing your output to logcat, recording that on the PC and then processing the logcat to only select trials with no intervening platform messages about garbage collection or routine background operations.
A large part of the reason for the variation is that your application is being run by a virtual machine running under a general-purpose operating system on an emulated computer which is running under a general-purpose operating system on a real computer.
With the exception of your application, every one of the things in that pile (the JVM running your app, Android's Linux OS, the emulator and whatever OS runs it) may use the CPU (real or virtual) to do something else at any time for any duration. Those cycles will take away from your program's execution and add to its wall-clock execution time. The result is completely nondeterministic behavior.
If you count emulator CPU cycles consumed and discount whatever Java does behind the scenes, I have no doubt that the standard deviation in execution times would be a lot lower than what you're seeing for the same inputs. The emulated Android environment isn't the place to be benchmarking algorithms because there are too many variables you can't control.
You can't draw any conclusions from running a single function once. There's so much more going on: you have no control over the underlying Linux operating system taking up cycles, let alone the Java virtual machine and all the other programs running. Try running it 1000 or 10000 times.