I would like to ask whether there is a simple way to determine CPU usage per thread in Java. Thanks.
I believe JConsole (archived link) provides this kind of information through a plugin.
It uses the ThreadMXBean getThreadCpuTime() method.
Something along the lines of:
long upTime = runtimeProxy.getUptime();
List<Long> threadCpuTime = new ArrayList<Long>();
for (int i = 0; i < threadIds.size(); i++) {
long threadId = threadIds.get(i);
if (threadId != -1) {
threadCpuTime.add(threadProxy.getThreadCpuTime(threadId));
} else {
threadCpuTime.add(0L);
}
}
int nCPUs = osProxy.getAvailableProcessors();
List<Float> cpuUsageList = new ArrayList<Float>();
if (prevUpTime > 0L && upTime > prevUpTime) {
// elapsedTime is in ms
long elapsedTime = upTime - prevUpTime;
for (int i = 0; i < threadIds.size(); i++) {
// elapsedCpu is in ns
long elapsedCpu = threadCpuTime.get(i) - prevThreadCpuTime.get(i);
// cpuUsage could go higher than 100% because elapsedTime
// and elapsedCpu are not fetched simultaneously. Limit to
// 99% to avoid Chart showing a scale from 0% to 200%.
// ns / (ms * 10000) yields a percentage: 1 ms = 1,000,000 ns, times 100 for percent.
float cpuUsage = Math.min(99F, elapsedCpu / (elapsedTime * 10000F * nCPUs));
cpuUsageList.add(cpuUsage);
}
}
You can get this by using java.lang.management.ThreadMXBean. To obtain a ThreadMXBean:
ThreadMXBean tmxb = ManagementFactory.getThreadMXBean();
Then you can query how much CPU time (in nanoseconds) a specific thread has consumed by using:
long cpuTime = tmxb.getThreadCpuTime(aThreadID);
Hope it helps.
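To turn the raw getThreadCpuTime() values into a usage figure you need two samples and the wall-clock time between them. Here is a minimal sketch of that idea, not the JConsole implementation; the class name ThreadCpuSampler, the helper names and the one-second interval are my own choices for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: samples per-thread CPU time twice and prints the delta as a percentage.
class ThreadCpuSampler {
    /** CPU time in ns per live thread id; threads reporting -1 are skipped. */
    static Map<Long, Long> snapshot(ThreadMXBean bean) {
        Map<Long, Long> cpuByThread = new HashMap<>();
        for (long id : bean.getAllThreadIds()) {
            long cpu = bean.getThreadCpuTime(id); // -1 if the thread died or measurement is disabled
            if (cpu >= 0) {
                cpuByThread.put(id, cpu);
            }
        }
        return cpuByThread;
    }

    /** Share of one core used, in percent, given CPU and wall-clock deltas in ns. */
    static double percentOfOneCore(long cpuDeltaNanos, long wallDeltaNanos) {
        return wallDeltaNanos <= 0 ? 0.0 : 100.0 * cpuDeltaNanos / wallDeltaNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isThreadCpuTimeSupported()) {
            System.out.println("Thread CPU time is not supported on this JVM");
            return;
        }
        if (!bean.isThreadCpuTimeEnabled()) {
            bean.setThreadCpuTimeEnabled(true);
        }
        Map<Long, Long> before = snapshot(bean);
        long wallStart = System.nanoTime();
        Thread.sleep(1000); // sampling interval, chosen arbitrarily
        Map<Long, Long> after = snapshot(bean);
        long wall = System.nanoTime() - wallStart;
        for (Map.Entry<Long, Long> e : after.entrySet()) {
            Long previous = before.get(e.getKey());
            if (previous == null) {
                continue; // thread started during the interval
            }
            ThreadInfo info = bean.getThreadInfo(e.getKey());
            String name = info == null ? "?" : info.getThreadName();
            System.out.printf("%-30s %6.2f%%%n", name,
                    percentOfOneCore(e.getValue() - previous, wall));
        }
    }
}
```

The percentage here is relative to a single core; divide by the number of available processors if you want a share of the whole machine, as the JConsole snippet does.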
Option_1: Code level
In your business logic, call the start() API at the beginning and stop() in the finally block. That gives you the CPU time the current thread spent executing your logic, which you can then log. Reference.
class CPUTimer
{
private long _startTime = 0l;
public void start ()
{
_startTime = getCpuTimeInMillis();
}
public long stop ()
{
long result = (getCpuTimeInMillis() - _startTime);
_startTime = 0l;
return result;
}
public boolean isRunning ()
{
return _startTime != 0l;
}
/** thread CPU time in milliseconds. */
private long getCpuTimeInMillis ()
{
ThreadMXBean bean = ManagementFactory.getThreadMXBean();
return bean.isCurrentThreadCpuTimeSupported() ? bean.getCurrentThreadCpuTime()/1000000: 0L;
}
}
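For what it's worth, the start/finally pattern described above could look roughly like this. It is only a sketch: CpuTimedSection, currentCpuNanos and runAndLog are names I made up, and it works in nanoseconds so that a reading of 0 is never confused with "not started".

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical wrapper showing the start-at-beginning, stop-in-finally pattern.
class CpuTimedSection {
    private static final ThreadMXBean BEAN = ManagementFactory.getThreadMXBean();

    /** CPU time of the current thread in ns, or -1 when the JVM cannot report it. */
    static long currentCpuNanos() {
        return BEAN.isCurrentThreadCpuTimeSupported() ? BEAN.getCurrentThreadCpuTime() : -1L;
    }

    /** Runs the task and logs the CPU time it used, even if the task throws. */
    static void runAndLog(String label, Runnable task) {
        long start = currentCpuNanos();
        try {
            task.run();
        } finally {
            long end = currentCpuNanos();
            if (start >= 0 && end >= 0) {
                System.out.println(label + ": " + (end - start) / 1_000_000 + " ms CPU");
            } else {
                System.out.println(label + ": CPU time not available");
            }
        }
    }

    public static void main(String[] args) {
        runAndLog("busy loop", () -> {
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) {
                sum += i % 7;
            }
            if (sum == 42) {
                System.out.println(); // uses the result so the loop is not trivially dead code
            }
        });
    }
}
```

Because this is CPU time and not wall-clock time, anything the thread spends sleeping or blocked on I/O is not counted.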
Option_2: Monitor level using plugins (e.g. an IBM AIX box, which doesn't have jvisualvm support)
If adding code now would take too long, you can use JConsole with plugin support instead. I followed this article. Download the topthreads jar from that article and run ./jconsole -pluginpath topthreads-1.1.jar
Option_3: Monitor level using top (Shift+H) + jstack (Unix machines where top supports Shift+H)
Follow this tutorial, where the top command gives you the option to find the thread using the most CPU (its nid). Then look up that nid in the jstack output file.
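One detail that trips people up with the top + jstack approach: top shows the thread id in decimal, while jstack thread dumps have traditionally printed it as nid=0x... in hexadecimal. A small sketch of the conversion follows, assuming your jstack output uses the hexadecimal form (check your own dump first); NidConverter is a made-up name.

```java
// Hypothetical helper for matching a thread id from top against a jstack dump.
class NidConverter {
    /** Converts a decimal thread id as shown by top into the nid=0x... form. */
    static String toJstackNid(long threadId) {
        return "nid=0x" + Long.toHexString(threadId);
    }

    /** Parses "nid=0x1a2b", "0x1a2b" or "1a2b" back into the decimal id. */
    static long fromJstackNid(String nid) {
        int idx = nid.indexOf("0x");
        String hex = idx >= 0 ? nid.substring(idx + 2) : nid;
        return Long.parseLong(hex.trim(), 16);
    }

    public static void main(String[] args) {
        long idFromTop = 6699L; // example value only
        System.out.println(toJstackNid(idFromTop)); // search the jstack file for this string
    }
}
```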
Try the "TopThreads" JConsole plugin. See http://lsd.luminis.nl/top-threads-plugin-for-jconsole/
Though this is platform dependent, I believe what you're looking for is the ThreadMXBean: http://java.sun.com/j2se/1.5.0/docs/api/java/lang/management/ThreadMXBean.html . You can use the getThreadUserTime method, for example, to get what you need. To check if your platform supports CPU measurement, you can call isThreadCpuTimeSupported() .
Indeed the object ThreadMXBean provides the functionality you need (however it might not be implemented on all virtual machines).
In JDK 1.5 there was a demo program doing exactly what you need. It was in the folder demo/management and was called JTop.java.
Unfortunately, it's not there in Java 6. Maybe you can find it with Google or by downloading JDK 5.
Related
The short code below isolates the problem. Basically I'm timing the method addToStorage. I start by executing it one million times and I'm able to get its time down to around 723 nanoseconds. Then I do a short pause (using a busy-spinning method so as not to release the CPU core) and time the method again N times, at a different code location. To my surprise, I find that the smaller N is, the bigger the addToStorage latency.
For example:
If N = 1 then I get 3.6 micros
If N = 2 then I get 3.1 and 2.5 micros
if N = 5 then I get 3.7, 1.8, 1.7, 1.5 and 1.5 micros
Does anyone know why this is happening and how to fix it? I would like my method to consistently perform at the fastest time possible, no matter where I call it.
Note: I would not think it is thread related since I'm not using Thread.sleep. I've also tested using taskset to pin my thread to a cpu core with the same results.
import java.util.ArrayList;
import java.util.List;
public class JvmOdd {
private final StringBuilder sBuilder = new StringBuilder(1024);
private final List<String> storage = new ArrayList<String>(1024 * 1024);
public void addToStorage() {
sBuilder.setLength(0);
sBuilder.append("Blah1: ").append(System.nanoTime()).append('\n');
sBuilder.append("Blah2: ").append(System.nanoTime()).append('\n');
sBuilder.append("Blah3: ").append(System.nanoTime()).append('\n');
sBuilder.append("Blah4: ").append(System.nanoTime()).append('\n');
sBuilder.append("Blah5: ").append(System.nanoTime()).append('\n');
sBuilder.append("Blah6: ").append(System.nanoTime()).append('\n');
sBuilder.append("Blah7: ").append(System.nanoTime()).append('\n');
sBuilder.append("Blah8: ").append(System.nanoTime()).append('\n');
sBuilder.append("Blah9: ").append(System.nanoTime()).append('\n');
sBuilder.append("Blah10: ").append(System.nanoTime()).append('\n');
storage.add(sBuilder.toString());
}
public static long mySleep(long t) {
long x = 0;
for(int i = 0; i < t * 10000; i++) {
x += System.currentTimeMillis() / System.nanoTime();
}
return x;
}
public static void main(String[] args) throws Exception {
int warmup = Integer.parseInt(args[0]);
int mod = Integer.parseInt(args[1]);
int passes = Integer.parseInt(args[2]);
int sleep = Integer.parseInt(args[3]);
JvmOdd jo = new JvmOdd();
// first warm up
for(int i = 0; i < warmup; i++) {
long time = System.nanoTime();
jo.addToStorage();
time = System.nanoTime() - time;
if (i % mod == 0) System.out.println(time);
}
// now see how fast the method is:
while(true) {
System.out.println();
// Thread.sleep(sleep);
mySleep(sleep);
long minTime = Long.MAX_VALUE;
for(int i = 0; i < passes; i++) {
long time = System.nanoTime();
jo.addToStorage();
time = System.nanoTime() - time;
if (i > 0) System.out.print(',');
System.out.print(time);
minTime = Math.min(time, minTime);
}
System.out.println("\nMinTime: " + minTime);
}
}
}
Executing:
$ java -server -cp . JvmOdd 1000000 100000 1 5000
59103
820
727
772
734
767
730
726
840
736
3404
MinTime: 3404
There is so much going on in here that I don't know where to start. But let's start here...
long time = System.nanoTime();
jo.addToStorage();
time = System.nanoTime() - time;
The latency of addToStorage() cannot be measured using this technique. It simply runs too quickly, meaning you're likely below the resolution of the clock. Without running this, my guess is that your measurements are dominated by clock edge counts. You'll need to bulk up the unit of work to get a measurement with less noise in it.
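To illustrate what "bulking up the unit of work" might look like: time one whole batch of calls and divide, instead of wrapping every single call in two nanoTime() reads. This is only a rough sketch (BatchTimer and averageNanos are names I made up) and it is no substitute for a proper harness.

```java
// Hypothetical helper: times a batch of calls so the interval is far above clock resolution.
class BatchTimer {
    /** Average ns per call, derived from a single timed batch. */
    static double averageNanos(Runnable task, int callsPerBatch) {
        long start = System.nanoTime();
        for (int i = 0; i < callsPerBatch; i++) {
            task.run();
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / callsPerBatch;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder(1024);
        long[] sink = {0}; // accumulates a result so the work is not discarded
        Runnable work = () -> {
            sb.setLength(0);
            sb.append("Blah: ").append(System.nanoTime()).append('\n');
            sink[0] += sb.length();
        };
        for (int round = 0; round < 5; round++) {
            System.out.println("avg ns/call: " + averageNanos(work, 100_000));
        }
        System.out.println("sink: " + sink[0]);
    }
}
```

The trade-off is that you lose per-call latency and only see an average, so outliers such as the first call after a pause are smoothed away.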
As for what is happening? There are a number of call site optimizations the most important being inlining. Inlining would totally eliminate the call site but it's a path specific optimization. If you call the method from a different place, that would follow the slow path of performing a virtual method lookup followed by a jump to that code. So to see the benefits of inlining from a different path, that path would also have to be "warmed up".
I would strongly recommend that you look at JMH (an OpenJDK project). There are facilities in there such as Blackhole which will help with the effects of CPU clocks winding down. You might also want to evaluate the quality of the benchmark with the help of tools like JITWatch (Adopt OpenJDK project), which take logs produced by the JIT and help you interpret them.
There is so much to this subject, but the bottom line is that you can't write a simplistic benchmark like this and expect it to tell you anything useful. You will need to use JMH.
I suggest watching this: https://www.infoq.com/presentations/jmh about microbenchmarking and JMH
There's also a chapter on microbenchmarking & JMH in my book: http://shop.oreilly.com/product/0636920042983.do
Java uses a JIT (just-in-time) compiler internally. Based on the number of times a method executes, the JIT optimizes its instructions so that it performs better. For a small number of calls, the method runs unoptimized, which shows up as a longer execution time. When the same method is called many more times, the JIT-compiled, optimized code runs in less time.
I'm encountering a really unusual issue here. It seems that calling Thread.sleep(n), where n > 0, causes the following System.nanoTime() calls to be less predictable.
The code below demonstrates the issue.
Running it on my computer (rMBP 15" 2015, OS X 10.11, jre 1.8.0_40-b26) outputs the following result:
Control: 48497
Random: 36719
Thread.sleep(0): 48044
Thread.sleep(1): 832271
On a virtual machine running Windows 8 (VMware Horizon, Windows 8.1, jre 1.8.0_60-b27):
Control: 98974
Random: 61019
Thread.sleep(0): 115623
Thread.sleep(1): 282451
However, running it on an enterprise server (VMware, RHEL 6.7, jre 1.6.0_45-b06):
Control: 1385670
Random: 1202695
Thread.sleep(0): 1393994
Thread.sleep(1): 1413220
Which, surprisingly, is the result I expected.
Clearly the Thread.sleep(1) affects the computation of the below code. I have no idea why this happens. Does anyone have a clue?
Thanks!
public class Main {
public static void main(String[] args) {
int N = 1000;
long timeElapsed = 0;
long startTime, endTime = 0;
for (int i = 0; i < N; i++) {
startTime = System.nanoTime();
//search runs here
endTime = System.nanoTime();
timeElapsed += endTime - startTime;
}
System.out.println("Control: " + timeElapsed);
timeElapsed = 0;
for (int i = 0; i < N; i++) {
startTime = System.nanoTime();
//search runs here
endTime = System.nanoTime();
timeElapsed += endTime - startTime;
for (int j = 0; j < N; j++) {
int k = (int) Math.pow(i, j);
}
}
System.out.println("Random: " + timeElapsed);
timeElapsed = 0;
for (int i = 0; i < N; i++) {
startTime = System.nanoTime();
//search runs here
endTime = System.nanoTime();
timeElapsed += endTime - startTime;
try {
Thread.sleep(0);
} catch (InterruptedException e) {
break;
}
}
System.out.println("Thread.sleep(0): " + timeElapsed);
timeElapsed = 0;
for (int i = 0; i < N; i++) {
startTime = System.nanoTime();
//search runs here
endTime = System.nanoTime();
timeElapsed += endTime - startTime;
try {
Thread.sleep(1);
} catch (InterruptedException e) {
break;
}
}
System.out.println("Thread.sleep(1): " + timeElapsed);
}
}
Basically I'm running a search within a while-loop which takes a break every iteration by calling Thread.sleep(). I want to exclude the sleep time from the overall time taken to run the search, so I'm using System.nanoTime() to record the start and finishing times. However, as you notice above, this doesn't work well.
Is there a way to remedy this?
Thanks for any input!
This is a complex topic because the timers used by the JVM are highly CPU- and OS-dependent and also change with JVM versions (e.g. by using newer OS APIs). Virtual machines may also limit the CPU capabilities they pass through to guests, which may alter the choices in comparison to a bare metal setup.
On x86 the RDTSC instruction provides the lowest latency and best granularity of all clocks, but under some configurations it's not available or reliable enough as a time source.
On Linux you should check the kernel startup messages (dmesg), the TSC-related /proc/cpuinfo flags and the selected /sys/devices/system/clocksource/*/current_clocksource. The kernel will try to use the TSC by default; if it doesn't, there may be a reason for that.
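If you want to log the selected clock source from inside the application, something like the following sketch could do it. It assumes the usual sysfs directory clocksource0 (an assumption about your system, not a guarantee) and falls back to "unknown" on other platforms; ClockSourceCheck is a made-up name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper that reads the Linux clock source files from sysfs.
class ClockSourceCheck {
    // Assumed location; clocksource0 is what typical Linux systems expose.
    static final Path DIR = Paths.get("/sys/devices/system/clocksource/clocksource0");

    /** Returns the trimmed file content, or "unknown" if it cannot be read. */
    static String read(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        } catch (IOException | SecurityException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("current:   " + read(DIR.resolve("current_clocksource")));
        System.out.println("available: " + read(DIR.resolve("available_clocksource")));
    }
}
```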
For some history you may want to read the following, but note that some of those articles may be a bit dated, TSC reliability has improved a lot over the years:
OpenJDK Bug 8068730 exposing more precise system clocks in Java 9 through the Date and Time APIs introduced in java 8
http://shipilev.net/blog/2014/nanotrusting-nanotime/ (mentions the -XX:+AssumeMonotonicOSTimers manual override/footgun)
https://blog.packagecloud.io/eng/2017/03/14/using-strace-to-understand-java-performance-improvement/ (mentions the similar option for linux UseLinuxPosixThreadCPUClocks)
https://btorpey.github.io/blog/2014/02/18/clock-sources-in-linux/
https://stas-blogspot.blogspot.de/2012/02/what-is-behind-systemnanotime.html
https://en.wikipedia.org/wiki/Time_Stamp_Counter (especially CPU capabilities, constant_tsc tsc_reliable nonstop_tsc in linux nomenclature)
http://vanillajava.blogspot.de/2012/04/yield-sleep0-wait01-and-parknanos1.html
I can suggest at least two possible reasons of such behavior:
Power saving. When executing a busy loop, the CPU runs at its maximum performance state. However, after Thread.sleep it is likely to fall into one of the power-saving states, with frequency and voltage reduced. After that the CPU won't return to its maximum performance immediately; this may take from several nanoseconds to microseconds.
Scheduling. After a thread is descheduled due to Thread.sleep, it will be scheduled for execution again after a timer event which might be related to the timer used for System.nanoTime.
In both cases you can't directly work around this - I mean Thread.sleep will also affect timings in your real application. But if the amount of useful work measured is large enough, the inaccuracy will be negligible.
The inconsistencies probably arise not from Java, but from the different OSs and VMs "atomic-" or system- clocks themselves.
According to the official .nanoTime() documentation:
no guarantees are made except that the resolution is at least as good
as that of currentTimeMillis()
source
...I can tell from personal knowledge that this is because in some OSs and VMs, the system itself doesn't support "atomic" clocks, which are necessary for higher resolutions. (I will post the link to source this information as soon as I find it again...It's been a long time.)
Is there a way to monitor CPU usage using pure Java?
There is a gem in the comments on the article which kgiannakakis linked:
javasysmon
JavaSysMon manages processes and
reports useful system performance
metrics cross-platform. You can think
of it as a cross-platform version of
the UNIX `top’ command, along with the
ability to kill processes. It comes in
the form of a single JAR file /..
-works on Windows, Mac OS X, Linux, and Solaris.
How about using jmx mbeans?
final OperatingSystemMXBean myOsBean=
ManagementFactory.getOperatingSystemMXBean();
double load = myOsBean.getSystemLoadAverage();
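Two things worth knowing about getSystemLoadAverage(): it returns a negative value when the platform doesn't provide a load average, and the number is not a percentage, so it usually makes sense to relate it to the number of processors. A small sketch, with LoadAverageCheck and loadPerCore as made-up names:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper that normalizes the system load average by processor count.
class LoadAverageCheck {
    /** Load average per processor, or -1 if the load average is unavailable. */
    static double loadPerCore(double loadAverage, int processors) {
        return loadAverage < 0 ? -1.0 : loadAverage / processors;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double perCore = loadPerCore(os.getSystemLoadAverage(), os.getAvailableProcessors());
        if (perCore < 0) {
            System.out.println("Load average is not available on this platform");
        } else {
            System.out.println("Load per core (last minute): " + perCore);
        }
    }
}
```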
You can use JMX beans to calculate a CPU load. Note that this measures the CPU load of your Java program, not the overall system load. (The question didn't specify which.)
Initialize:
ThreadMXBean newBean = ManagementFactory.getThreadMXBean();
try
{
if (this.newBean.isThreadCpuTimeSupported())
this.newBean.setThreadCpuTimeEnabled(true);
else
throw new AccessControlException("");
}
catch (AccessControlException e)
{
System.out.println("CPU Usage monitoring is not available!");
System.exit(0);
}
Then as your loop (assuming your application uses a loop, otherwise what's the point in measuring CPU usage?) use this:
long lastTime = System.nanoTime();
long lastThreadTime = newBean.getCurrentThreadCpuTime();
while (true)
{
// Do something that takes at least 10ms (on windows)
try
{
int j = 0;
for (int i = 0; i < 20000000; i++)
j = (j + i) * j / 2;
Thread.sleep(100);
}
catch (InterruptedException e)
{
}
// Calculate coarse CPU usage:
long time = System.nanoTime();
long threadTime = newBean.getCurrentThreadCpuTime();
double load = (threadTime - lastThreadTime) / (double)(time - lastTime);
System.out.println((float)load);
// For next iteration.
lastTime = time;
lastThreadTime = threadTime;
}
You need to use double precision because a long doesn't fit in a float (though it might work 99.9999999999999999% of the time)
If the 'something' you're doing takes less than approximately 1.6ms (Windows), then the returned value will not even have increased at all and you'll perpetually measure 0% CPU erroneously.
Because getCurrentThreadCpuTime is VERY inaccurate (with delays less than 100ms), smoothing it helps a lot:
long lastTime = System.nanoTime();
long lastThreadTime = newBean.getCurrentThreadCpuTime();
float smoothLoad = 0;
while (true)
{
// Do something that takes at least 10ms (on windows)
try
{
int j = 0;
for (int i = 0; i < 2000000; i++)
j = (j + i) * j / 2;
Thread.sleep(10);
}
catch (InterruptedException e)
{
}
// Calculate coarse CPU usage:
long time = System.nanoTime();
long threadTime = newBean.getCurrentThreadCpuTime();
double load = (threadTime - lastThreadTime) / (double)(time - lastTime);
// Smooth it.
smoothLoad += (load - smoothLoad) * 0.1; // damping factor, lower means less responsive, 1 means no smoothing.
System.out.println(smoothLoad);
// For next iteration.
lastTime = time;
lastThreadTime = threadTime;
}
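The smoothing line in the loop above is a plain exponential moving average. Pulled out on its own it might look like this; SmoothedLoad is just a name I picked, and the 0.1 factor is the one used above.

```java
// Hypothetical standalone version of the smoothing step (exponential moving average).
class SmoothedLoad {
    private final double factor; // 1.0 means no smoothing, lower means less responsive
    private double value;        // starts at 0, like smoothLoad above

    SmoothedLoad(double factor) {
        this.factor = factor;
    }

    /** Moves the smoothed value a fraction of the way towards the new sample. */
    double update(double sample) {
        value += (sample - value) * factor;
        return value;
    }

    public static void main(String[] args) {
        SmoothedLoad smooth = new SmoothedLoad(0.1);
        double[] samples = {1.0, 1.0, 0.0, 1.0, 1.0};
        for (double s : samples) {
            System.out.println("raw=" + s + " smoothed=" + smooth.update(s));
        }
    }
}
```

Starting from 0 means the first few readings are biased low; seeding the value with the first sample is a common alternative if that matters to you.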
This is not possible using pure Java. See this article for some ideas.
Maybe if stuck, you might 'sense' cpu availability by running an intermittent bogomips calculator in a background thread, and smoothing and normalising its findings.
...worth a shot no :?
if you are using linux - just use jconsole - you will get all the track of java memory management
My general experience with Java 7 tells me that it is faster than Java 6. However, I've run into enough information that makes me believe that this is not always the case.
The first bit of information comes from Minecraft Snooper data found here. My intention was to look at that data to determine the effects of the different switches used to launch Minecraft. For example I wanted to know if using -Xmx4096m had a negative or positive effect on performance. Before I could get there I looked at the different version of Java being used. It covers everything from 1.5 to a developer using 1.8. In general as you increase the java version you see an increase in fps performance. Throughout the different versions of 1.6 you even see this gradual trend up. I honestly wasn't expecting to see as many different versions of java still in the wild but I guess people don't run the updates like they should.
Some time around the later versions of 1.6 you get the highest peaks. 1.7 performs about 10 fps on average below the later versions of 1.6, but still higher than the early versions of 1.6. On a sample from my own system it's almost impossible to see the difference, but when looking at the broader sample it's clear.
To control for the possibility that someone might have found a magic switch for Java, I only looked at the data with no switches being passed. That way I'd have a reasonable control before I started looking at the different flags.
I dismissed most of what I was seeing, since it could be some magic Java 6 setting that someone's just not sharing with me.
Now I've been working on another project that requires me to pass an array in an InputStream to be processed by another API. Initially I used a ByteArrayInputStream because it would work out of the box. When I looked at the code for it I noticed that every function was synchronized. Since this was unnecessary for this project I rewrote one with the synchronization stripped out. I then decided that I wanted to know what the general cost of Synchronization was for me in this situation.
I mocked up a simple test just to see. I timed everything with System.nanoTime() and used Java 1.6_20 x86, 1.7.0-b147 AMD64, and 1.7_15 AMD64, using the -server option. I expected the AMD64 versions to outperform based on architecture alone and to have any Java 7 advantages. I also looked at the 25th, 50th, and 75th percentiles (blue, red, green). However, 1.6 with no -server beat the pants off every other configuration.
So my question is:
What is in the 1.6 -server option that is impacting performance that is also defaulted to on in 1.7?
I know most of the speed enhancement in 1.7 came from defaulting some of the more radical performance options in 1.6 to on, but one of them is causing a performance difference. I just don't know which ones to look at.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
public class ByteInputStream extends InputStream {
public static void main(String args[]) throws IOException {
String song = "This is the song that never ends";
byte[] data = song.getBytes();
byte[] read = new byte[data.length];
ByteArrayInputStream bais = new ByteArrayInputStream(data);
ByteInputStream bis = new ByteInputStream(data);
long startTime, endTime;
for (int i = 0; i < 10; i++) {
/*code for ByteInputStream*/
/*
startTime = System.nanoTime();
for (int ctr = 0; ctr < 1000; ctr++) {
bis.mark(0);
bis.read(read);
bis.reset();
}
endTime = System.nanoTime();
System.out.println(endTime - startTime);
*/
/*code for ByteArrayInputStream*/
startTime = System.nanoTime();
for (int ctr = 0; ctr < 1000; ctr++) {
bais.mark(0);
bais.read(read);
bais.reset();
}
endTime = System.nanoTime();
System.out.println(endTime - startTime);
}
}
private final byte[] array;
private int pos;
private int min;
private int max;
private int mark;
public ByteInputStream(byte[] array) {
this(array, 0, array.length);
}
public ByteInputStream(byte[] array, int offset, int length) {
min = offset;
max = offset + length;
this.array = array;
pos = offset;
}
@Override
public int available() {
return max - pos;
}
@Override
public boolean markSupported() {
return true;
}
@Override
public void mark(int limit) {
mark = pos;
}
@Override
public void reset() {
pos = mark;
}
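@Override
public long skip(long n) {
// Return the number of bytes actually skipped, as InputStream.skip specifies.
long skipped = Math.max(0, Math.min(n, max - pos));
pos += skipped;
return skipped;
}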
@Override
public int read() throws IOException {
if (pos >= max) {
return -1;
}
return array[pos++] & 0xFF;
}
@Override
public int read(byte b[], int off, int len) {
if (pos >= max) {
return -1;
}
if (pos + len > max) {
len = max - pos;
}
if (len <= 0) {
return 0;
}
System.arraycopy(array, pos, b, off, len);
pos += len;
return len;
}
@Override
public void close() throws IOException {
}
}// end class
I think, as the others are saying, that your tests are too short to see the core issues - the graph is showing nanoTime, and that implies the core section being measured completes in 0.0001 to 0.0006s.
Discussion
The key difference in -server and -client is that -server expects the JVM to be around for a long time and therefore expends effort early on for better long-term results. -client aims for fast startup times and good-enough performance.
In particular hotspot runs with more optimizations, and these take more CPU to execute. In other words, with -server, you may be seeing the cost of the optimizer outweighing any gains from the optimization.
See Real differences between "java -server" and "java -client"?
Alternatively, you may also be seeing the effects of tiered compilation where, in Java 7, hotspot doesn't kick in so fast. With only 1000 iterations, the full optimization of your code won't be done until later, and the benefits will therefore be lesser.
You might get some insight if you run java with the -Xprof option: the JVM will dump some data about the time spent in various methods, both interpreted and compiled. It should give an idea about what was compiled, and the ratio of (CPU) time before hotspot kicked in.
However, to get a true picture, you really need to run this much longer (minutes, not milliseconds) to allow Java and the OS to warm up. It would be even better to loop the test in main (so you have a loop containing your instrumented main test loop) so that you can ignore the warm-up.
EDIT Changed seconds to minutes to ensure that hotspot, the jvm and the OS are properly 'warmed up'
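The "loop containing your instrumented loop" suggestion could be sketched like this: time several rounds, then ignore the first few when summarizing. The names WarmupRounds, timeRounds and steadyStateAverage are mine, and the round counts are arbitrary picks that would need tuning for a real test.

```java
// Hypothetical harness: an outer loop of rounds around the instrumented inner loop.
class WarmupRounds {
    /** Times each round of iterationsPerRound calls; returns ns per round. */
    static long[] timeRounds(Runnable task, int rounds, int iterationsPerRound) {
        long[] nanos = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            for (int i = 0; i < iterationsPerRound; i++) {
                task.run();
            }
            nanos[r] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Average of the rounds after skipping the first warmupRounds entries. */
    static double steadyStateAverage(long[] roundNanos, int warmupRounds) {
        long sum = 0;
        int counted = 0;
        for (int r = warmupRounds; r < roundNanos.length; r++) {
            sum += roundNanos[r];
            counted++;
        }
        return counted == 0 ? Double.NaN : (double) sum / counted;
    }

    public static void main(String[] args) {
        byte[] data = "This is the song that never ends".getBytes();
        byte[] read = new byte[data.length];
        long[] rounds = timeRounds(
                () -> System.arraycopy(data, 0, read, 0, data.length), 20, 100_000);
        for (long ns : rounds) {
            System.out.println(ns);
        }
        System.out.println("steady-state avg ns/round: " + steadyStateAverage(rounds, 5));
    }
}
```

Printing every round also lets you see where the numbers settle, which tells you whether skipping five rounds was enough.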
How do I calculate the time taken for the execution of a method in Java?
To be more precise, I would use nanoTime() method rather than currentTimeMillis():
long startTime = System.nanoTime();
myCall();
long stopTime = System.nanoTime();
System.out.println(stopTime - startTime);
In Java 8 (output format is ISO-8601):
Instant start = Instant.now();
Thread.sleep(63553);
Instant end = Instant.now();
System.out.println(Duration.between(start, end)); // prints PT1M3.553S
Guava Stopwatch:
Stopwatch stopwatch = Stopwatch.createStarted();
myCall();
stopwatch.stop(); // optional
System.out.println("Time elapsed: "+ stopwatch.elapsed(TimeUnit.MILLISECONDS));
You can take timestamp snapshots before and after, then repeat the experiment several times and average the results. There are also profilers that can do this for you.
From "Java Platform Performance: Strategies and Tactics" book:
With System.currentTimeMillis()
class TimeTest1 {
public static void main(String[] args) {
long startTime = System.currentTimeMillis();
long total = 0;
for (int i = 0; i < 10000000; i++) {
total += i;
}
long stopTime = System.currentTimeMillis();
long elapsedTime = stopTime - startTime;
System.out.println(elapsedTime);
}
}
With a StopWatch class
You can use this StopWatch class, and call start() and stop() before and after the method.
class TimeTest2 {
public static void main(String[] args) {
Stopwatch timer = new Stopwatch().start();
long total = 0;
for (int i = 0; i < 10000000; i++) {
total += i;
}
timer.stop();
System.out.println(timer.getElapsedTime());
}
}
See here (archived).
NetBeans Profiler:
Application Performance Application
Performance profiles method-level CPU
performance (execution time). You can
choose to profile the entire
application or a part of the
application.
See here.
Check this: System.currentTimeMillis.
With this you can calculate the time of your method by doing:
long start = System.currentTimeMillis();
myObject.method(); // the call you want to time
long time = System.currentTimeMillis() - start;
In case you develop applications for Android you should try out the TimingLogger class.
Take a look at these articles describing the usage of the TimingLogger helper class:
Measuring performance in the Android SDK (27.09.2010)
Discovering the Android API - Part 1 (03.01.2017)
You might want to think about aspect-oriented programming. You don't want to litter your code with timings. You want to be able to turn them off and on declaratively.
If you use Spring, take a look at their MethodInterceptor class.
If you are currently writing the application, then System.currentTimeMillis or System.nanoTime serves the purpose, as pointed out by people above.
But if you have already written the code and you don't want to change it, it's better to use Spring's method interceptors. So, for instance, your service is:
public class MyService {
public void doSomething() {
for (int i = 1; i < 10000; i++) {
System.out.println("i=" + i);
}
}
}
To avoid changing the service, you can write your own method interceptor:
public class ServiceMethodInterceptor implements MethodInterceptor {
public Object invoke(MethodInvocation methodInvocation) throws Throwable {
long startTime = System.currentTimeMillis();
Object result = methodInvocation.proceed();
long duration = System.currentTimeMillis() - startTime;
Method method = methodInvocation.getMethod();
String methodName = method.getDeclaringClass().getName() + "." + method.getName();
System.out.println("Method '" + methodName + "' took " + duration + " milliseconds to run");
return result; // pass the service's return value back to the caller
}
}
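If you are not on Spring, the same "time it without touching the service" idea can be approximated with a plain JDK dynamic proxy. To be clear, this swaps in java.lang.reflect.Proxy in place of Spring's MethodInterceptor, and it only works for services called through an interface. TimingProxy and Greeter are hypothetical names for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical stand-in for the interceptor, using only the JDK.
class TimingProxy {
    /** Hypothetical service interface used for the demonstration. */
    interface Greeter {
        String greet(String name);
    }

    /** Wraps target in a proxy that prints how long each interface call took. */
    static <T> T timed(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args); // return the real result to the caller
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the service's own exception
            } finally {
                long micros = (System.nanoTime() - start) / 1_000;
                System.out.println("Method '" + method.getName() + "' took " + micros + " us");
            }
        };
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(proxy);
    }

    public static void main(String[] args) {
        Greeter plain = name -> "Hello, " + name;
        Greeter timedGreeter = timed(plain, Greeter.class);
        System.out.println(timedGreeter.greet("world"));
    }
}
```

The measured time includes the reflection overhead of the proxy itself, so this is only sensible for calls that take much longer than a few microseconds.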
Also there are open source APIs available for Java, e.g. BTrace,
or the NetBeans profiler as suggested above by @bakkal and @Saikikos.
Thanks.
As proposed, nanoTime() is very precise on short time scales.
When this precision is required, you need to take care about what you really measure,
especially not to measure the nanoTime() call itself:
long start1 = System.nanoTime();
// maybe add here a call to a return to remove call up time, too.
// Avoid optimization
long start2 = System.nanoTime();
myCall();
long stop = System.nanoTime();
long diff = stop - 2*start2 + start1;
System.out.println(diff + " ns");
By the way, you will measure different values for the same call due to
other load on your computer (background, network, mouse movement, interrupts, task switching, threads)
cache fillings (cold, warm)
jit compiling (no optimization, performance hit due to running the compiler, performance boost due to compiler (but sometimes code with jit is slower than without!))
nanoTime is in fact not even good for elapsed time because it drifts away significantly more than currentTimeMillis. Furthermore, nanoTime tends to provide excessive precision at the expense of accuracy. It is therefore highly inconsistent and needs refinement.
For any time-measuring process, currentTimeMillis (though almost as bad) does better in terms of balancing accuracy and precision.