I'm writing a MOS 6502 processor emulator as part of a larger project I've undertaken in my spare time. The emulator is written in Java, and before you say it, I know it's not going to be as efficient and optimized as if it were written in C or assembly, but the goal is to make it run on various platforms, and it's pulling 2.5 MHz on a 1 GHz processor, which is pretty good for an interpreted emulator. My problem is quite the contrary: I need to limit it to 1 MHz (one million cycles per second). I've looked around but haven't seen many strategies for doing this. I've tried a few things, including checking the time after a number of cycles and sleeping for the difference between the expected time and the actual elapsed time, but checking the time slows the emulation down by a factor of 8. Does anyone have better suggestions, or perhaps ways to optimize time polling in Java to reduce the slowdown?
The problem with using sleep() is that you generally only get a granularity of 1 ms, and the actual sleep you get isn't necessarily accurate even to the nearest 1 ms, as it depends on what the rest of the system is doing. A couple of suggestions to try (off the top of my head; I've not actually written a CPU emulator in Java):
Stick to your idea, but check the time only after a large-ish number of emulated instructions (execution is going to be a bit "lumpy" anyway, especially on a uniprocessor machine, because the OS can take the CPU away from your thread for several milliseconds at a time);
As you want to execute on the order of 1000 emulated cycles per millisecond, you could also try simply hanging on to the CPU between instructions: have your program periodically work out, by trial and error, how many runs through a loop it needs between instructions to "waste" enough CPU to make the timing work out at 1 million emulated cycles per second on average (you may want to see whether setting your thread to low priority helps system performance in this case).
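A minimal sketch of the first suggestion, assuming a hypothetical Cpu interface whose step() executes one instruction and returns the cycles it took. The hot loop never touches the clock; System.nanoTime() is read only once per 10,000-cycle slice (about 10 ms of emulated time at 1 MHz), so the cost of polling the time is amortised. The slice size is an arbitrary value to tune.

```java
import java.util.concurrent.TimeUnit;

public class CycleThrottle {
    /** Hypothetical emulator core: executes one instruction, returns the cycles it took. */
    public interface Cpu {
        int step();
    }

    static final long CLOCK_HZ = 1_000_000L;                        // target: 1 MHz
    static final long NANOS_PER_CYCLE = 1_000_000_000L / CLOCK_HZ;  // 1000 ns at 1 MHz
    static final long SLICE_CYCLES = 10_000L;                       // read the clock every ~10 ms of emulated time

    /** Runs at least totalCycles emulated cycles, throttled to CLOCK_HZ; returns elapsed real ns. */
    public static long run(Cpu cpu, long totalCycles) throws InterruptedException {
        long start = System.nanoTime();
        long cyclesDone = 0;
        while (cyclesDone < totalCycles) {
            long sliceEnd = Math.min(cyclesDone + SLICE_CYCLES, totalCycles);
            while (cyclesDone < sliceEnd) {
                cyclesDone += cpu.step();  // no clock reads inside the hot loop
            }
            long targetNanos = cyclesDone * NANOS_PER_CYCLE;  // where emulated time says we should be
            long aheadNanos = targetNanos - (System.nanoTime() - start);
            if (aheadNanos > 0) {
                // Coarse (~1 ms) sleep; any overshoot just means the next slice runs without sleeping.
                TimeUnit.NANOSECONDS.sleep(aheadNanos);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        Cpu fake = () -> 3;                // pretend every instruction takes 3 cycles
        long nanos = run(fake, 100_000);   // ~100 ms of emulated time at 1 MHz
        System.out.printf("100,000 cycles took %.1f ms%n", nanos / 1e6);
    }
}
```

Because the target is computed from the total cycle count since start, sleep inaccuracy in one slice is corrected in later slices, so the rate is right on average even if individual slices are lumpy.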
I would use System.nanoTime() in a busy wait, as pst suggested earlier.
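A minimal busy-wait sketch along those lines, assuming Java 9+ for Thread.onSpinWait(); waitUntil() is a hypothetical helper name. The deadline advances by a fixed step so timing errors don't accumulate, and the clock is consulted per slice of 100 emulated cycles rather than per instruction:

```java
public class SpinThrottle {
    /** Spins until System.nanoTime() reaches deadline; returns how late we were, in ns. */
    public static long waitUntil(long deadline) {
        long now;
        // Compare via subtraction, as the nanoTime() docs recommend, to stay safe across overflow.
        while ((now = System.nanoTime()) - deadline < 0) {
            Thread.onSpinWait();  // Java 9+: hint to the CPU that this is a spin loop
        }
        return now - deadline;
    }

    public static void main(String[] args) {
        final long nanosPerSlice = 100_000L;  // 100 cycles at 1 MHz = 100 microseconds
        long start = System.nanoTime();
        long deadline = start;
        for (int slice = 0; slice < 1_000; slice++) {
            // ... emulate 100 cycles here ...
            deadline += nanosPerSlice;  // fixed step from the previous deadline, not from "now"
            waitUntil(deadline);
        }
        System.out.printf("1000 slices took %.1f ms (expected ~100 ms)%n",
                (System.nanoTime() - start) / 1e6);
    }
}
```

This keeps one core fully busy, which is the price of sub-millisecond accuracy.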
You can speed up the emulation by generating bytecode. Most instructions should translate quite well, and you can add a busy-wait call so each instruction takes the same amount of time the original instruction would have. You also have the option of increasing the delay so you can watch each instruction being executed.
To make it really cool you could generate 6502 assembly code as text with matching line numbers in the byte code. This would allow you to use the debugger to step through the code, breakpoint it and see what the application is doing. ;)
A simple way to emulate the memory is to use a direct ByteBuffer, or native memory accessed through the Unsafe class. This gives you a block of memory you can access as any data type in any order.
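A minimal sketch of the direct ByteBuffer route (the Unsafe route is left out); the Memory class and its method names are illustrative, not taken from any particular emulator. The 6502 has a 16-bit address space and stores words little-endian, so the buffer is 64 KB and set to LITTLE_ENDIAN:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Memory {
    // 64 KB covers the whole 16-bit address space; direct buffers default to big-endian.
    private final ByteBuffer ram = ByteBuffer.allocateDirect(0x10000)
            .order(ByteOrder.LITTLE_ENDIAN);

    /** Reads an unsigned byte (0..255); addresses wrap to 16 bits. */
    public int read(int addr) {
        return ram.get(addr & 0xFFFF) & 0xFF;
    }

    /** Writes the low 8 bits of value. */
    public void write(int addr, int value) {
        ram.put(addr & 0xFFFF, (byte) value);
    }

    /** Reads a 16-bit little-endian word, e.g. the reset vector at 0xFFFC. */
    public int readWord(int addr) {
        int a = addr & 0xFFFF;
        if (a == 0xFFFF) {
            // The word straddles the top of memory: wrap the high byte around to 0x0000.
            return read(0xFFFF) | (read(0x0000) << 8);
        }
        return ram.getShort(a) & 0xFFFF;  // byte order comes from the buffer
    }
}
```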
You might be interested in examining the Java Apple Computer Emulator (JACE), which incorporates 6502 emulation. It uses Thread.sleep() in its TimedDevice class.
Have you looked into creating a Timer object that fires at the interval you need? You could have the timer itself initiate the next loop.
Here is the documentation for the Java 6 version:
http://download.oracle.com/javase/6/docs/api/java/util/Timer.html
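A sketch of the Timer idea, with one caveat: Timer periods are whole milliseconds, so it cannot fire once per 1 µs cycle. Instead, each 1 ms tick runs 1000 cycles' worth of emulation. runCycles() is a hypothetical stand-in for the emulator:

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicLong;

public class TimerDriven {
    static final AtomicLong cyclesRun = new AtomicLong();

    /** Hypothetical stand-in for the emulator: pretend to execute n cycles. */
    static void runCycles(int n) {
        cyclesRun.addAndGet(n);
    }

    public static Timer start() {
        Timer timer = new Timer("cpu-clock", true);  // daemon thread, won't keep the JVM alive
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                runCycles(1000);  // 1 ms of emulated time at 1 MHz
            }
        }, 0, 1);  // initial delay 0 ms, period 1 ms
        return timer;
    }

    public static void main(String[] args) throws InterruptedException {
        Timer timer = start();
        Thread.sleep(500);
        timer.cancel();
        System.out.println("cycles run in ~500 ms: " + cyclesRun.get());
    }
}
```

Fixed-rate scheduling runs late ticks back-to-back to preserve the long-run average, which suits an emulator better than fixed-delay scheduling with schedule().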
I have a Java application and one of the methods is performance-critical.
I created a loop to call this method 10 times, and I am checking for performance issues by using the profiler on every iteration. It turns out that the execution time decreases with each iteration, so the 10th iteration has a shorter execution time than the 9th.
Any idea why such case is happening?
Could it be due to the loop overheads?
You are warming up the CPU caches and the JVM, so the performance changes.
Profilers put the JVM into an unusual mode, and depending on the profiling approach you are using, it may only be sampling at a regular interval.
I find that profilers are good for giving you relative measurements and for improving your understanding of the code, but always take their readings with a pinch of salt.
Do not trust just a single measurement.
Outside of using profilers, microbenchmarking is a good way to go, although it is a very tricky subject.
Note that HotSpot tends not to kick in and optimise the bytecode until the target code has been called 10,000 or more times.
http://java.dzone.com/articles/microbenchmarking-java, and How do I write a correct micro-benchmark in Java? may help to get you started. There is also a lot of good advice on the Mechanical Sympathy Forum.
A good microbenchmarking framework is JMH (http://openjdk.java.net/projects/code-tools/jmh/). It helps keep GC and other JVM stop-the-world events out of the timings, and it comes with some guidance on how to prevent HotSpot from optimising away the very code that you are trying to measure.
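For a quick look at the warm-up effect from the question without JMH, here is a hand-rolled sketch (JMH does this far more rigorously). It times each call with System.nanoTime(), optionally after a number of untimed warm-up calls. workload() is a hypothetical stand-in for the performance-critical method, and storing results in a volatile field is only a crude guard against the JIT discarding the work:

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

public class WarmUpTiming {
    static volatile long sink;  // results are stored here so the work can't be optimised away

    /** Hypothetical stand-in for the method being measured. */
    static long workload() {
        long sum = 0;
        for (int i = 0; i < 1_000; i++) {
            sum += (long) i * i % 7;
        }
        return sum;
    }

    /** Runs warmUp untimed calls, then returns the duration in ns of each of `measured` calls. */
    public static long[] time(LongSupplier task, int warmUp, int measured) {
        for (int i = 0; i < warmUp; i++) {
            sink = task.getAsLong();
        }
        long[] nanos = new long[measured];
        for (int i = 0; i < measured; i++) {
            long t0 = System.nanoTime();
            sink = task.getAsLong();
            nanos[i] = System.nanoTime() - t0;
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] firstCalls = time(WarmUpTiming::workload, 0, 10);        // still interpreted, or being compiled
        long[] afterWarmUp = time(WarmUpTiming::workload, 20_000, 10);  // HotSpot has likely compiled it by now
        System.out.println("first 10 calls (ns): " + Arrays.toString(firstCalls));
        System.out.println("after warm-up (ns):  " + Arrays.toString(afterWarmUp));
    }
}
```

Expect the first calls to be noticeably slower, which is the same pattern the question observed across its 10 iterations.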
I want to optimize a method so it runs in less time. I was using System.currentTimeMillis() to calculate the time it lasted.
However, I just read the System.currentTimeMillis() Javadoc and it says this:
This method shouldn't be used for measuring timeouts or other elapsed time measurements, as changing the system time can affect the results.
So, if I shouldn't use it to measure the elapsed time, how should I measure it?
Android's native Traceview will help you measure the time, and will also give you more information.
Using it is as simple as:
// start tracing to "/sdcard/calc.trace"
Debug.startMethodTracing("calc");
// ...
// stop tracing
Debug.stopMethodTracing();
There is a post with more information on the Android Developers Blog.
Also take Rajesh J Advani's post into account.
There are a few issues with System.currentTimeMillis().
If you are not in control of the system clock, you may read the elapsed time wrong.
For server code or other long-running Java programs, your code is likely to be called many thousands of times. By then, the JVM will have optimized the bytecode to the point where the time taken is actually a lot less than what you measured during your testing.
It doesn't take into account the fact that there might be other processes on your computer or other threads in the JVM that compete for CPU time.
You can still use the method, but you need to keep the above points in mind. As others have mentioned, a profiler is a much better way of measuring system performance.
Welcome to the world of benchmarking.
As others point out, techniques based on timing methods like currentTimeMillis will only tell you elapsed time, rather than how long the method spent on the CPU.
I'm not aware of a way in Java to isolate a method's timing to how long it spent on the CPU, so the options are:
1) If the method is long running (or you run it many times, while following benchmarking rules such as not discarding every result), use something like the "time" tool on Linux (http://linux.die.net/man/1/time), which will tell you how long the app spent on the CPU (obviously you have to subtract the overhead of application startup etc.).
2) Use a profiler, as others pointed out. This has its own dangers: tracing techniques can add too much overhead, and if it uses stack sampling, it won't be 100% accurate.
3) I'm not sure how feasible this is on Android, but you could run your benchmark on a quiet multicore system and isolate a core (or ideally a whole socket) so that only your application can run on it.
You can use System.nanoTime(), as documented here:
http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/System.html#nanoTime()
As the documentation says:
This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time.
Hope this will help.
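A minimal sketch of timing a method with System.nanoTime(); methodUnderTest() is a hypothetical placeholder. The value returned by nanoTime() has an arbitrary origin, so only the difference between two readings is meaningful:

```java
import java.util.concurrent.TimeUnit;

public class ElapsedTime {
    /** Hypothetical placeholder for the code you want to optimize. */
    static void methodUnderTest() throws InterruptedException {
        Thread.sleep(50);
    }

    public static long measureMillis() throws InterruptedException {
        long start = System.nanoTime();  // not wall-clock time; unaffected by system clock changes
        methodUnderTest();
        long elapsedNanos = System.nanoTime() - start;
        return TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("elapsed: " + measureMillis() + " ms");
    }
}
```

A single reading like this still suffers from the warm-up and noise issues mentioned in the other answers, so repeat the measurement.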
SystemClock.elapsedRealtime()
Quoting the linked page: elapsedRealtime() and elapsedRealtimeNanos() return the time since the system was booted, and include deep sleep. This clock is guaranteed to be monotonic, and continues to tick even when the CPU is in power saving modes, so is the recommend basis for general purpose interval timing.
I want to filter which classes are CPU-profiled in Java VisualVM (Version 1.7.0 b110325). For this, I tried under Profiler -> Settings -> CPU-Settings to set "Profile only classes" to my package under test, which had no effect. Then I tried to get rid of all java.* and sun.* classes by setting them in "Do not profile classes", which had no effect either.
Is this simply a bug? Or am I missing something? Is there a workaround? I mean other than:
paying for a better profiler
doing sampling by hand (see One could use a profiler, but why not just halt the program?)
switching to the Call Tree view, which is no good since only the Profiler view gives me the percentages of consumed CPU per method.
I want to do this mainly to get halfway correct percentages of consumed CPU per method. For this, I need to get rid of the annoying measurements, e.g. for sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run() (around 70%). Many users seem to have this problem, see e.g.
Java VisualVM giving bizarre results for CPU profiling - Has anyone else run into this?
rmi.transport.tcp.tcptransport Connectionhandler consumes much CPU
Can't see my own application methods in Java VisualVM.
The reason you see sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run() in the profile is that you left the option Profile new Runnables selected.
Also, if you took a snapshot of your profiling session you would be able to see the whole callstack for any hotspot method - this way you could navigate from the run() method down to your own application logic methods, filtering out the noise generated by the Profile new Runnables option.
OK, since your goal is to make the code run as fast as possible, let me suggest how to do it.
I'm no expert on VisualVM, but I can tell you what works. (Only a few profilers actually tell you what you need to know, which is - which lines of your code are on the stack a healthy fraction of wall-clock time.)
The only measuring I ever bother with is some stopwatch on the overall time, or alternatively, if the code has something like a framerate, the number of frames per second. I don't need any sort of further precision breakdown, because it's at best a remote clue to what's wasting time (and more often totally irrelevant), when there's a very direct way to locate it.
If you don't want to do random-pausing, that's up to you, but it's proven to work, and here's an example of a 43x speedup.
Basically, the idea is you get a (small, like 10) number of stack samples, taken at random wall-clock times.
Each sample consists (obviously) of a list of call sites, and possibly a non-call site at the end.
(If the sample is during I/O or sleep, it will end in the system call, which is just fine. That's what you want to know.)
If there is a way to speed up your code (and there almost certainly is), you will see it as a line of code that appears on at least one of the stack samples.
The probability it will appear on any one sample is exactly the same as the fraction of time it uses.
So if there's a call site or other line of code using a healthy fraction of time, and you can avoid executing it, the overall time will decrease by that fraction.
I don't know every profiler, but one I know that can tell you that is Zoom.
Others may be able to do it.
They may be more spiffy, but they don't work any quicker or better than the manual method when your purpose is to maximize performance.
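The same sampling idea can be automated inside the JVM with Thread.getStackTrace(). This is a swapped-in variant, not the manual debugger pausing described above, and the class name, sample count and delay range are assumptions. One caveat: HotSpot usually captures such stacks at safepoints, so samples can be biased toward them:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class StackSampler {
    /** Takes `samples` stack snapshots of target, each after a random 1..maxDelayMs ms pause. */
    public static List<StackTraceElement[]> sample(Thread target, int samples, int maxDelayMs)
            throws InterruptedException {
        Random rnd = new Random();
        List<StackTraceElement[]> result = new ArrayList<>();
        for (int i = 0; i < samples; i++) {
            Thread.sleep(1 + rnd.nextInt(maxDelayMs));  // random wall-clock moment
            result.add(target.getStackTrace());         // empty array if the thread has finished
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                Math.sqrt(Math.random());  // busy work to be sampled
            }
        });
        worker.start();
        for (StackTraceElement[] stack : sample(worker, 10, 50)) {
            System.out.println("--- sample");
            for (StackTraceElement frame : stack) {
                System.out.println("  " + frame);  // look for lines that recur across samples
            }
        }
        worker.interrupt();
        worker.join();
    }
}
```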
Is it possible to slow down time in the Java virtual machine according to CPU usage by modifying the source code of OpenJDK? I have a network simulation (Java to ns-3) which consumes real time, synchronised loosely to the wall clock. However, because I run so many clients in the simulation, the CPU usage hits 100% and hard guarantees aren't maintained about how long events in the simulator should take to process (i.e., a large number of severely late events). Therefore, the simulation tops out at around 40 nodes when there's a lot of network traffic, and even then it's a bit iffy. The ideal solution would be to slow down time according to CPU usage, but I'm not sure how to do this successfully. A lesser solution is to just slow down time by some multiple (time lensing?).
If someone could give some guidance, the source code for the relevant file in question (for Windows) is at http://pastebin.com/RSQpCdbD. I've tried modifying some parts of the file, but my results haven't really been very successful.
Thanks in advance,
Chris
You might look at VirtualBox, which allows you to accelerate or slow down the guest clock from the command line.
I'm not entirely sure this is what you want, but with the Joda-Time library you can stop time completely, so calls such as new DateTime() will continuously return the same time (only code that reads the time through Joda-Time is affected; a plain java.util.Date is not).
So, in one thread, you could "stop time" with this call:
DateTimeUtils.setCurrentMillisFixed(System.currentTimeMillis());
Then your Thread could sleep for, say, 5000ms, and then call:
// advance the frozen Joda-Time clock by one second
DateTimeUtils.setCurrentMillisFixed(DateTimeUtils.currentTimeMillis() + 1000);
So, provided your application bases what it does on Joda-Time's notion of the current time, this will "slow" time by moving it forward one second every 5 seconds.
But, as I said, I'm not sure this will work in your environment.
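For the lesser solution in the question (slowing time by a fixed multiple), a related pure-JDK option, not from the answer above, is a custom java.time.Clock that scales elapsed real time. ScaledClock is a hypothetical class; like the Joda-Time trick, it only affects code that reads time through the injected Clock, not System.currentTimeMillis() or the JVM itself:

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;

public class ScaledClock extends Clock {
    private final Instant origin;    // real wall-clock instant when the clock was created
    private final long originNanos;  // System.nanoTime() at that same moment
    private final double rate;       // 0.2 means one simulated second per five real seconds
    private final ZoneId zone;

    public ScaledClock(double rate) {
        this(Instant.now(), System.nanoTime(), rate, ZoneId.systemDefault());
    }

    private ScaledClock(Instant origin, long originNanos, double rate, ZoneId zone) {
        this.origin = origin;
        this.originNanos = originNanos;
        this.rate = rate;
        this.zone = zone;
    }

    @Override
    public ZoneId getZone() {
        return zone;
    }

    @Override
    public Clock withZone(ZoneId newZone) {
        return new ScaledClock(origin, originNanos, rate, newZone);  // same timeline, other zone
    }

    @Override
    public Instant instant() {
        long realElapsedNanos = System.nanoTime() - originNanos;
        return origin.plusNanos((long) (realElapsedNanos * rate));
    }

    public static void main(String[] args) throws InterruptedException {
        Clock slow = new ScaledClock(0.2);
        Instant before = slow.instant();
        Thread.sleep(1000);
        System.out.println("simulated ms per real second: "
                + Duration.between(before, slow.instant()).toMillis());
    }
}
```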
Debugging performance problems using a standard debugger is almost hopeless since the level of detail is too high. Another way is to use a profiler, but profilers seldom give me good information, especially when there is a GUI and background threads involved, as I never know whether the user was actually waiting for the computer or not. A different way is simply pressing Control + C and seeing where in the code it stops.
What I would really like is Fast Forward, Play, Pause and Rewind functionality combined with some visual representation of the code. This means that I could set the code to run on Fast Forward until I navigate the GUI to the critical spot. Then I would set the code to run in slow mode, while getting some visual representation of which lines are being executed (possibly some kind of zoomed-out view of the code). I could for example set the execution speed to something like 0.0001x. I believe this would give me a very good visualization of whether the problem is inside a specific module, or maybe in the communication between modules.
Does this exist? My specific need is in Python, but I would be interested in seeing such functionality in any language.
The "Fast Forward to critical spot" function already exists in any debugger; it's called a "breakpoint". There are indeed debuggers that can slow down execution, but that will not help you debug performance problems, because it doesn't slow down the computer. The processor, disk and memory are still exactly as slow as before; all that happens is that the debugger inserts delays between each line of code. That means every line of code suddenly takes more or less the same time, which hides any trace of where the performance problem is.
The only way to find the performance problems is to record every call done in the application and how long it took. This is what a profiler does. Indeed, using a profiler is tricky, but there probably isn't a better option. In theory you could record every call and the timing of every call, and then play that back and forwards with a rewind, but that would use an astonishing amount of memory, and it wouldn't actually tell you anything more than a profiler does (indeed, it would tell you less, as it would miss certain types of performance problems).
You should be able to figure out with the profiler what is taking a long time. Note that this can be certain function calls taking a long time because they do a lot of processing, or system calls taking a long time because something (network/disk) is slow. Or it can be that a very fast call is made loads and loads of times. A profiler will help you figure this out. But it helps if you can turn the profiler on just for the critical section (reduces noise) and if you can run that critical section many times (improves accuracy).
The methods you're describing, and many of the comments, seem to me to be relatively weak probabilistic attempts to understand the performance impact. Profilers do work perfectly well for GUIs and other idle-thread programs, though it takes a little practice to read them. I think your best bet is there, though -- learn to use the profiler better, that's what it's for.
For the specific use you describe, you would simply attach the profiler but not start recording yet. Navigate the GUI to the point in question. Hit the profiler's record button, do the action, and stop the recording. View the results. Fix. Do it again.
I assume there is a phase in the app's execution that takes too long - i.e. it makes you wait.
I assume what you really want is to see what you could change to make it faster.
A technique that works is random-pausing.
You run the app under the debugger, and in the part of its execution that makes you wait, pause it, and examine the call stack. Do this a few times.
Here are some ways your program could be spending more time than necessary.
I/O that you didn't know about and didn't really need.
Allocating and releasing objects very frequently.
Runaway notifications on data structures.
others too numerous to mention...
No matter what it is, when it is happening, an examination of the call stack will show it.
Once you know what it is, you can find a better way to do it, or maybe not do it at all.
If the program is taking 5 seconds when it could take 1 second, then the probability you will see the problem on each pause is 4/5. In fact, any function call you see on more than one stack sample, if you could avoid doing it, will give you a significant speedup.
AND, nearly every possible bottleneck can be found this way.
Don't think about function timings or how many times they are called. Look for lines of code that show up often on the stack, that you don't need.
Example Added: If you take 5 samples of the stack, and there's a line of code appearing on 2 of them, then it is responsible for about 2/5 = 40% of the time, give or take. You don't know the precise percent, and you don't need to know.
(Technically, on average it is (2+1)/(5+2) = 3/7 = 43%. Not bad, and you know exactly where it is.)
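The (2+1)/(5+2) figure is the posterior mean of the line's true time fraction f under a uniform prior (Laplace's rule of succession), assuming the samples are independent. With s hits out of n samples:

```latex
p(f \mid s, n) \propto f^{s}(1-f)^{n-s}
\;\Rightarrow\; f \mid s, n \sim \mathrm{Beta}(s+1,\, n-s+1),
\qquad
\mathbb{E}[f] = \frac{s+1}{n+2} = \frac{2+1}{5+2} = \frac{3}{7} \approx 0.43
```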