Which JVM to choose for GC hacking?

I have a design for a GC algorithm that I would like to implement for a JVM, to allow benchmarking.
Does anyone have any experience as to which implementation would allow for easy hacking, but which still has a built-in GC that would make for a meaningful comparison?
Edited: I want a JVM that has garbage collection, as I want to collect stats using its GC, then rip that GC out, put my own in, and compare the two. I want it to have a good GC, as otherwise the comparison is meaningless, but I want something with code that is not too difficult to work with (HotSpot has a lot of assembler, making the task more difficult).

I think that the Maxine Research VM from Oracle Labs would be a perfect match for your needs.
Quote from the first page of their wiki:
Project Overview
In this era of modern, managed languages we demand ever more from our virtual machines: better performance, more scalability, and support for the latest new languages. Research and experimentation is essential but no longer practical in the context of mature, complex, production VMs written in multiple languages.
The Maxine VM is a next generation platform that establishes a new standard of productivity in this area of research. It is written entirely in Java, completely compatible with modern Java IDEs and the standard JDK, features a modular architecture that permits alternate implementations of subsystems such as GC and compilation to be plugged in, and is accompanied by a dedicated development tool (the Maxine Inspector) for debugging and visualizing nearly every aspect of the VM's runtime state.
Here's an excellent video demonstrating its memory monitoring utilities:
Introduction to the Maxine Inspector
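To give a concrete flavor of what a "plugged in" GC subsystem means, here is a purely hypothetical Java sketch of such an interface; the names are illustrative, not Maxine's actual API (see the Maxine wiki for the real scheme interfaces):

// Hypothetical sketch only - in a modular VM, the collector hides behind an
// interface like this, and a researcher plugs in an alternate implementation.
public interface HeapScheme {
    // Called from allocation sites in compiled code; returns a reference to
    // zeroed storage for an object of the given size.
    Object allocate(int sizeInBytes);

    // Invoked when allocation fails; returns true if enough space was
    // reclaimed to satisfy the pending request.
    boolean collectGarbage(int requestedBytes);
}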

I'm not aware of any that don't have a built-in GC; it wouldn't be much of a Java VM without one. Why not start with OpenJDK or Harmony?

Maybe you don't need a JVM; a simpler virtual machine might be sufficient for testing your algorithm. Unless you are obliged to use a JVM, you could use Apache Harmony, or I would recommend another VM that came out of a PhD thesis, called VMKit. You can take a look at it and browse the source.

Related

Why are operating systems not written in Java?

All the operating systems to date have been written in C/C++, while there is none in Java. There are tonnes of Java applications, but not an OS. Why?
Because we have operating systems already, mainly. Java isn't designed to run on bare metal, but that's not as big of a hurdle as it might seem at first. As C compilers provide intrinsic functions that compile to specific instructions, a Java compiler (or JIT, the distinction isn't meaningful in this context) could do the same thing. Handling the interaction of GC and the memory manager would be somewhat tricky also. But it could be done. The result is a kernel that's 95% Java and ready to run jars. What's next?
Now it's time to write an operating system. Device drivers, a filesystem, a network stack, all the other components that make it possible to do things with a computer. The Java standard library normally leans heavily on system calls to do the heavy lifting, both because it has to and because running a computer is a pain in the ass. Writing a file, for example, involves the following layers (at least, I'm not an OS guy so I've surely missed stuff):
The filesystem, which has to find space for the file, update its directory structure, handle journaling, and finally decide what disk blocks need to be written and in what order.
The block layer, which has to schedule concurrent writes and reads to maximize throughput while maximizing fairness.
The device driver, which has to keep the device happy and poke it in the right places to make things happen. And of course every device is broken in its own special way, requiring its own driver.
And all this has to work fine and remain performant with a dozen threads accessing the disk, because a disk is essentially an enormous pile of shared mutable state.
At the end, you've got Linux, except it doesn't work as well because it doesn't have nearly as much effort invested into functionality and performance, and it only runs Java. Possibly you gain performance from having a single address space and no kernel/userspace distinction, but the gain isn't worth the effort involved.
There is one place where a language-specific OS makes sense: VMs. Let the underlying OS handle the hard parts of running a computer, and the tenant OS handles turning a VM into an execution environment. BareMetal and MirageOS follow this model. Why would you bother doing this instead of using Docker? That's a good question.
Indeed, there is a JavaOS: http://en.wikipedia.org/wiki/JavaOS
And here is a discussion about why there are not many OSes written in Java: Is it possible to make an operating system using java?
In short, Java needs to run on a JVM, and the JVM needs to run on an OS; writing an OS in Java is not a good choice.
An OS needs to deal with hardware, which is not doable in Java (except via JNI). That is because the JVM provides only a limited set of instructions, covering things like arithmetic and method calls. But dealing with hardware requires instructions that operate on registers, memory, the CPU, and hardware drivers directly. These are not supported directly by the JVM, so JNI is needed. And that is back to where we started - you still need to write parts of the OS in C/assembly.
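As a minimal sketch of that JNI route (readPort is a hypothetical native method; the C implementation is not shown and would live in a shared library):

public class HardwareAccess {
    static {
        System.loadLibrary("hw"); // loads libhw.so / hw.dll at class-load time
    }

    // Declared in Java, implemented in C/assembly; Java itself has no way to
    // touch device registers, so the real work happens on the native side.
    public static native int readPort(int portNumber);
}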
Hope this helps.
One of the main benefits of using Java is that it abstracts away a lot of low-level details that you usually don't really need to care about. It's precisely those details that are required when you build an OS. So while you could work around this to write an OS in Java, it would have a lot of limitations, and you'd spend a lot of time fighting with the language and its original design principles.
For operating systems you need to work really low-level, and that is a pain in Java. You need, for example, unsigned data types, and Java only has signed data types. You need struct objects that have exactly the memory alignment the driver expects (and no object header like the one Java adds to every object).
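A small, runnable illustration of the unsigned-types point (the "status register" value here is made up): a byte read from hardware gets sign-extended in Java, so driver-style code has to mask manually or use the helper methods added in Java 8.

public class UnsignedDemo {
    public static void main(String[] args) {
        byte statusRegister = (byte) 0xFE;       // as read from a device
        int signExtended = statusRegister;       // -2: not what the hardware meant
        int masked = statusRegister & 0xFF;      // 254: the manual workaround
        int viaHelper = Byte.toUnsignedInt(statusRegister); // 254, since Java 8
        System.out.println(signExtended + " " + masked + " " + viaHelper);
    }
}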
Even key components of Java itself are no longer written in Java.
And this is by no means a temporary thing. More and more gets rewritten in native code to get better performance. The HotSpot VM adds "intrinsics" for performance-critical native code, and there is work underway to reduce the overall cost of native calls.
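As a concrete example of such an intrinsic, the snippet below is an ordinary, runnable program, but HotSpot's JIT replaces hot calls to Integer.numberOfLeadingZeros with a single machine instruction (lzcnt/bsr on x86) rather than a real method call:

public class IntrinsicDemo {
    public static void main(String[] args) {
        // numberOfLeadingZeros has a plain-Java fallback body, but the JIT
        // compiles calls to it down to one instruction where the CPU allows.
        System.out.println(Integer.numberOfLeadingZeros(42)); // prints 26
    }
}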
For example JavaFX: the reason why it is much faster than AWT/Swing ever were is that it contains/uses a huge amount of native code. It relies on native code for rendering, and if you add the "webview" browser component, for example, it is actually using the WebKit C library to provide the browser.
There are a number of things Java does really well. It is a nicely structured language with a fantastic toolchain. Python is much more compact to write, but its toolchain is a mess; its refactoring tools, for example, are disappointing. And where Java shines is at optimizing polymorphism at run time. Where C++ compilers would need to emit expensive virtual calls - because at compile time it is not known which implementation will be used - HotSpot can aggressively inline code to get better performance. But for operating systems, you do not need this much: you can afford to manually optimize call sites and inlining.
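A small, runnable illustration of that run-time optimization (the class names are made up): the call site below is polymorphic in the source, but if HotSpot only ever observes one implementation there, it can devirtualize and inline the call.

interface Shape { double area(); }

class Circle implements Shape {
    private final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

public class InlineDemo {
    public static void main(String[] args) {
        Shape[] shapes = new Shape[1_000_000];
        for (int i = 0; i < shapes.length; i++) shapes[i] = new Circle(i);
        double sum = 0;
        // Polymorphic in the source, monomorphic at run time: HotSpot's type
        // profile sees only Circle here and can inline area() into the loop.
        for (Shape s : shapes) sum += s.area();
        System.out.println(sum);
    }
}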
This answer does not mean to be exhaustive in any way, but I'd like to share my thoughts on the (very vast) topic.
Although it is theoretically possible to write an OS in pure Java, there are practical matters that make this task really difficult. The main problem is that there is no (currently up-to-date and reliable) Java compiler able to compile Java directly to machine code. So there is no existing tool to make writing a whole OS from the ground up feasible in Java, at least as far as my knowledge goes.
Java was designed to run on some implementation of the Java virtual machine. Implementations exist for Windows, Mac, Linux, Android, etc. The design of the language is strongly based on the assumption that the JVM exists and will do some magic for you at runtime (think garbage collection, the JIT compiler, reflection, etc.). This is most likely part of the reason why such a compiler does not exist: where would all this functionality go? Compiled down into the executable? It's possible, but at this point I believe it would be difficult to do. Even Android, whose SDK is purely Java based, runs Dalvik (its own virtual machine, supporting a subset of the language) on a Linux kernel.

How to set up a development environment for the Java HotSpot VM?

What is the best way to understand the Java HotSpot VM? And if I want to make modifications to the source code and add my own features, what would be the best development environment (does ctags work well with the large code base, or do I need a full-blown IDE)?
I doubt that you would want to dive into the HotSpot code-base... I'm copying parts of my answer from this question:
Which JVM to choose for GC hacking?
I think the Maxine Research VM from Oracle Labs would be a good starting point. Here's a quote from the first page of their wiki:
Project Overview
In this era of modern, managed languages we demand ever more from our virtual machines: better performance, more scalability, and support for the latest new languages. Research and experimentation is essential but no longer practical in the context of mature, complex, production VMs written in multiple languages.
The Maxine VM is a next generation platform that establishes a new standard of productivity in this area of research. It is written entirely in Java, completely compatible with modern Java IDEs and the standard JDK, features a modular architecture that permits alternate implementations of subsystems such as GC and compilation to be plugged in, and is accompanied by a dedicated development tool (the Maxine Inspector) for debugging and visualizing nearly every aspect of the VM's runtime state.
Here's an excellent video demonstrating its memory monitoring utilities:
Introduction to the Maxine Inspector

What are Scala's future platform concerns people should be prepared for?

At the moment Scala runs only on the JVM, with an outdated implementation for the CLR.
But there are some voices at the moment saying that Microsoft is interested in funding an up-to-date Scala port for .NET.
Considering the lack of any plan or oversight on Oracle's side as to what to do with Java/the JVM/the ecosystem, how can a Scala developer be prepared for the possibility that in the end there is no decent platform left to run Scala on?
Are there any plans to have some "independent" implementation of a Scala VM in the future, which maps Scala's features to its own bytecode/VM, instead of having to live with all these legacy limitations of current VM implementations (no reified generics, covariant arrays, weird annotations, no tail calls, etc.)?
Here's another view regarding the VM:
While not really Sun's brightest moment if you look at the whole picture, slapping the GPL license on the JDK and related things has actually created this wonderful situation where the whole JVM platform is completely independent of Oracle. I mean, the virtual machine isn't tied to Java, the garbage collectors aren't tied to Java, and most importantly the Java programmers aren't really tied to Java and thus to Oracle.
As a Java programmer, I'd say we won - if Oracle decides to deprecate everything in Java world in hopes of bigger profits, we can just grab the VM and a modern language such as Scala and let Larry Ellison sail to sunset in his yacht for all we care.
The current implementation of Scala is very much focused on the JVM. Much in the Scala library depends on classes in the Java standard library and Java classes are also exposed to user programs.
If there are going to be Scala implementations on other platforms such as the CLR or LLVM, then programs written for the current Java-oriented Scala implementation will not be automatically compatible with those other implementations (unless those implementations go to great lengths to support the classes available in Java).
I agree with Randall that the JVM is not going to disappear anytime soon; it's probably the most successful and widespread virtual machine platform, deployed on billions of devices, from smartcards and handheld devices to the biggest servers. In fact, the Java programming language might disappear much sooner than the JVM itself. There is no reason to fear the disappearance of the JVM in the foreseeable future.
And even in the unlikely case that it does - does it really matter? You'd still be able to program in your favorite programming language Scala, on one of the other platforms.
I wouldn't worry too much about the death of the JVM due to Oracle mismanagement, just as Esko said.
As of now, I do worry about the JVM in another way: the JVM was not constructed as a platform for multiple languages. Most languages running on the JVM use dynamic typing and are in a way freed from the complexity of mapping a rich static type system onto the bytecode.
Scala compiles to bytecode and was constructed with the JVM in mind by the man (Odersky) who wrote the Java compiler (1.1-1.4). Scala may be the only such language written by someone with intimate knowledge of the JVM, and we do not really know how hard it was for him to do it.
I worry that the JVM eventually will dwindle in popularity due to the fact that it is not a multi-language platform to begin with.

Performance Cost of Profiling a Web-Application in Production

I am attempting to solve performance issues with a large and complex tomcat java web application. The biggest issue at the moment is that, from time to time, the memory usage spikes and the application becomes unresponsive. I've fixed everything I can fix with log profilers and Bayesian analysis of the log files. I'm considering running a profiler on the production tomcat server.
A Note to the Reader with Gentle Sensitivities:
I understand that some may find the very notion of profiling a production app offensive. Please be assured that I have exhausted most of the other options. The reason I am considering this is that I do not have the resources to completely duplicate our production setup on my test server, and I have been unable to cause the failures of interest on my test server.
Questions:
I am looking for answers which work either for a java web application running on tomcat, or answer this question in a language agnostic way.
What are the performance costs of profiling?
Any other reasons why it is a bad idea to remotely connect and profile a web application in production (strange failure modes, security issues, etc)?
How much does profiling affect the memory footprint?
Specifically are there java profiling tools that have very low performance costs?
Any java profiling tools designed for profiling web applications?
Does anyone have benchmarks on the performance costs of profiling with visualVM?
What size applications and datasets can visualVM scale to?
OProfile and its ancestor DPCI were developed for profiling production systems. The overhead for these is very low, and they profile your full system, including the kernel, so you can find performance problems in the VM and in the kernel and libraries.
To answer your questions:
Overhead: These are sampling profilers; that is, they generate timer or performance-counter interrupts at some regular interval, and they take a look at what code is currently executing. They use that to build a histogram of where you spend your time, and the overhead is very low (1-8% is what they claim) for reasonable sampling intervals. A toy sketch of this sampling mechanism appears at the end of this answer.
Take a look at this graph of sampling frequency vs. overhead for OProfile. You can tune the sampling frequency for lower overhead if the defaults are not to your liking.
Usage in production: The only caveat to using OProfile is that you'll need to install it on your production machine. I believe there's kernel support in Red Hat since RHEL3, and I'm pretty sure other distributions support it.
Memory: I'm not sure what the exact memory footprint of OProfile is, but I believe it keeps relatively small buffers around and dumps them to log files occasionally.
Java: OProfile includes profiling agents that support Java and that are aware of code running in JITs. So you'll be able to see Java calls, not just the C calls in the interpreter and JIT.
Web Apps: OProfile is a system-level profiler, so it's not aware of things like sessions, transactions, etc. that a web app would have.
That said, it is a full-system profiler, so if your performance problem is caused by bad interactions between the OS and the JIT, or if it's in some third-party library, you'll be able to see that, because OProfile profiles the kernel and libraries. This is an advantage for production systems, as you can catch problems that are due to misconfigurations or particulars of the production environment that might not exist in your test environment.
VisualVM: Not sure about this one, as I have no experience with VisualVM
Here's a tutorial on using OProfile to find performance bottlenecks.
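To make the sampling idea above concrete, here is a toy, self-contained Java sketch of the mechanism (an illustration of the principle only, not how OProfile itself is implemented): sample a worker thread's stack at a fixed interval and build a histogram of the topmost frames.

import java.util.HashMap;
import java.util.Map;

public class ToySampler {
    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> histogram = new HashMap<>();
        Thread worker = new Thread(ToySampler::busyWork, "worker");
        worker.start();
        for (int i = 0; i < 100; i++) {            // ~1 s of 10 ms samples
            StackTraceElement[] stack = worker.getStackTrace();
            if (stack.length > 0) {
                histogram.merge(stack[0].toString(), 1, Integer::sum);
            }
            Thread.sleep(10);                      // the sampling interval
        }
        worker.interrupt();
        // Frames with the most hits are where the time went.
        histogram.forEach((frame, hits) -> System.out.println(hits + "  " + frame));
    }

    static void busyWork() {
        double x = 0;
        while (!Thread.currentThread().isInterrupted()) {
            x += Math.sqrt(x + 1);                 // the hot spot to be found
        }
    }
}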
I've used YourKit to profile apps in a high-load production environment, and while there was certainly an impact, it was easily an acceptable one. Yourkit makes a big deal of being able to do this in a non-invasive manner, such as selectively turning off certain profiling features that are more expensive (it's a sliding scale, really).
My favourite aspect of it is that you can run the VM with the YourKit agent attached and it has zero performance impact. It's only when you connect the GUI and start profiling that it has an effect.
There is nothing wrong with profiling production apps. If you work on distributed applications, there are times when an OutOfMemoryError occurs in a very low-probability scenario which is very difficult to reproduce in a dev/stage/UAT environment.
You can try using custom profilers, but if you are in a hurry and plugging in/setting up a profiler on a production box would take time, you can also use the JVM itself to take a memory dump (the JVM's dump facilities will also give you a thread dump).
You can activate automatic heap-dump generation on the JVM command line by using the following option:
-XX:+HeapDumpOnOutOfMemoryError
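For example (the jar name and dump path are placeholders; -XX:HeapDumpPath controls where the .hprof file is written):

java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/dumps -jar myapp.jar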
The Eclipse Memory Analyzer project has a very powerful feature called "group by value", which makes it possible to build an object query and regroup the instances by a field value. This is useful in the case where you have a lot of instances that contain a smaller set of possible values, and you want to see which values are being used the most. This has really helped me understand some complex memory dumps, so I recommend you try it out.
You may also consider using one of the modern HotSpot JVM tools - Java Flight Recorder and Java Mission Control. It is a set of tools that allows you to collect low-level runtime information with a CPU overhead of about 5% (I cannot prove that last figure; it is the claim of an Oracle engineer who presented the feature with a live demo).
You can use this tool as long as your application is running on a 1.7u40 JVM or higher. To enable the runtime info collection, you need to start the JVM with particular flags:
By default, JFR is disabled in the JVM. To enable JFR, you must launch your Java application with the -XX:+FlightRecorder option. Because JFR is a commercial feature, available only in the commercial packages based on Java Platform, Standard Edition (Oracle Java SE Advanced and Oracle Java SE Suite), you also have to enable commercial features using the -XX:+UnlockCommercialFeatures options.
(Quoted http://docs.oracle.com/javase/8/docs/technotes/guides/jfr/about.html#sthref7)
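For instance, a launch that unlocks the feature and records the first 60 seconds to a file might look like this (the jar name and recording file name are placeholders):

java -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:StartFlightRecording=duration=60s,filename=myrecording.jfr -jar myapp.jar

A recording can also be started on an already-running JVM with jcmd (jcmd <pid> JFR.start), which is handy in production.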
I added this answer as this is a viable option for profiling in production, IMO.
There is also an Eclipse plugin that supports JFR and JMC and is capable of displaying the information in a user-friendly way.
The tools have improved vastly over the years. These days, most people who have needs like these use a tool that hooks into Java's instrumentation API instead of the profiling API. Surely there are more examples, but NewRelic and AppDynamics come to mind. Instrumentation-based solutions usually run as an agent in the JVM and constantly collect data. They report the data at a higher level (business transaction, web transaction, database transaction) than the old profiling approach and allow you to dig deeper (down to the method or line) if necessary. You can even set up monitoring and alerts, so you can track and alert on metrics like page load times and performance against SLAs. With these great tools, you really should have no reason to run a profiler in production any longer. The cost of running them is negligible.
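For context, here is a minimal sketch of the hook such agents build on, the java.lang.instrument API (this agent only logs class names; commercial agents rewrite the bytecode at this point to inject their timers):

import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public class LoggingAgent {
    // Invoked by the JVM before main() when started with -javaagent:agent.jar
    // (the jar's manifest must name this class in its Premain-Class entry).
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className,
                                    Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain,
                                    byte[] classfileBuffer) {
                System.out.println("loading: " + className);
                return null; // null = leave the class bytes unchanged
            }
        });
    }
}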

What advantages does a commercial Java profiler have over the free ones, e.g. the one in NetBeans?

Occasionally I have to do some profiling work on Java code, and I would like to know why I should have my boss invest in a commercial profiler as opposed to just using the one in NetBeans or JConsole?
What would the killer features be that would warrant the investment?
In my experience with JProfiler, it's just an all-round slicker experience than the NetBeans profiler. It's easier to get started, easier to interpret the information and, although I haven't measured it, it seems that JProfiler has less of a negative impact on the performance of the application being profiled.
Also, JProfiler integrates nicely with IntelliJ IDEA. I have to use NetBeans to use the NetBeans profiler, which is an inconvenience because I have to manually configure a free-form project to match the layout of my project.
The NetBeans profiler is usable. Unlike IntelliJ, I wouldn't buy a JProfiler licence for my personal projects because, unlike an IDE, it's not a tool you use all day every day. However, for paid work there's no reason not to buy a better tool. It's not expensive compared to the cost of a developer's time.
I have experience using both the NetBeans profiler and JProbe. For performance profiling I have found NetBeans quite useful, but where JProbe is superior is memory profiling.
JProbe has superior tools for comparing heap snapshots and finding the root cause of a memory leak. For example, in JProbe you can view heap snapshots visually as a graph, select nodes to investigate, and then delete references to see if the instance could then be garbage collected.
If you are using Netbeans already then starting up the profiler is easy (unless you are using a Maven based project... sigh).
I have used paid profilers as well as the Netbeans one. Netbeans does the job well enough (it was a bit rough when it first came out... but much better now).
The code I profile isn't HUGE so I cannot say if the time spent in profiling is a major factor.
The answer is highly subjective and totally depends on your needs. Things to look at:
1) Ease of use in your environment (in the case of NetBeans, it is likely that the built-in profiler is easiest).
2) The time from starting the profiler to it actually getting you usable results.
3) Is it a sampling or a tracing profiler? (An overview is here: http://docs.hp.com/en/5992-0757/ch05s01.html)
4) Can you view the results live, or do you have to wait for the profiling to finish?
Here is a link to a slashdot discussion on Java profilers: http://ask.slashdot.org/article.pl?sid=06/06/30/0053237
I've not used the NetBeans profiler, but I have tried JProfiler, YourKit and JProbe. I found YourKit slightly better (mainly won over by the usability aspect). Some of its useful features are listed below (you can check whether they are available in NetBeans):
J2EE profiling (e.g. it shows how much time an SQL query took).
Snapshot comparison and annotation
Deadlock detector
Exception telemetry
You can check for more details at their site.
I would say: being ready to use out of the box, and richer performance statistics. I was assigned a profiling job last year when I was interning at a multinational. I used the InfraRED profiler, which uses Java's aspect-oriented APIs (it works with both AspectWerkz and AspectJ), but I had to extend the profiler to get what my manager wanted. Also, the performance statistics given by the profiler were limited.
But before selecting that profiler I researched a few other open-source profilers. Some of them were trivial and didn't suit what we wanted.
I would also add that some of them just don't work. For example, if you want to collect performance statistics for a web application, not every profiler supports the statistics you require.
With a completely independent profiler, it's much easier to integrate it with other applications in your toolchain. For example, say you want to run the profiler as part of your build process (say, once a night). Something like JProfiler easily integrates with Ant, whereas profilers built into IDEs may or may not. If you have a separate build machine, installing a local copy of a profiler makes sense, but installing a whole IDE just to get access to one component does not.
If you are using Tomcat you might consider Lambda Probe (it is free):
http://www.lambdaprobe.org/
From my experience, the YourKit profiler is the most usable one. Small usability things really make the difference, but it is also the most comprehensive one, containing:
the most comprehensive and usable memory snapshots (working also with 1 GB+ heaps), with a detailed object view and primitive data for every single object (for example, in a HashMap you can see whether objects are evenly distributed or whether most are stored in the same bucket!). This level of detail in memory snapshots, and their ease of use, is my main reason for YourKit.
very little overhead (far less than many other profilers I have used)
comparing snapshots
J2EE profiling
deadlock detector and lock status (I think it still misses java.util.concurrent.locks, but for synchronized it is great)
Among other things, it is also constantly improving, so who knows what the future holds :)
Compare the features and see if you really need what the commercial software provides over the free one. If yes, then it's worth investing in.
