Which VMs or GCs support JNI pinning?

The Get<PrimitiveType>ArrayElements family of functions is documented to either copy arrays or pin them in place (and, in doing so, prevent a compacting garbage collector from moving them). It is documented as a safer, less restrictive alternative to GetPrimitiveArrayCritical. However, I'd like to know which VMs and/or garbage collectors (if any) actually pin arrays instead of copying them.

Older IBM JVMs pinned (1.4 and before, i.e. NOT the current IBM J9 JVM), but they have not done so since. In general, JVMs don't like pinning because it really interferes with copying garbage collectors, which is what most production JVMs use today. I'm not 100% up to date (i.e. on the latest Java 7 builds), but historically HotSpot didn't pin either (for the same generational-GC reasons).
Be aware: a JVM that pins today might not tomorrow, and vice versa, so you need to write your code to handle both cases, just as the base Java libraries do.

Shenandoah supports pinning (although it is not clear if it does so when using Get*ArrayElements or only when Get*Critical): https://shipilev.net/jvm-anatomy-park/9-jni-critical-gclocker/


Memory regions of Java program?

I haven't taken a deep dive into how Java manages memory while a program is running, since I have been working at the application level. I recently hit a situation where I needed to know, owing to performance issues in an application.
I have been aware of the "stack" and "heap" regions of memory and thought that was the whole memory model of a Java program. However, it turns out there is much more to it than that.
For example, I came across terms like Eden, S0, S1, Old memory and so on. I was never aware of these terms before.
Since Java has been changing, some of these terms may or may not still be relevant as of Java 8.
Can anyone point me to where to get this information, and under what circumstances I need to know it? Are these regions part of main memory, i.e. RAM?
Eden, S0, S1, Old memory and the other memory areas exist only in the context of a specific garbage collector implementation: generational collectors like G1 divide the heap into these areas, whereas non-generational collectors like ZGC do not.
Start by reviewing the main garbage collectors in the JVM:
ParNew
CMS
G1
ZGC / Shenandoah / Azul C4
and then try to understand related concepts:
Thread-local allocation buffers (TLAB)
Escape analysis
String constant pools, string interning, string de-duplication
Permanent generation vs Metaspace
Object layout, e.g. why a boolean does not take 1 bit (word tearing)
Native memory e.g. JNI or off-heap memory access
I don't believe that there is a single website that will explain the full JVM memory management approach.
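If you want to see which of these regions your particular JVM and collector actually define, the standard java.lang.management API will list them. A minimal sketch (the class name and output formatting are mine, and the pool names in the comment are what HotSpot's G1 happens to report; other collectors report different pools):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;

    public class ListMemoryPools {
        public static void main(String[] args) {
            // Each pool is a region defined by the *current* collector; under G1 you
            // typically see "G1 Eden Space", "G1 Survivor Space" and "G1 Old Gen",
            // while other collectors report different (or fewer) pools.
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                System.out.printf("%-20s %-16s %s%n",
                        pool.getName(), pool.getType(), pool.getUsage());
            }
        }
    }

Running the same class with different collector flags makes the point above directly visible: the set of pools changes with the garbage collector.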
Java, as defined by the Java Language Specification and the Java Virtual Machine Specification talks about the stack and the heap (as well as the method area).
Those are the things that are needed to describe, conceptually, what makes a Java Virtual Machine.
If you wanted to implement a JVM you'd need to implement those in some way. They are just as valid in Java 13 as they were back in Java 1. Nothing has fundamentally changed about how those work.
The other terms you mentioned (as well as "old gen", "new gen", ...) are memory areas used in the implementation of specific garbage collection mechanisms, specifically those implemented in the Oracle JDK / OpenJDK.
All of those areas are basically specific parts of the heap. The exact way the heap is split into those areas is up to the garbage collector to decide and knowing about them shouldn't be necessary unless you want to tweak your garbage collector.
Since garbage collectors change between releases and new garbage collector approaches are implemented regularly (as this is one of the primary ways to speed up JVMs), the concrete terms used here will change over the years.
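If it helps to see the spec-level split concretely, the same management API reports heap and non-heap usage on any compliant JVM, regardless of how the collector subdivides the heap internally. A small sketch (the class name is mine):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;

    public class HeapVsNonHeap {
        public static void main(String[] args) {
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            // The heap (objects) and the non-heap areas (class metadata, code cache, ...)
            // exist on every JVM; how the heap is carved up internally is up to the GC.
            System.out.println("Heap:     " + mem.getHeapMemoryUsage());
            System.out.println("Non-heap: " + mem.getNonHeapMemoryUsage());
        }
    }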

Are there any JVMs that do not use generational garbage collection

I am giving a basic talk on garbage collection in Java, and the different algorithms used etc. My experience with GC has been only with the Hotspot JVM.
I was just wondering if there are any JVMs around that do not use a generational collection concept (i.e. Young, Old)? Just in case someone asks me this question!
Thanks.
There are lots of JVM implementations (see this page to get an idea), so yes, it is possible that some of them are not based on the Weak Generational Hypothesis. For instance, a JVM such as JamaicaVM (a hard real-time Java VM for embedded systems) could make other assumptions, since it does not target the same applications as the Oracle JVM does.
However, the most used implementations (Oracle JVM, IBM J9 and Azul Zing) are based on it.
Note that with the G1 GC, the Oracle JVM added a new type of collector: a generational-and-regional one.
Hope that helps !
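If someone in the audience wants to check what their own JVM is doing, the collector names exposed through the standard GarbageCollectorMXBean API usually reveal the generational split (or its absence). A hedged sketch (class name mine; the bean names in the comment are what current HotSpot builds happen to report and vary by vendor and version):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class ListCollectors {
        public static void main(String[] args) {
            // A generational JVM usually exposes one bean per generation, e.g.
            // "G1 Young Generation" and "G1 Old Generation" on HotSpot with G1;
            // non-generational collectors report differently named beans instead.
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%-25s collections=%d time=%dms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
        }
    }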
Java 1.0 and 1.1 used mark-sweep collectors.
Reference: http://en.wikipedia.org/wiki/Java_performance#Garbage_collection
I also understand that modern JVMs will fall back to a mark-sweep-compact collector in extreme situations, e.g. when you have configured CMS and it can't keep up.
The IBM JVM used variants of mark-sweep-compact by default (-Xgcpolicy:throughput and -Xgcpolicy:optavgpause) until Java 7. See: description of policies.

java basics garbage collection

Is the garbage collection algorithm in Java "vendor implemented"?
From the introduction paragraph to Chapter 3 of the Java Virtual Machine Specification:
For example, the memory layout of run-time data areas, the garbage-collection algorithm used, and any internal optimization of the Java virtual machine instructions (for example, translating them into machine code) are left to the discretion of the implementor. [emphasis mine]
Yes, and not only that, each JVM can contain more than one garbage collection strategy:
Sun
JRockit
IBM
Definitely vendor dependent. GCJ and the Sun VM use totally different garbage collectors, for example.
Yes. The Java VM specs don't say anything specific about garbage collection. Each vendor has its own implementation for performing GC. In fact, each vendor will have multiple GC policies, from which the best one can be chosen for a particular task.
Example
A GC tuned for throughput may not be good for real-time systems, since it will have erratic (and often longer) pause times that are not predictable. Non-predictability is a killer for real-time applications.
Some GCs, such as the ones from Oracle and IBM, are very tunable and can be tuned based on your application's run-time memory characteristics.
The internals of GC are not too complicated at a high level. Many algorithms that began in the early days of LISP are still in use today.
Read this (http://nd.edu/~dthain/courses/cse40243/spring2006/gc-survey.pdf "GC Introduction") for a good introduction to Garbage Collection at a moderately high-level.
Yes. The Java Virtual Machine Specification doesn't say anything specific about garbage collection. Each vendor has its own implementation for performing the task.
Each JVM invokes the garbage collector automatically, so we don't need to call it manually.
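To make that last point concrete: even the explicit System.gc() call is only a hint that a particular implementation may honour or ignore. A small sketch (class name mine; -XX:+DisableExplicitGC is a HotSpot-specific flag):

    public class GcHint {
        public static void main(String[] args) {
            byte[][] junk = new byte[10_000][];
            for (int i = 0; i < junk.length; i++) {
                junk[i] = new byte[1024];   // create some soon-to-be garbage
            }
            junk = null;                    // make it unreachable
            long before = Runtime.getRuntime().freeMemory();
            // System.gc() merely suggests a collection; the JVM may ignore it entirely
            // (on HotSpot, running with -XX:+DisableExplicitGC turns it into a no-op).
            System.gc();
            long after = Runtime.getRuntime().freeMemory();
            System.out.println("free before hint: " + before + ", after: " + after);
        }
    }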

RAM memory reallocation - Windows and Linux

I am working on a project involving optimizing energy consumption within a system. Part of that project consists in allocating RAM based on locality, that is, allocating memory segments for a program as close as possible to each other. Is there a way to know exactly where the memory I allocate is located (which memory chips), and is it possible to force allocation in a deterministic manner? I am interested in both Windows and Linux. Also, the project will be implemented in Java and .NET, so I am interested in managed APIs to achieve this.
[I am aware that this might not translate into direct energy consumption reduction but the project is supposed to be a proof of concept.]
You're working at the wrong level of abstraction.
Java (and presumably .NET) refers to objects using handles, rather than raw pointers. The underlying Java VM can move objects around in virtual memory at any time; the Java application doesn't see any difference.
Win32 and Linux applications (such as the Java VM) refer to memory using virtual addresses. There is a mapping from virtual address to a physical address on a RAM chip. The kernel can change this mapping at any time (e.g. if the data gets paged to disk then read back into a different memory location) and applications don't see any difference.
So if you're using Java and .NET, I wouldn't change your Java/.NET application to achieve this. Instead, I would change the underlying Linux kernel, or possibly the Java VM.
For a prototype, one approach might be to boot Linux with the mem= parameter to restrict the kernel's memory usage to less than the amount of memory you have, then look at whether you can mmap the spare memory (maybe by mapping /dev/mem as root?). You could then change all calls to malloc() in the Java VM to use your own special memory allocator, which allocates from that free space.
For a real implementation of this, you should do it by changing the kernel and keeping userspace compatibility. Look at the work that's been done on memory hotplug in Linux, e.g. http://lhms.sourceforge.net/
If you want to try this in a language with a big runtime you'd have to tweak the implementation of that runtime or write a DLL/shared object to do all the memory management for your sample application. At which point the overall system behaviour is unlikely to be much like the usual operation of those runtimes.
The simplest, cleanest test environment to detect the (probably small) advantages of locality of reference would be in C++ using custom allocators. This environment will remove several potential causes of noise in the runtime data (mainly the garbage collection). You will also lose any power overhead associated with starting the CLR/JVM or maintaining its operating state - which would presumably also be welcome in a project to minimise power consumption. You will naturally want to give the test app a processor core to itself to eliminate thread switching noise.
Writing a custom allocator to give you one of the preallocated chunks on your current page shouldn't be too tough, but given that to accomplish locality of reference in C/C++ you would ordinarily just use the stack it seems unlikely there will be one you can just find, download and use.
In C/C++, if you coerce a pointer to an int, this tells you the address. However, under Windows and Linux, this is a virtual address -- the operating system determines the mapping to physical memory, and the memory management unit in the processor carries it out.
So, if you care where your data is in physical memory, you'll have to ask the OS. If you just care if your data is in the same MMU block, then check the OS documentation to see what size blocks it's using (4KB is usual for x86, but I hear kids these days are playing around with 16M giant blocks?).
Java and .NET add a third layer to the mix, though I'm afraid I can't help you there.
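For completeness, about the closest plain Java gets to influencing placement is allocating off the garbage-collected heap with a direct buffer, which the collector will not move; even then, the physical location is still entirely up to the OS. A hedged sketch (class name mine):

    import java.nio.ByteBuffer;

    public class DirectBufferDemo {
        public static void main(String[] args) {
            // Allocated outside the normal Java heap, so the GC will not relocate it,
            // but you still only get a stable *virtual* address, not a physical one.
            ByteBuffer buf = ByteBuffer.allocateDirect(4 * 1024 * 1024); // 4 MiB
            buf.putLong(0, 42L);
            System.out.println("capacity=" + buf.capacity()
                    + " isDirect=" + buf.isDirect()
                    + " value=" + buf.getLong(0));
        }
    }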
Is pre-allocating in bigger chunks (than needed) an option at all? Will it defeat the original purpose?
I think that if you want such tight control over memory allocation, you are better off using a compiled language such as C; the JVM isolates the actual implementation of the language from the hardware, chip selection for data storage included.
The approach requires specialized hardware. In ordinary memory sticks and slots, the arrangement is designed to dissipate heat as evenly per chip as possible, for example by putting 1 bit of every bus word on each physical chip.
This is an interesting topic, although I think it is waaaaaaay beyond the capabilities of managed languages such as Java or .NET. One of the major principles of those languages is that you don't have to manage the memory, and consequently they abstract that away for you. C/C++ gives you better control in terms of actually allocating memory, but even in that case, as referenced previously, the operating system can do some hand-waving and indirection with memory allocation, making it difficult to determine how things are allocated together. Even then, you make reference to the actual chips; that's even harder, and I would imagine it would be hardware-dependent. I would seriously consider using a prototyping board where you can code at the assembly level and actually control every memory unit allocation explicitly, without any interference from compiler optimizations or operating system security practices. That would give you the most meaningful results, as it would give you the ability to control every aspect of the program and determine, definitively, that any power consumption improvements are due to your algorithm rather than some invisible optimization performed by the compiler or operating system. I imagine this is some sort of research project (very intriguing), so spending ~$100 on a prototyping board would definitely be worth it in my opinion.
In .NET there is a COM interface exposed for profiling .NET applications that can give you detailed address information. I think you will need to combine this with some calls to the OS to translate virtual addresses though.
As zztop alluded to, the .NET CLR compacts memory every time a garbage collection is done. Large objects, however, are not compacted; these live on the large object heap, which can consist of many segments scattered around from OS calls to VirtualAlloc.
Here are a couple links on the profiling APIs:
http://msdn.microsoft.com/en-us/magazine/cc300553.aspx
David Broman's CLR Profiling API Blog

Updating from Java 1.4.2 to Java 6 (both Sun VMs) results in slower performance

I've just upgraded some old Java source, which had been running on a Sun Java 1.4.2 VM, to the Sun Java (JRE) 6 VM. More or less the only thing I had to change was to add explicit type parameters to some collections (HashMaps, Vectors and so on). The code itself is quite memory intensive, using up to 1 GB of heap memory (using -Xmx1024m as a parameter to start the VM).
Since I had read a lot about better performance on newer Java VMs, this was one of the reasons I did the upgrade.
Can anyone think of a reason why the performance is worse in my case now (just in general, of course, since you can't take a look at the code)?
Does anyone have advice for a non-Java guru on what to look for when optimizing existing code for speed? Any hints, recommended docs, or tools?
Thanks.
Not much information here. But here are a couple of things you might want to explore:
Start the VM with -Xms and -Xmx set to the same value (in your case 1024 MB)
Ensure that the server JVM DLL is being used to start the virtual machine
Run a profiler to see which objects are hogging memory or which objects are not being garbage collected
Hook your VM up to jconsole and trace through the objects
If your application nearly runs out of free space, garbage collection time may dominate computation time.
Enable gc debugging to look for this. Or, even better, simply start jconsole and attach it to your program.
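If attaching jconsole isn't convenient, a rough way to check whether GC time dominates is to compare accumulated collection time against JVM uptime using only the standard management beans. A sketch (class name mine; in practice you would call this from within, or alongside, your own application rather than as a standalone program):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcOverhead {
        public static void main(String[] args) {
            long gcMs = 0;
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                gcMs += Math.max(0, gc.getCollectionTime());   // -1 means "unsupported"
            }
            long upMs = ManagementFactory.getRuntimeMXBean().getUptime();
            System.out.printf("GC so far: %d ms of %d ms uptime (%.1f%%)%n",
                    gcMs, upMs, upMs > 0 ? 100.0 * gcMs / upMs : 0.0);
        }
    }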
Theoretically it could be that your application consumes more memory, because there were changes to the way Strings share their internal char[]. Less sharing is done after 1.4.
Check my old blog at http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/5100 (new blog is here)
I would compare the Garbage Collector logs to see whether memory usage is really the problem.
If that doesn't help, use a profiler such as YourKit to find the differences.
Definitely use a profiler on the app (YourKit is great)...it's easy to waste a lot of time guessing at the problem when most of the time you'll be able to narrow it down really quickly in the profiler.
