Known attempts at stack-based memory management for the JVM

Known attempts at stack-based memory management for the JVM - java

I'm reading up on different JVM implementations, and I'm wondering why a stack-based memory management isn't more widespread (not to be confused with escape analysis). Are any of you familiar with attempts on writing JVMs with stack-based memory management?

This is just not really practical.
As soon as you have multithreading, you have the need to share references to objects between threads. This means that threads need to hold references to other threads stacks, and these get invalidated as soon as the method that initially created the object returns.
The heap is effectively the shared area of memory that all threads in a process can see, so any object that needs to be seen by multiple threads naturally lives there.
Another way to say this is that the stack is private to a thread, whereas the heap is shared between them.

Related

Get memory usage of a thread

I know that I Java's Runtime object can report the JVM's memory usage. However, I need the memory usage for a certain thread. Any idea how to get this?
I appreciate your answer!

A Thread shares everything except its stack and the CPU cycles with all other threads in the VM. All objects created by the Thread are pooled with all the other objects.
The problem is to define what the memory usage of a Thread is. Is it only those objects it created? What if these objects subsequently are referenced by other threads? Do they only count half, then? What about objects created somewhere else, but are now referenced by this Thread?
I know of no tool trying to measure the memory consumption of separate Threads.

Is ThreadLocal allocated in TLAB?

I suppose, that ThreadLocal variables are allocated in Thread Local allocation Buffer(s) or TLABs, am I right ?
I was not successful in finding any document stating what exactly makes some class stored in TLAB. If you know some, please post a link.

I was not successfull to find any document stating what exactly makes some class stored in TLAB. If you know some, please post a link.
Actually, the explanation is right there in the blog post you lnked to:
A Thread Local Allocation Buffer (TLAB) is a region of Eden that is used for allocation by a single thread. It enables a thread to do object allocation using thread local top and limit pointers, which is faster than doing an atomic operation on a top pointer that is shared across threads.
Every thread allocates memory from its own chunk of Eden, the "Generation 0" part of the heap. Pretty much everything is stored in the TLAB for a period of time - quite possibly your ThreadLocals, too - but they get moved away from there after a gen0 garbage collection. TLABs are there to make allocations faster, not to make the memory unaccessible from other threads. A more accessible description from the same blog you linked to is A little thread privacy, please.

No. Here how it is:
As of 1.4 each thread in Java has a field called threadLocals where the map is kept. Each threadLocal has an index to the structure, so it doesn't use hashCode(). Imagine an array and each ThreadLocal keep a slot index.
When the thread dies and there are no more references to it, the ThreadLocals are GC'd. Very simple idea.
You can implement your own ThreaLocal(s) by extending Thread and adding a field to hold the reference. Then cast the Thread to youw own class and take the data.
So it's not TLAB, it's still the heap like any other object.
Historically there were implementations w/ static WeakHashMap which were very slow to access the data.

It is my understanding that TLAB is used for object allocation of all small to medium objects. Your ThreadLocal won't be allocated any differently.

I'm pretty sure that this is up to the discretion of the JVM implementer. They could put the data in TLABs if they wanted to, or in a global table keyed by the thread ID. The Java Language Specification tends to be mute about these sorts of issues so that JVM authors can deploy Java on as many and as diverse platforms as possible.

i think only the pointer to it is, while the data itself resides in some other memory area. see http://blogs.oracle.com/jonthecollector/entry/the_real_thing and http://wikis.sun.com/display/MaxineVM/Threads#Threads-Threadlocalvariables

Thread-local object pooling

Going through the Goetz "Java Concurrency in Practice" book, he makes a case against using object pooling (section 11.4.7) - main arguments:
1) allocation in Java is faster than C's malloc
2) threads requesting objects from a pool require costly synchronization
My problem is not so much that allocation is slow, but that periodic garbage collection introduces outliers in response time that could be eliminated by reducing object pools.
Are there any issues that I am not seeing in using this approach? Essentially I am partitioning an object pool across the threads...

If its thread local then you can forget about this:
2) threads requesting objects from a pool require costly synchronization
Being thread-local you need not worry about synchronization to retrieve from the pool itself.

(sun's) GC scans live objects. the assumption is that there are way more dead objects than live objects in a typical java program runtime. it marks live objects, and dispose the rest.
if you cache a lot of objects, they are all live. and if you have several GBs of such objects, GC is going to waste a lot of time scanning them in vain. long GC pauses can paralyze your application.
cache something just to make it non-garbage is not helping GC.
that's not to say caching is wrong. if you have 15G memory, and your database is 10G, why not cache everything in memory, so responses are lighting fast. note this is to cache something that would otherwise be slow to fetch.
to prevent GC from fruitlessly scanning the 10G cache, the cache must be outside GC's control. For example, use 'memcached" which lives in another process, and has its own cache-optimized GC.
the latest news is Terracotta's BigMemory which is a pure java solution that does similar thing.
an example of thread local pooling is sun's direct ByteBuffer pooling. when we call
channel.read(byteBuffer)
if byteBuffer is not "direct", a "direct" one must be allocated under the hood, used to communicate data with OS. in a network application, such allocations could be very frequent, it seems to be a waste, to discard a just allocated one, and immediately allocate another one in the next statement. sun's engineers, apparently don't trust GC that much, created a thread local pool of "direct" ByteBuffers.

In Java 1.4, object allocation was relatively expensive so Object pools for even simple objects could help. However, in Java 5.0, Object allocation was significantly improved, however synchronization still had a way to go meaning that object allocation was faster than synchronization. i.e. removing object pools improved performance in many cases. In Java 6, synchronization has improved to the point where an object pool can make a little difference to performance in simple cases.
Avoiding simple object pools is a good idea because it is simpler, not for performance reasons.
For more complex/larger objects, object pools can be useful in Java 6, even if you use synchronization. e.g. a Socket, File stream, or Database connection.

I think your case is reasonable situation to use pooling. There is no evil in pooling, Goetz means that you should not use it when it is not necessary. Another example is connection pooling, because creation of connection is very expensive.

If it is threadlocal, it's very likely you may not even need pooling. Of course it would depend on the use cases, but the chances are, on a given thread you will likely need only one object of that type at a given time.
The caveat with threadlocals, however, is memory management. Note that threadlocal values don't go away easily until the thread that owns those threadlocals go away. Therefore, if you have a large number of threads and a large number of threadlocals, they may contribute to used memory quite a bit.

I'd definitely try it out. Although is now "common knowledge" that one should not care about object creation, in fact there may be a lot of performance gained from using object pools and specific classes. For a file processing framework, I gained 5% read performance from pooling object[] objects.
So try it out and time your executions to see if you gain anything.

Even it's an old question, point of 2 threads requesting objects from a pool require costly synchronization does not completely hold true.
It's possible to write a concurrent (no synchronization) object pool that doesn't even exhibit sharing (even false sharing) on the fast path. In the simplistic case, of course, each thread might have its own pool (more like an associated object) but then such a greedy approach can lead to resource waste (or starvation/error if the resource cannot be allocated)
Pools are good for heavy objects like ByteBuffers, esp. direct ones, connections, sockets, threads, etc. Overall any objects that require non-java intervention.

Why do finalizers have a "severe performance penalty"?

Effective Java says :
There is a severe performance penalty for using finalizers.
Why is it slower to destroy an object using the finalizers?

Because of the way the garbage collector works. For performance, most Java GCs use a copying collector, where short-lived objects are allocated into an "eden" block of memory, and when the it's time for that generation of objects to be collected, the GC just needs to copy the objects that are still "alive" to a more permanent storage space, and then it can wipe (free) the entire "eden" memory block at once. This is efficient because most Java code will create many thousands of instances of objects (boxed primitives, temporary arrays, etc.) with lifetimes of only a few seconds.
When you have finalizers in the mix, though, the GC can't simply wipe an entire generation at once. Instead, it needs to figure out all the objects in that generation that need to be finalized, and queue them on a thread that actually executes the finalizers. In the meantime, the GC can't finish cleaning up the objects efficiently. So it either has to keep them alive longer than they should be, or it has to delay collecting other objects, or both. Plus you have the arbitrary wait time of actually executing the finalizers.
All these factors add up to a significant runtime penalty, which is why deterministic finalization (using a close() method or similar to explicitly finalize the object's state) is usually preferred.

Having actually run into one such problem:
In the Sun HotSpot JVM, finalizers are processed on a thread that is given a fixed, low priority. In a high-load application, it's easy to create finalization-required objects faster than the low-priority finalization thread can process them. Meanwhile, the space on the heap used by the finalization-pending objects is unavailable for other uses. Eventually, your application may spend all of its time garbage collecting, because all of the available memory is in use by objects pending finalization.
This is, of course, in addition to the other many reasons to not use finalizers that are described in Effective Java.

I just picked up my copy Effective Java off my desk to see what he's referring to.
If you read Chapter 2, Section 6, he goes into good detail about the various performance hits.
You can't know when the finalizer will run, or even if it will at all. Because those resources may never be claimed, you will have to run with fewer resources.
I would recommend reading the entirety of the section - it explains things much better than I can parrot here.

If you read the documentation of finalize() closely, you will notice that finalizers enable an object to prevent being collected by the GC.
If no finalizer is present, the object simply can be removed and does not need any more attention. But if there is a finalizer, it needs to be checked afterwards, if the object didn't become "visible" again.
Without knowing exactly how the current Java garbage collection is implemented (actually, because there are different Java implementations out there, there are also different GCs), you can assume that the GC has to do some additional work if an object has a finalizer, because of this feature.

My thought is this:
Java is a garbage collected language, which deallocates memory based on its own internal algorithms. Every so often, the GC scans the heap, determines which objects are no longer referenced, and de-allocates the memory.
A finalizer interrupts this and forces the deallocation of memory outside of the GC cycle, potentially causing inefficiencies.
I think best practices are to use finalizers only when ABSOLUTELY necessary such as freeing file handles or closing DB connections which should be done deterministically.

One reason I can think of is that explicit memory cleanup is unnecessary if your resources are all Java Objects, and not native code.

Does the Java VM move objects in memory, and if so - how?

Does the Java virtual machine ever move objects in memory, and if so, how does it handle updating references to the moved object?
I ask because I'm exploring an idea of storing objects in a distributed fashion (ie. across multiple servers), but I need the ability to move objects between servers for efficiency reasons. Objects need to be able to contain pointers to each-other, even to objects on remote servers. I'm trying to think of the best way to update references to moved objects.
My two ideas so far are:
Maintain a reference indirection somewhere that doesn't move for the lifetime of the object, which we update if the object moves. But - how are these indirections managed?
Keep a list of reverse-references with each object, so we know what has to be updated if the object is moved. Of course, this creates a performance overhead.
I'd be interested in feedback on these approaches, and any suggestions for alternative approaches.

In reference to the comment above about walking the heap.
Different GC's do it different ways.
Typically copying collectors when they walk the heap, they don't walk all of the objects in the heap. Rather they walk the LIVE objects in the heap. The implication is that if it's reachable from the "root" object, the object is live.
So, at this stage is has to touch all of the live objects anyway, as it copies them from the old heap to the new heap. Once the copy of the live objects is done, all that remains in the old heap are either objects already copied, or garbage. At that point the old heap can be discarded completely.
The two primary benefits of this kind of collector are that it compacts the heap during the copy phase, and that it only copies living objects. This is important to many systems because with this kind of collector, object allocation is dirt cheap, literally little more than incrementing a heap pointer. When GC happens, none of the "dead" objects are copied, so they don't slow the collector down. It also turns out in dynamic systems that there's a lot more little, temporary garbage, than there is long standing garbage.
Also, by walking the live object graph, you can see how the GC can "know" about every object, and keep track of them for any address adjustment purposes performed during the copy.
This is not the forum to talk deeply about GC mechanics, as it's a non-trivial problem, but that's the basics of how a copying collector works.
A generational copying GC will put "older" objects in different heaps, and those end up being collected less often than "newer" heaps. The theory is that the long lasting objects get promoted to older generations and get collected less and less, improving overall GC performance.

The keyword you're after is "compacting garbage collector". JVMs are permitted to use one, meaning that objects can be relocated. Consult your JVM's manual to find out whether yours does, and to see whether there are any command-line options which affect it.
The conceptually simplest way to explain compaction is to assume that the garbage collector freezes all threads, relocates the object, searches heap and stack for all references to that object, and updates them with the new address. Actually it's more complex than that, since for performance reasons you don't want to perform a full sweep with threads stalled, so an incremental garbage collector will do work in preparation for compaction whenever it can.
If you're interested in indirect references, you could start by researching weak and soft references in Java, and also the remote references used by various RPC systems.

I'd be curious to know more about your requirements. As another answer suggests, Terracotta may be exactly what you are looking for.
There is a subtle difference however between what Terracotta provides, and what you are asking for, thus my inquiry.
The difference is that as far as you are concerned, Terracotta does not provide "remote" references to objects - in fact the whole "remote" notion of RMI, JMS, etc. is entirely absent when using Terracotta.
Rather, in Terracotta, all objects reside in large virtual heap. Threads, whether on Node 1, or Node 2, Node 3, Node 4, etc all have access to any object in the virtual heap.
There's no special programming to learn, or special APIs, objects in the "virtual" heap have exactly the same behavior as objects in the local heap.
In short, what Terracotta provides is a programming model for multiple JVMs that operates exactly the same as a the programming model for a single JVM. Threads in separate nodes simply behave like threads in a single node - object mutations, synchronized, wait, notify all behave exactly the same across nodes as as across threads - there's no difference.
Furthermore, unlike any solution to come before it, object references are maintained across nodes - meaning you can use ==. It's all a part of maintaining the Java Memory Model across the cluster which is the fundamental requirement to make "regular" Java (e.g. POJOs, synchronized, wait/notify) work (none of that works if you don't / can't preserve object identity across the cluster).
So the question comes back to you to further refine your requiements - for what purpose do you need "remote" pointers?

(Practically) Any garbage collected system has to move objects around in memory to pack them more densely and avoid fragmentation problems.
What you are looking at is a very large and complex subject. I'd suggest you read up on existing remote object style API's: .NET remoting and going further back technologies like CORBA
Any solution for tracking the references will be complicated by having to deal with all the failure modes that exist in distributed systems. The JVM doesn't have to worry about suddenly finding it can't see half of its heap because a network switch glitched.
When you drill into the design I think a lot of it will come down to how you want to handle different failure cases.
Response to comments:
Your question talks about storing objects in a distributed fashion, which is exactly what .NET remoting and CORBA address. Admittedly neither technology supports migration of these objects (AFAIK). But they both deal extensively with the concepts of object identity which is a critical part of any distributed object system: how do different parts of the system know which objects they are talking about.
I am not overly familiar with the details of the Java garbage collector, and I'm sure the Java and .NET garbage collectors have a lot of complexity in them to achieve maximum performance with minimum impact on the application.
However, the basic idea for garbage collection is:
The VM stops all threads from running managed code
It performs a reachability analysis from the set of known 'roots': static variables, local variables on all the threads. For each object it finds it follows all references within the object.
Any object not identified by the reachability analysis is garbage.
Objects that are still alive can then be moved down in memory to pack them densely. This means that any references to these objects also have to be updated with the new address. By controlling when a garbage collect can occur the VM is able to guarantee that there are no object references 'in-the-air' (ie. being held in a machine register) that would cause a problem.
Once the process is complete the VM starts the threads executing again.
As a refinement of this process the VM can perform generational garbage collection, where separate heaps are maintained based on the 'age' of an object. Objects start in heap 0 and if they survive several GCs then the migrate to heap 1 and eventually to heap 2 (and so on - .NET supports 3 generations only though). The advantage of this is that the GC can run heap 0 collections very frequently, and not have to worry about doing the work to prove the long lived objects (which have ended up in heap 2) are still alive (which they almost certainly are).
There are other refinements to support concurrent garbage collection, and details around threads that are actually executing unmanaged code when the GC is scheduled that add a lot more complexity to this area.

sounds like you are looking for a distributed cache, something like terracotta or oracle's java objece cache (formerly tangersol).

If you are willing to go that deep down, you can take a look to JBoss Cache architecture docs and grab some of its source code as reference.
This is not exactly what you described, but it works very similar.
Here's the link.
http://www.jboss.org/jbosscache/
I hope this helps.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.