I have an application where the memory profile looks something like this:
(memory profile graph not shown; source: kupio.com)
The slow upward crawl of memory usage is caused by the allocation of lots and lots of small, simple, transient objects. In low-memory situations (this is a mobile app) the GC overhead is noticeable compared with running under less restrictive memory limits.
Since we know, due to the nature of the app, that these spikes will just keep on coming, I was considering some sort of pool of multitudinous transient objects (Awesome name). These objects would live for the lifetime of the app and be re-used wherever possible (Where the lifetime of the object is short and highly predictable).
Hopefully this would mitigate against the effects of GC by reducing the number of objects collected and improve performance.
Obviously this would also have its own performance limits since "allocation" would be more expensive and there would be an overhead in maintaining the cache itself.
Since this would be a rather large and intrusive change into a large amount of code, I was wondering if anyone had tried something similar and if it was a benefit, or if there were any other known ways of mitigating against GC in this sort of situation. Ideas for efficient ways to manage a cache of re-usable objects are also welcome.
This is similar to the flyweight pattern detailed in the GoF patterns book (see edit below). Object pools have gone out of favour in a "normal" virtual machine due to advances in reducing object creation, synchronization and GC overhead. However, pools have been around for a long time and it's certainly fine to try them to see if they help!
Certainly Object Pools are still in use for objects which have a very expensive creation overhead when compared with the pooling overheads mentioned above (database connections being one obvious example).
Only a test will tell you whether the pooling approach works for you on your target platforms!
EDIT - I took the OP "re-used wherever possible" to mean that the objects were immutable. Of course this might not be the case and the flyweight pattern is really about immutable objects being shared (Enums being one example of a flyweight). A mutable (read: unshareable) object is not a candidate for the flyweight pattern but is (of course) for an object pool.
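To make the distinction concrete, here is a minimal flyweight-style sketch (the class is hypothetical, not from the question): immutable instances are interned and shared, so repeated requests for the same value never allocate a second copy.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Flyweight-style interning: immutable instances are cached and shared.
final class Tag {
    private static final Map<String, Tag> CACHE = new ConcurrentHashMap<String, Tag>();

    private final String name;

    private Tag(String name) {
        this.name = name;
    }

    // Factory method hands back the shared instance, creating it only on first use.
    static Tag of(String name) {
        Tag tag = CACHE.get(name);
        if (tag == null) {
            tag = new Tag(name);
            Tag raced = CACHE.putIfAbsent(name, tag);
            if (raced != null) {
                tag = raced;  // another thread interned it first; share that one
            }
        }
        return tag;
    }

    String name() {
        return name;
    }
}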
Normally, I'd say this was a job for tuning the GC parameters of the VM to reduce the spikiness, but for mobile apps that isn't really an option. So if the JVMs you are using cannot have their GC behaviour modified, then old-fashioned object pooling may be the best solution.
The Apache Commons Pool library is good for that, although if this is a mobile app, then you may not want the library dependency overhead.
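For reference, usage of Apache Commons Pool typically looks something like the sketch below; this assumes the Commons Pool 2 API (MyObject is a stand-in type, and the API differs between the 1.x and 2.x versions of the library).

import org.apache.commons.pool2.BasePooledObjectFactory;
import org.apache.commons.pool2.PooledObject;
import org.apache.commons.pool2.impl.DefaultPooledObject;
import org.apache.commons.pool2.impl.GenericObjectPool;

// MyObject stands in for whatever small transient type you want to pool.
class MyObject {
}

// The factory tells the pool how to create and wrap instances.
class MyObjectFactory extends BasePooledObjectFactory<MyObject> {
    @Override
    public MyObject create() {
        return new MyObject();
    }

    @Override
    public PooledObject<MyObject> wrap(MyObject obj) {
        return new DefaultPooledObject<MyObject>(obj);
    }
}

// Usage (borrowObject() can throw, so handle or declare the exception):
// GenericObjectPool<MyObject> pool = new GenericObjectPool<MyObject>(new MyObjectFactory());
// MyObject obj = pool.borrowObject();
// try { /* use obj */ } finally { pool.returnObject(obj); }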
Actually, that graph looks pretty healthy to me. The GC is reclaiming lots of objects and the memory is then returning to the same base level. Empirically, this means that the GC is working efficiently.
The problem with object pooling is that it makes your app slower, more complicated and potentially more buggy. What is more, it can actually make each GC run take longer. (All of the "idle" objects in the pool are non-garbage and need to be marked, etc by the GC.)
Does J2ME have a generational garbage collector? If so, it does many small, fast collections, and thus the pauses are reduced. You could try reducing the eden space (the young-generation area where new objects are allocated) to increase the frequency and reduce the latency of collections, and thus reduce the pauses.
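On a standard HotSpot JVM (probably not available on the J2ME VMs in question, so treat this purely as an illustration), the young-generation size is controlled with flags such as:

-Xmn2m

which is shorthand for -XX:NewSize=2m -XX:MaxNewSize=2m.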
Although, come to think of it, my guess is that you can't adjust gc behaviour because everything probably runs in the same VM (just a guess here).
You could check out this link describing enhancements to the Concurrent Mark Sweep collector, although I'm not sure it's available for J2ME. In particular note:
"The concurrent mark sweep collector, also known as the concurrent collector or CMS, is targeted at applications that are sensitive to garbage collection pauses."
... "In JDK 6, the CMS collector can optionally perform these collections concurrently, to avoid a lengthy pause in response to a System.gc() or Runtime.getRuntime().gc() call. To enable this feature, add the option"
-XX:+ExplicitGCInvokesConcurrent
Check out this link. In particular:
Just to list a few of the problems object pools create: first, an unused object takes up memory space for no reason; the GC must process the unused objects as well, detaining it on useless objects for no reason; and in order to fetch an object from the object pool a synchronization is usually required which is much slower than the asynchronous allocation available natively.
You're talking about a pool of reusable object instances.
import java.util.LinkedList;
import java.util.List;

class MyObjectPool {
    private final List<MyObject> free = new LinkedList<MyObject>();
    private final List<MyObject> inuse = new LinkedList<MyObject>();

    // Pre-allocate the pool up front so steady-state use allocates nothing.
    public MyObjectPool(int poolsize) {
        for (int i = 0; i != poolsize; ++i) {
            free.add(new MyObject());
        }
    }

    // Hand out an instance, growing the pool only if it has run dry.
    public MyObject makeNewObject() {
        if (free.isEmpty()) {
            free.add(new MyObject());
        }
        MyObject next = free.remove(0);
        inuse.add(next);
        return next;
    }

    // Return an instance to the pool for later re-use.
    public void freeObject(MyObject obj) {
        inuse.remove(obj);
        free.add(obj);
    }
}
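A hypothetical usage sketch (not part of the original answer): always hand the instance back in a finally block so an exception cannot leak it out of the pool.

MyObjectPool pool = new MyObjectPool(32);

MyObject obj = pool.makeNewObject();
try {
    // ... use obj for its short, predictable lifetime ...
} finally {
    pool.freeObject(obj);  // return it even if the work above throws
}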
Given that this answer suggests that there is not much scope for tweaking garbage collection itself in J2ME then if GC is an issue the only other option is to look at how you can change your application to improve performance/memory usage. Maybe some of the suggestions in the answer referenced would apply to your application.
As oxbow_lakes says, what you suggest is a standard design pattern. However, as with any optimisation the only way to really know how much it will improve your particular application is by implementing and profiling.
Related
Should Java objects be reused as often as they can be reused? Or should we reuse them only when they are "heavyweight", i.e. have OS resources associated with them?
All the old articles on the internet talk about object reuse and object pooling as much as possible, but I have read recent articles that say new Object() is highly optimized now (10 instructions) and object reuse is not as big a deal as it used to be.
What is the current best practice, and how are you people doing it?
I let the garbage collector do that kind of deciding for me. The only time I've hit the heap limit with freshly allocated objects was after running a buggy recursive algorithm for a couple of seconds, which generated 3 * 27 * 27... new objects as fast as it could.
Do what's best for readability and encapsulation. Sometimes reusing objects may be useful, but generally you shouldn't worry about it.
If you use them very intensively and the construction is costly, you should try to reuse them as much as you can.
If your objects are very small, and cheap to create ( like Object ) you should create new ones.
For instance, database connections are pooled because the cost of creating a new one is far higher than that of creating, say, a new Integer.
So the answer to your question is: reuse objects when they are heavy AND used often (it is not worth pooling a 3 MB object that is only used twice).
Edit:
Additionally, this item from Effective Java, "Favor Immutability", is worth reading and may apply to your situation.
Object creation is cheap, yes, but sometimes not cheap enough.
If you create a lot (and I mean A LOT) of temporary objects in rapid succession, the costs for the garbage collector are considerable. However, even with a good profiler you may not necessarily see the costs easily, as the garbage collector nowadays works in short intervals instead of blocking the whole application for a second or two.
Most of the performance improvements I got in my projects came from either avoiding object creation or avoiding the whole work (including the object creation) through aggressive caching. No matter how big or small the object is, it still takes time to create it and to manage the references and heap structures for it. (And of course, the cleanup and the internal heap-defrag/copying also takes time.)
I would not get religious about avoiding object creation at all costs, but if you see a sawtooth pattern in your memory profiler, it means your garbage collector is working hard. And whatever CPU the garbage collector uses is not available to your application.
Regarding object pooling: Doing it right and not running into either memory leaks or invalid states or spending more time on the management than you would save is difficult. So I never used that strategy.
My strategy has been to simply strive for immutable objects. Immutable things can be cached easily and therefore help to keep the system simple.
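A minimal sketch of what I mean (a hypothetical class, not from the original text): an immutable value object can be shared between threads and cached freely, with no reset or ownership bookkeeping.

// Immutable: all fields final, no setters, so instances can be shared and cached freely.
final class Temperature {
    private final double celsius;

    Temperature(double celsius) {
        this.celsius = celsius;
    }

    double celsius() {
        return celsius;
    }

    // "Mutation" returns a new instance instead of changing this one.
    Temperature plus(double delta) {
        return new Temperature(celsius + delta);
    }
}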
However, no matter what you do: Make sure you check your hotspots with a profiler first. Premature optimization is the root of most evilness.
Let the garbage collector do its job; it is almost certainly better at memory management than your code.
Unless a profiler proves it guilty. And don't even use common sense to try to figure out when it's wrong. In unusual cases even cheap objects like byte arrays are better pooled.
Rule 1 of optimization: don't do it.
Rule 2 (for experts only): don't do it yet.
The rule of thumb should be to use your common sense and reuse objects when their creation consumes significant resources such as I/O, network traffic, DB connections, etc...
If it's just creating a new String(), forget about reuse; you'll gain nothing from it. Code readability takes precedence.
I would worry about performance issues if they arise. Do what makes sense first (would you do this with primitives?); if you then run a profiling tool and find that object creation is causing you problems, start to think about pre-allocation (i.e. allocating when your program isn't doing much work).
Re-using objects sounds like a disaster waiting to happen by the way:
SomeClass someObject = new SomeClass();
someObject.doSomething();
someObject.changeState();
someObject.changeOtherState();
someObject.sendSignal();
// stuff
//re-use
someObject.reset(); // urgh, had to put this in to support reuse
someObject.doSomethingElse(); // oh oh, this is wrong after calling changeOtherState, regardless of reset
someObject.changeState(); // crap, now this is wrong but it's not obvious yet
someObject.doImportantStuff(); // what's going on?
Object creation is certainly faster than it used to be. The newer generational GCs in JDK 5 and higher are improvements, too.
I don't think either of these makes excessive creation of objects cost-free, but they do reduce the importance of object pooling. I think pooling makes sense for database connections, but I don't attempt it for my own domain objects.
Reuse puts a premium on thread-safety. You need to think carefully to ensure that you can reuse objects safely.
If I decided that object reuse was important I'd do it with products like Terracotta, Tangosol, GridGain, etc. and make sure that my server had scads of memory available to it.
Second the above comments.
Don't try and second-guess the GC and HotSpot. Object pooling may have been useful once, but these days it's not so useful unless you are talking about database connections or unique system resources.
Just try and write clean and simple code and be amazed at what Hotspot can do.
Why not use VisualVM or a profiler to take a look at your code?
I have a Java application that uses memory extensively. It keeps a data structure that grows very fast and is responsible for most of the memory used.
In order to avoid an OutOfMemoryError, I decided to flush the data structure to a repository (file or DB) and post-process it.
The problem I face is choosing the moment (when the used memory is "close" to the maximum allowed) to flush the data structure to the repository. One way would be to keep track of the data structure's memory usage on every update.
dataStructure.onUpdate(new CheckMemoryIfReachedMax() {
    public void onUpdate(long usedMemory) {
        if (usedMemory >= MaxMemory) {
            dataStructure.flushInRepository();
        }
    }
});
The main problem in this case is that it isn't easy to change the data structure to keep track of its memory usage.
Another possible solution would be to get the used memory from the JVM and compare it to the maximum memory.
Runtime runtime = Runtime.getRuntime();
long freeMemory = runtime.freeMemory();
// Flush once the free heap drops below a chosen threshold.
if (freeMemory < MaxUsedMemory) {
    dataStructure.flushInRepository();
}
In this case the problem is that the reported figure is only a hint of how much memory is really in use, since we cannot predict when the garbage collector will remove dead objects. This solution would make me flush the data structure to the repository more often, so application performance might suffer.
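A slightly more defensive variant of that second approach (the names and the 80% threshold are arbitrary, assumed values) compares an estimate of used heap against the maximum heap the JVM may grow to, rather than looking at free memory alone:

// Rough heuristic: flush once the estimated used heap exceeds a fraction of the maximum heap.
Runtime runtime = Runtime.getRuntime();
long used = runtime.totalMemory() - runtime.freeMemory();  // still includes garbage not yet collected
long max = runtime.maxMemory();                            // the -Xmx ceiling
if (used > 0.8 * max) {
    dataStructure.flushInRepository();
}

It has the same fundamental limitation, of course: the "used" figure includes dead objects the collector has not reclaimed yet.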
Is there any general pattern used in those cases? Do you have any suggestion about which of the solution would be better suited to the problem?
There is no good, universal definition of "memory used by the JVM"; the best choice is to track the size of the structure yourself, sorry.
The problem is - JVM uses memory both for garbage and actual data, and there is no way to tell one from the other until actual garbage collection occurs. This is by design and actually is a major optimization.
A dirty workaround would be to use JMX to track the amount of memory freed during the last collection (this will be the size of the short-lived garbage). This has two drawbacks:
you will get false positives once the long-lived garbage accumulates;
you will depend on a specific garbage collector and garbage collector settings (for example I have no idea if this scheme is achievable with G1).
I think you should use the first approach, because the runtime memory comparison depends on too many things. To follow the first approach and estimate the memory used by an object, you can use
long getObjectSize(Object objectToSize)
from the Instrumentation interface to get an approximate size of your object.
Refer http://docs.oracle.com/javase/6/docs/api/java/lang/instrument/Instrumentation.html#getObjectSize(java.lang.Object)
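Getting hold of an Instrumentation instance requires a Java agent; a minimal sketch (the class name, jar name and helper method are hypothetical) looks roughly like this:

import java.lang.instrument.Instrumentation;

// Packaged in a jar whose manifest contains "Premain-Class: SizeOfAgent",
// and started with: java -javaagent:sizeof-agent.jar -jar yourapp.jar
public class SizeOfAgent {
    private static volatile Instrumentation instrumentation;

    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    // Approximate shallow size of a single object; referenced objects are not included.
    public static long sizeOf(Object obj) {
        return instrumentation.getObjectSize(obj);
    }
}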
Going through the Goetz "Java Concurrency in Practice" book, he makes a case against using object pooling (section 11.4.7) - main arguments:
1) allocation in Java is faster than C's malloc
2) threads requesting objects from a pool require costly synchronization
My problem is not so much that allocation is slow, but that periodic garbage collection introduces outliers in response time that could be reduced by object pooling.
Are there any issues that I am not seeing in using this approach? Essentially I am partitioning an object pool across the threads...
If it's thread-local then you can forget about this:
2) threads requesting objects from a pool require costly synchronization
Being thread-local you need not worry about synchronization to retrieve from the pool itself.
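A minimal sketch of a thread-local pool (hypothetical class, assuming each thread needs only a bounded number of instances at a time): every thread acquires from and releases to its own private deque, so the fast path involves no shared state at all.

import java.util.ArrayDeque;
import java.util.Deque;

// Per-thread pool: no locks and no contention, because each thread only ever sees its own deque.
class ThreadLocalPool<T> {
    private final ThreadLocal<Deque<T>> pool = new ThreadLocal<Deque<T>>() {
        @Override
        protected Deque<T> initialValue() {
            return new ArrayDeque<T>();
        }
    };

    // The caller supplies the factory; a new instance is created only when the local pool is empty.
    T acquire(Factory<T> factory) {
        T obj = pool.get().pollFirst();
        return (obj != null) ? obj : factory.create();
    }

    void release(T obj) {
        pool.get().addFirst(obj);
    }

    interface Factory<T> {
        T create();
    }
}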
(Sun's) GC scans live objects. The assumption is that there are far more dead objects than live objects in a typical Java program at runtime. It marks the live objects and disposes of the rest.
If you cache a lot of objects, they are all live. And if you have several GBs of such objects, the GC is going to waste a lot of time scanning them in vain. Long GC pauses can paralyze your application.
Caching something just to make it non-garbage does not help the GC.
That's not to say caching is wrong. If you have 15 GB of memory and your database is 10 GB, why not cache everything in memory, so responses are lightning fast? Note this is about caching something that would otherwise be slow to fetch.
To prevent the GC from fruitlessly scanning the 10 GB cache, the cache must be outside the GC's control. For example, use memcached, which lives in another process and has its own cache-optimized memory management.
The latest news is Terracotta's BigMemory, which is a pure-Java solution that does a similar thing.
An example of thread-local pooling is Sun's direct ByteBuffer pooling. When we call
channel.read(byteBuffer)
if byteBuffer is not "direct", a "direct" one must be allocated under the hood and used to exchange data with the OS. In a network application such allocations can be very frequent, and it seems wasteful to discard a just-allocated buffer and immediately allocate another one in the next statement. Sun's engineers, apparently not trusting the GC that much, created a thread-local pool of "direct" ByteBuffers.
In Java 1.4, object allocation was relatively expensive, so object pools could help even for simple objects. In Java 5.0, object allocation was significantly improved, but synchronization still had a way to go, meaning that allocating a new object was faster than synchronizing to fetch one from a pool; i.e. removing object pools improved performance in many cases. In Java 6, synchronization has improved to the point where an object pool makes only a small difference to performance in simple cases.
Avoiding simple object pools is a good idea because it is simpler, not for performance reasons.
For more complex/larger objects, object pools can be useful in Java 6, even if you use synchronization. e.g. a Socket, File stream, or Database connection.
I think your case is a reasonable situation in which to use pooling. There is no evil in pooling; Goetz means that you should not use it when it is not necessary. Another example is connection pooling, because the creation of a connection is very expensive.
If it is threadlocal, it's very likely you may not even need pooling. Of course it would depend on the use cases, but the chances are, on a given thread you will likely need only one object of that type at a given time.
The caveat with threadlocals, however, is memory management. Note that threadlocal values don't go away easily until the thread that owns them goes away. Therefore, if you have a large number of threads and a large number of threadlocals, they may contribute quite a bit to used memory.
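If the threads in question are long-lived (a worker pool, say), one mitigation, sketched below with hypothetical names, is to call remove() once a thread is finished with the value, for example at the end of a batch of work, so the value doesn't stay pinned for the life of the thread.

class BatchWorker {
    // Hypothetical per-thread scratch buffer, re-used across tasks within a batch.
    private static final ThreadLocal<byte[]> SCRATCH = new ThreadLocal<byte[]>() {
        @Override
        protected byte[] initialValue() {
            return new byte[8 * 1024];
        }
    };

    void runBatch(Iterable<Runnable> tasks) {
        try {
            for (Runnable task : tasks) {
                task.run();  // tasks can call SCRATCH.get() while the batch is running
            }
        } finally {
            SCRATCH.remove();  // let the buffer be collected; the (pooled) thread itself lives on
        }
    }
}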
I'd definitely try it out. Although it is now "common knowledge" that one should not care about object creation, in fact there may be a lot of performance to be gained from using object pools for specific classes. For a file-processing framework, I gained 5% read performance from pooling Object[] arrays.
So try it out and time your executions to see if you gain anything.
Even though it's an old question, point 2 ("threads requesting objects from a pool require costly synchronization") does not completely hold true.
It's possible to write a concurrent (no synchronization) object pool that doesn't even exhibit sharing (even false sharing) on the fast path. In the simplistic case, of course, each thread might have its own pool (more like an associated object) but then such a greedy approach can lead to resource waste (or starvation/error if the resource cannot be allocated)
Pools are good for heavy objects like ByteBuffers, esp. direct ones, connections, sockets, threads, etc. Overall any objects that require non-java intervention.
After reading this question, I was reminded of when I was taught Java and told never to call finalize() or run the garbage collector because "it's a big black box that you never need to worry about". Can someone boil the reasoning for this down to a few sentences? I'm sure I could read a technical report from Sun on this matter, but I think a nice, short, simple answer would satisfy my curiosity.
The short answer: Java garbage collection is a very finely tuned tool. System.gc() is a sledge-hammer.
Java's heap is divided into different generations, each of which is collected using a different strategy. If you attach a profiler to a healthy app, you'll see that it very rarely has to run the most expensive kinds of collections because most objects are caught by the faster copying collector in the young generation.
Calling System.gc() directly, while technically not guaranteed to do anything, in practice will trigger an expensive, stop-the-world full heap collection. This is almost always the wrong thing to do. You think you're saving resources, but you're actually wasting them for no good reason, forcing Java to recheck all your live objects “just in case”.
If you are having problems with GC pauses during critical moments, you're better off configuring the JVM to use the concurrent mark/sweep collector, which was designed specifically to minimise time spent paused, than trying to take a sledgehammer to the problem and just breaking it further.
The Sun document you were thinking of is here: Java SE 6 HotSpot™ Virtual Machine Garbage Collection Tuning
(Another thing you might not know: implementing a finalize() method on your object makes garbage collection slower. Firstly, it will take two GC runs to collect the object: one to run finalize() and the next to ensure that the object wasn't resurrected during finalization. Secondly, objects with finalize() methods have to be treated as special cases by the GC because they have to be collected individually, they can't just be thrown away in bulk.)
Don't bother with finalizers.
Switch to incremental garbage collection.
If you want to help the garbage collector, null out references to objects you no longer need. Fewer paths to follow = more objects that are explicitly garbage.
Don't forget that (non-static) inner class instances keep references to their parent class instance. So an inner class thread keeps a lot more baggage than you might expect.
In a very related vein, if you're using serialization, and you've serialized temporary objects, you're going to need to clear the serialization caches, by calling ObjectOutputStream.reset() or your process will leak memory and eventually die.
Downside is that non-transient objects are going to get re-serialized.
Serializing temporary result objects can be a bit more messy than you might think!
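A hedged sketch of the pattern (the class name and the reset interval of 1000 are arbitrary): when streaming many temporary objects, a periodic reset() clears the stream's back-reference table so already-written objects can actually be collected.

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.List;

class RecordWriter {
    static void writeAll(List<? extends Serializable> records, String path) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(path));
        try {
            int count = 0;
            for (Serializable record : records) {
                out.writeObject(record);
                if (++count % 1000 == 0) {
                    out.reset();  // drop the back-reference table so written objects can be GC'd
                }
            }
        } finally {
            out.close();
        }
    }
}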
Consider using soft references. If you don't know what soft references are, have a read of the javadoc for java.lang.ref.SoftReference
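A minimal soft-reference cache sketch (hypothetical class names): the GC is free to clear the referents under memory pressure, so the cache degrades gracefully instead of contributing to an OutOfMemoryError.

import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Memory-sensitive cache: entries survive as long as the heap has room,
// but the GC may clear them before throwing OutOfMemoryError.
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<K, SoftReference<V>>();

    V get(K key) {
        SoftReference<V> ref = map.get(key);
        return (ref == null) ? null : ref.get();  // null if never cached or already cleared
    }

    void put(K key, V value) {
        map.put(key, new SoftReference<V>(value));
    }
}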
Steer clear of Phantom references and Weak references unless you really get excitable.
Finally, if you really can't tolerate the GC use Realtime Java.
No, I'm not joking.
The reference implementation is free to download, and Peter Dibble's book from Sun is really good reading.
As far as finalizers go:
They are virtually useless. They aren't guaranteed to be called in a timely fashion, or indeed, at all (if the GC never runs, neither will any finalizers). This means you generally shouldn't rely on them.
Finalizers are not guaranteed to be idempotent. The garbage collector takes great care never to call finalize() more than once on the same object, so if you call it manually as well, it may end up running twice. With well-written objects it won't matter, but with poorly written objects, calling finalize multiple times can cause problems (e.g. double release of a native resource ... crash).
Every object that has a finalize() method should also provide a close() (or similar) method. This is the function you should be calling. e.g., FileInputStream.close(). There's no reason to be calling finalize() when you have a more appropriate method that is intended to be called by you.
Assuming finalizers are similar to their .NET namesake then you only really need to call these when you have resources such as file handles that can leak. Most of the time your objects don't have these references so they don't need to be called.
It's bad to try to collect the garbage because it's not really your garbage. You have told the VM to allocate some memory when you created objects, and the garbage collector is hiding information about those objects. Internally the GC is performing optimisations on the memory allocations it makes. When you manually try to collect the garbage you have no knowledge about what the GC wants to hold onto and get rid of; you are just forcing its hand. As a result you mess up its internal calculations.
If you knew more about what the GC was holding internally then you might be able to make more informed decisions, but then you've missed the benefits of GC.
The real problem with closing OS handles in finalize is that finalizers are executed in no guaranteed order. But if you have handles to things that block (think, e.g., sockets), your code can potentially get into a deadlock situation (not trivial at all).
So I'm for explicitly closing handles in a predictable orderly manner. Basically code for dealing with resources should follow the pattern:
SomeStream s = null;
// ...
try {
    s = openStream();
    // ...
    s.io();
    // ...
} finally {
    if (s != null) {
        s.close();
        s = null;
    }
}
It gets even more complicated if you write your own classes that work via JNI and open handles. You need to make sure handles are closed (released) and that it will happen only once. A frequently overlooked OS handle in desktop J2SE is Graphics[2D]. Even BufferedImage.getGraphics() can potentially return a handle that points into a video driver (actually holding the resource on the GPU). If you don't release it yourself and leave the garbage collector to do the work, you may find strange OutOfMemory and similar situations when you run out of video-card-mapped bitmaps but still have plenty of memory. In my experience it happens rather frequently in tight loops working with graphics objects (extracting thumbnails, scaling, sharpening, you name it).
Basically, GC does not take over the programmer's responsibility for correct resource management. It only takes care of memory and nothing else. Stream.finalize calling close() would, IMHO, be better implemented as throwing a new RuntimeException("garbage collecting a stream that is still open"). It would save hours and days of debugging and cleaning up code after sloppy amateurs have left loose ends.
Happy coding.
Peace.
The GC does a lot of optimization on when to properly finalize things.
So unless you're familiar with how the GC actually works and how it tags generations, manually calling finalize() or starting a GC will probably hurt performance more than help.
Avoid finalizers. There is no guarantee that they will be called in a timely fashion. It could take quite a long time before the Memory Management system (i.e., the garbage collector) decides to collect an object with a finalizer.
Many people use finalizers to do things like close socket connections or delete temporary files. By doing so you make your application behaviour unpredictable and tied to when the JVM is going to GC your object. This can lead to "out of memory" scenarios, not due to the Java Heap being exhausted, but rather due to the system running out of handles for a particular resource.
One other thing to keep in mind is that introducing calls to System.gc() or similar hammers may show good results in your environment, but they won't necessarily translate to other systems. Not everyone runs the same JVM; there are many: Sun, IBM J9, BEA JRockit, Harmony, OpenJDK, etc. These JVMs all conform to the JCK (those that have been officially tested, that is), but have a lot of freedom when it comes to making things fast. GC is one of the areas everyone invests in heavily. Using a hammer often destroys that effort.
Does the Java virtual machine ever move objects in memory, and if so, how does it handle updating references to the moved object?
I ask because I'm exploring an idea of storing objects in a distributed fashion (i.e. across multiple servers), but I need the ability to move objects between servers for efficiency reasons. Objects need to be able to contain pointers to each other, even to objects on remote servers. I'm trying to think of the best way to update references to moved objects.
My two ideas so far are:
Maintain a reference indirection somewhere that doesn't move for the lifetime of the object, which we update if the object moves. But - how are these indirections managed?
Keep a list of reverse-references with each object, so we know what has to be updated if the object is moved. Of course, this creates a performance overhead.
I'd be interested in feedback on these approaches, and any suggestions for alternative approaches.
In reference to the comment above about walking the heap.
Different GCs do it in different ways.
Typically, when copying collectors walk the heap, they don't walk all of the objects in the heap. Rather, they walk the LIVE objects in the heap. The implication is that if an object is reachable from the "root" objects, the object is live.
So, at this stage it has to touch all of the live objects anyway, as it copies them from the old heap to the new heap. Once the copy of the live objects is done, all that remains in the old heap is either objects already copied or garbage. At that point the old heap can be discarded completely.
The two primary benefits of this kind of collector are that it compacts the heap during the copy phase, and that it only copies living objects. This is important to many systems because with this kind of collector, object allocation is dirt cheap, literally little more than incrementing a heap pointer. When GC happens, none of the "dead" objects are copied, so they don't slow the collector down. It also turns out that in dynamic systems there is a lot more small, temporary garbage than there is long-standing garbage.
Also, by walking the live object graph, you can see how the GC can "know" about every object, and keep track of them for any address adjustment purposes performed during the copy.
This is not the forum to talk deeply about GC mechanics, as it's a non-trivial problem, but that's the basics of how a copying collector works.
A generational copying GC will put "older" objects in different heaps, and those end up being collected less often than "newer" heaps. The theory is that the long lasting objects get promoted to older generations and get collected less and less, improving overall GC performance.
The keyword you're after is "compacting garbage collector". JVMs are permitted to use one, meaning that objects can be relocated. Consult your JVM's manual to find out whether yours does, and to see whether there are any command-line options which affect it.
The conceptually simplest way to explain compaction is to assume that the garbage collector freezes all threads, relocates the object, searches heap and stack for all references to that object, and updates them with the new address. Actually it's more complex than that, since for performance reasons you don't want to perform a full sweep with threads stalled, so an incremental garbage collector will do work in preparation for compaction whenever it can.
If you're interested in indirect references, you could start by researching weak and soft references in Java, and also the remote references used by various RPC systems.
I'd be curious to know more about your requirements. As another answer suggests, Terracotta may be exactly what you are looking for.
There is a subtle difference however between what Terracotta provides, and what you are asking for, thus my inquiry.
The difference is that as far as you are concerned, Terracotta does not provide "remote" references to objects - in fact the whole "remote" notion of RMI, JMS, etc. is entirely absent when using Terracotta.
Rather, in Terracotta, all objects reside in large virtual heap. Threads, whether on Node 1, or Node 2, Node 3, Node 4, etc all have access to any object in the virtual heap.
There's no special programming to learn, or special APIs, objects in the "virtual" heap have exactly the same behavior as objects in the local heap.
In short, what Terracotta provides is a programming model for multiple JVMs that operates exactly the same as the programming model for a single JVM. Threads in separate nodes simply behave like threads in a single node - object mutations, synchronized, wait, notify all behave exactly the same across nodes as across threads - there's no difference.
Furthermore, unlike any solution to come before it, object references are maintained across nodes - meaning you can use ==. It's all a part of maintaining the Java Memory Model across the cluster which is the fundamental requirement to make "regular" Java (e.g. POJOs, synchronized, wait/notify) work (none of that works if you don't / can't preserve object identity across the cluster).
So the question comes back to you to further refine your requirements - for what purpose do you need "remote" pointers?
(Practically) Any garbage collected system has to move objects around in memory to pack them more densely and avoid fragmentation problems.
What you are looking at is a very large and complex subject. I'd suggest you read up on existing remote-object-style APIs: .NET Remoting and, going further back, technologies like CORBA.
Any solution for tracking the references will be complicated by having to deal with all the failure modes that exist in distributed systems. The JVM doesn't have to worry about suddenly finding it can't see half of its heap because a network switch glitched.
When you drill into the design I think a lot of it will come down to how you want to handle different failure cases.
Response to comments:
Your question talks about storing objects in a distributed fashion, which is exactly what .NET remoting and CORBA address. Admittedly neither technology supports migration of these objects (AFAIK). But they both deal extensively with the concepts of object identity which is a critical part of any distributed object system: how do different parts of the system know which objects they are talking about.
I am not overly familiar with the details of the Java garbage collector, and I'm sure the Java and .NET garbage collectors have a lot of complexity in them to achieve maximum performance with minimum impact on the application.
However, the basic idea for garbage collection is:
The VM stops all threads from running managed code
It performs a reachability analysis from the set of known 'roots': static variables, local variables on all the threads. For each object it finds it follows all references within the object.
Any object not identified by the reachability analysis is garbage.
Objects that are still alive can then be moved down in memory to pack them densely. This means that any references to these objects also have to be updated with the new address. By controlling when a garbage collect can occur the VM is able to guarantee that there are no object references 'in-the-air' (ie. being held in a machine register) that would cause a problem.
Once the process is complete the VM starts the threads executing again.
As a refinement of this process the VM can perform generational garbage collection, where separate heaps are maintained based on the 'age' of an object. Objects start in heap 0 and if they survive several GCs then they migrate to heap 1 and eventually to heap 2 (and so on - .NET supports 3 generations only, though). The advantage of this is that the GC can run heap 0 collections very frequently, and not have to worry about doing the work to prove the long-lived objects (which have ended up in heap 2) are still alive (which they almost certainly are).
There are other refinements to support concurrent garbage collection, and details around threads that are actually executing unmanaged code when the GC is scheduled that add a lot more complexity to this area.
Sounds like you are looking for a distributed cache, something like Terracotta or Oracle's Java object cache (formerly Tangosol).
If you are willing to dig that deep, you can take a look at the JBoss Cache architecture docs and grab some of its source code as a reference.
This is not exactly what you described, but it works very similar.
Here's the link.
http://www.jboss.org/jbosscache/
I hope this helps.