Going through Goetz's "Java Concurrency in Practice", I see he makes a case against using object pooling (section 11.4.7) - his main arguments:
1) allocation in Java is faster than C's malloc
2) threads requesting objects from a pool require costly synchronization
My problem is not so much that allocation is slow, but that periodic garbage collection introduces outliers in response time that could be reduced by pooling objects.
Are there any issues that I am not seeing in using this approach? Essentially I am partitioning an object pool across the threads...
If it's thread-local then you can forget about this:
2) threads requesting objects from a pool require costly synchronization
Being thread-local you need not worry about synchronization to retrieve from the pool itself.
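For illustration, here is a minimal sketch of what such a per-thread pool could look like (the class and method names are hypothetical, not from the book):

import java.util.ArrayDeque;

class ThreadLocalPool<T> {
    // Each thread gets its own deque, so acquire/release need no synchronization.
    private final ThreadLocal<ArrayDeque<T>> pool = new ThreadLocal<ArrayDeque<T>>() {
        @Override protected ArrayDeque<T> initialValue() {
            return new ArrayDeque<T>();
        }
    };

    // Returns a pooled instance, or null if the calling thread's pool is empty.
    T acquire() {
        return pool.get().pollFirst();
    }

    // Hands an instance back to the calling thread's pool for later reuse.
    void release(T obj) {
        pool.get().addFirst(obj);
    }
}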
(Sun's) GC scans live objects. The assumption is that there are far more dead objects than live objects in a typical Java program, so it marks the live objects and disposes of the rest.
If you cache a lot of objects, they are all live, and if you have several GBs of such objects, the GC is going to waste a lot of time scanning them in vain. Long GC pauses can paralyze your application.
Caching something just to keep it from becoming garbage does not help the GC.
That's not to say caching is wrong. If you have 15 GB of memory and your database is 10 GB, why not cache everything in memory so responses are lightning fast? Note that this is about caching something that would otherwise be slow to fetch.
To prevent the GC from fruitlessly scanning the 10 GB cache, the cache must live outside the GC's control. For example, use memcached, which lives in another process and manages its memory with its own cache-optimized scheme.
The latest news is Terracotta's BigMemory, a pure-Java solution that does something similar.
An example of thread-local pooling is Sun's direct ByteBuffer pooling. When we call
channel.read(byteBuffer)
and byteBuffer is not "direct", a "direct" buffer must be allocated under the hood to exchange data with the OS. In a network application such allocations can be very frequent, and it seems wasteful to discard a just-allocated buffer and immediately allocate another one in the next statement. Sun's engineers, apparently not trusting the GC that much, created a thread-local pool of "direct" ByteBuffers.
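A rough sketch of the idea follows (this is not Sun's actual implementation, just an illustration of caching one direct buffer per thread; the 64 KB size is an arbitrary example):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

class DirectBufferReads {
    // One reusable direct buffer per thread instead of a fresh allocation per read.
    private static final ThreadLocal<ByteBuffer> DIRECT = new ThreadLocal<ByteBuffer>() {
        @Override protected ByteBuffer initialValue() {
            return ByteBuffer.allocateDirect(64 * 1024);
        }
    };

    // Read from the channel into a heap buffer via the reused direct buffer.
    static int read(ReadableByteChannel channel, ByteBuffer heapBuffer) throws IOException {
        ByteBuffer direct = DIRECT.get();
        direct.clear();
        direct.limit(Math.min(direct.capacity(), heapBuffer.remaining()));
        int n = channel.read(direct);   // OS-level read fills the direct buffer
        direct.flip();
        heapBuffer.put(direct);          // copy into the caller's heap buffer
        return n;
    }
}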
In Java 1.4, object allocation was relatively expensive, so object pools could help even for simple objects. In Java 5.0, object allocation was significantly improved, but synchronization still had a way to go, meaning that allocation was faster than synchronization; i.e. removing object pools improved performance in many cases. In Java 6, synchronization has improved to the point where an object pool makes little difference to performance in simple cases.
Avoiding simple object pools is a good idea because it is simpler, not for performance reasons.
For more complex/larger objects, object pools can be useful in Java 6, even if you use synchronization. e.g. a Socket, File stream, or Database connection.
I think your case is a reasonable situation in which to use pooling. There is nothing evil about pooling itself; Goetz means that you should not use it when it is not necessary. Another example is connection pooling, because the creation of a connection is very expensive.
If it is thread-local, it's very likely you may not even need pooling. Of course it depends on the use case, but chances are that on a given thread you will only need one object of that type at a given time.
The caveat with thread-locals, however, is memory management. Note that thread-local values don't go away until the thread that owns them goes away. Therefore, if you have a large number of threads and a large number of thread-locals, they may contribute quite a bit to used memory.
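As a small illustration of that cleanup concern (the names and buffer size are hypothetical), a thread-local value can be released explicitly once the owning thread no longer needs it:

// A per-thread scratch buffer; it lives as long as the thread unless removed.
ThreadLocal<byte[]> scratch = new ThreadLocal<byte[]>() {
    @Override protected byte[] initialValue() {
        return new byte[8192];
    }
};

// ... use scratch.get() on the worker thread ...

scratch.remove(); // lets the GC reclaim this thread's value before the thread dies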
I'd definitely try it out. Although it is now "common knowledge" that one should not care about object creation, in fact there may be a lot of performance to be gained from using object pools and specific classes. For a file-processing framework, I gained 5% read performance from pooling Object[] arrays.
So try it out and time your executions to see if you gain anything.
Even though it's an old question, the point that threads requesting objects from a pool require costly synchronization does not completely hold true.
It's possible to write a concurrent (no synchronization) object pool that doesn't even exhibit sharing (not even false sharing) on the fast path. In the simplest case, of course, each thread might have its own pool (more like an associated object), but such a greedy approach can lead to resource waste (or starvation/errors if the resource cannot be allocated).
Pools are good for heavy objects like ByteBuffers (especially direct ones), connections, sockets, threads, etc. - overall, any objects that require non-Java intervention.
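For comparison, here is a much simpler sketch than the design described above: it avoids synchronized blocks entirely by using a lock-free queue, though the shared queue still means contention on the fast path (so it is not the sharing-free design the answer refers to):

import java.util.concurrent.ConcurrentLinkedQueue;

class LockFreePool<T> {
    // Lock-free queue: no synchronized blocks, although threads still share this structure.
    private final ConcurrentLinkedQueue<T> free = new ConcurrentLinkedQueue<T>();

    T acquire() {
        return free.poll(); // null means the caller should create a fresh instance
    }

    void release(T obj) {
        free.offer(obj);
    }
}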
Related
Should Java objects be reused as often as they can be, or should we reuse them only when they are "heavyweight", i.e. have OS resources associated with them?
All the old articles on the internet talk about object reuse and object pooling as much as possible, but I have read recent articles that say new Object() is highly optimized now (around 10 instructions) and that object reuse is not as big a deal as it used to be.
What is the current best practice, and how are you people doing it?
I let the garbage collector do that kind of deciding for me; the only time I've hit the heap limit with freshly allocated objects was after running a buggy recursive algorithm for a couple of seconds, which generated 3 * 27 * 27... new objects as fast as it could.
Do what's best for readability and encapsulation. Sometimes reusing objects may be useful, but generally you shouldn't worry about it.
If you use objects very intensively and their construction is costly, you should try to reuse them as much as you can.
If your objects are very small and cheap to create (like Object), you should create new ones.
For instance, database connections are pooled because the cost of creating a new one is much higher than that of creating, say, a new Integer.
So the answer to your question is: reuse objects when they are heavy AND used often (it is not worth pooling a 3 MB object that is only used twice).
Edit:
Additionally, this item from Effective Java:Favor Immutability is worth reading and may apply to your situation.
Object creation is cheap, yes, but sometimes not cheap enough.
If you create a lot (and I mean A LOT) of temporary objects in rapid succession, the costs for the garbage collector are considerable. However, even with a good profiler you may not necessarily see those costs easily, as the garbage collector nowadays works in short intervals instead of blocking the whole application for a second or two.
Most of the performance improvements I got in my projects came from either avoiding object creation or avoiding the whole piece of work (including the object creation) through aggressive caching. No matter how big or small the object is, it still takes time to create it and to manage the references and heap structures for it. (And of course, the cleanup and the internal heap defrag/copying also take time.)
I would not get religious about avoiding object creation at all costs, but if you see a sawtooth pattern in your memory profiler, it means your garbage collector is on heavy duty. And if the garbage collector is using the CPU, that CPU time is not available to your application.
Regarding object pooling: doing it right - without running into memory leaks or invalid states, and without spending more time on the management than you save - is difficult. So I have never used that strategy.
My strategy has been to simply strive for immutable objects. Immutable things can be cached easily and therefore help to keep the system simple.
However, no matter what you do: make sure you check your hotspots with a profiler first. Premature optimization is the root of most evil.
Let the garbage collector do its job; it can be assumed to do it better than your code can.
Unless a profiler proves it guilty. And don't even use common sense to try to figure out when it's wrong - in unusual cases even cheap objects like byte arrays are better pooled.
Rule 1 of optimization: don't do it.
Rule 2 (for experts only): don't do it yet.
The rule of thumb should be to use your common sense and reuse objects when their creation consumes significant resources, such as I/O, network traffic, DB connections, etc.
If it's just creating a new String(), forget about reuse; you'll gain nothing from it. Code readability takes precedence.
I would worry about performance issues if and when they arise. Do what makes sense first (would you do this with primitives?); if you then run a profiling tool and find that new is causing you problems, start to think about pre-allocation (i.e. allocating when your program isn't doing much work).
Re-using objects sounds like a disaster waiting to happen, by the way:
SomeClass someObject = new SomeClass();
someObject.doSomething();
someObject.changeState();
someObject.changeOtherState();
someObject.sendSignal();
// stuff
//re-use
someObject.reset(); // urgh, had to put this in to support reuse
someObject.doSomethingElse(); // oh oh, this is wrong after calling changeOtherState, regardless of reset
someObject.changeState(); // crap, now this is wrong but it's not obvious yet
someObject.doImportantStuff(); // what's going on?
Object creation is certainly faster than it used to be. The newer generational GCs in JDK 5 and higher are improvements, too.
I don't think either of these makes excessive creation of objects cost-free, but they do reduce the importance of object pooling. I think pooling makes sense for database connections, but I don't attempt it for my own domain objects.
Reuse puts a premium on thread-safety. You need to think carefully to ensure that you can reuse objects safely.
If I decided that object reuse was important I'd do it with products like Terracotta, Tangosol, GridGain, etc., and make sure that my server had scads of memory available to it.
Second the above comments.
Don't try to second-guess the GC and HotSpot. Object pooling may have been useful once, but these days it's not so useful unless you are talking about database connections or unique system resources.
Just try and write clean and simple code and be amazed at what Hotspot can do.
Why not use VisualVM or a profiler to take a look at your code?
Object creation is a bottleneck in my application.
I think that adding more threads for object creation makes the situation worse, because object creation is a CPU-bound task, right?
Then, how to improve performance?
Often the problem is not object creation itself, but repeated object creation and garbage generation. That causes two performance hits: creating all those objects and extra garbage collection stalls.
First, you should use profiling tools to verify that excessive object creation is the source of your performance problems. Assuming that you have verified that this is the problem, there are various things to look for and strategies to try. It all depends on how your code is written, so there's no one recommendation that will work. This list of Java performance guidelines from IBM is definitely worth applying. It identifies how to avoid many of the most common sins: don't create objects inside loops; use StringBuilder instead of a series of string concatenation expressions; use primitive types and avoid auto-boxing/unboxing where possible; cache frequently used objects; allocate collection classes with an explicit capacity instead of allowing them to grow; etc.
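To make a couple of those guidelines concrete, here is a small, hedged sketch (the class name and values are made up for illustration):

import java.util.List;

class AllocationTips {
    // Build a string with StringBuilder instead of repeated concatenation in a loop.
    static String join(List<String> words) {
        StringBuilder sb = new StringBuilder(words.size() * 8); // rough capacity guess
        for (String w : words) {
            sb.append(w).append(' ');
        }
        return sb.toString();
    }

    // Stick to primitives where possible to avoid auto-boxing garbage.
    static int[] squares(int n) {
        int[] result = new int[n]; // primitive array, no Integer boxing
        for (int i = 0; i < n; i++) {
            result[i] = i * i;
        }
        return result;
    }
}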
Another nice resource is Chapter 4 of the book Java Performance Tuning. (You can read it on-line here.)
If you search the web for excessive object creation java, you can find lots of other recommendations.
You can still get significant performance improvement by multi-threading CPU bound tasks when your app is running on a machine with multiple processors.
As #Pst says - are you sure it's the bottleneck? Because these days it's not a common one.
But given that, one thing you could try is avoiding creation by caching and reusing instances. Whether that works totally depends on what your program does, though.
Java uses a TLAB (Thread Local Allocation Buffer) for small to medium-sized objects. This means each thread can allocate objects concurrently, i.e. you don't get a slow-down from using multiple threads.
In general, more CPUs help with CPU-bound problems. It's IO-bound tasks, where one CPU can use all the available bandwidth (like disk access), that are no faster when you use multiple CPUs.
The simplest way to reduce the cost of object creation is to create and discard fewer objects. There is a common assumption that object creation is unavoidable, but for the last 2.5 years I have worked on applications which GC less than once per day, even under production load.
Most applications don't work this way because they don't need to. However, if you have a need to minimise object creation, you can.
I ran a heap dump on my program. When I opened it in the memory analyzer tool, I found that the java.lang.ref.Finalizer for org.logicalcobwebs.proxool.ProxyStatement was taking up a lot of memory. Why is this so?
Some classes implement the Object.finalize() method. Objects which override this method need to have finalize() called by a background finalizer thread, and they can't be cleaned up until that happens. If the finalizers are short and you don't discard many such objects, it all works well. However, if you are creating lots of these objects and/or their finalizers take a long time, the queue of objects waiting to be finalized builds up. It is possible for this queue to use up all the memory.
The solutions are:
1) don't use objects with finalize() if you can avoid it (i.e. if you are writing the class yourself)
2) make finalize() very short (if you have to use it)
3) don't discard such objects every time (try to re-use them)
The last option is likely to be best for you, as you are using an existing library.
From what I can make out, Proxool is a connection pool for JDBC connections. This suggests to me that the problem is that your application is misusing the connection pool. Instead of calling close() on the statement objects, your code is probably dropping them and/or their parent connections. Proxool is then relying on finalizers to close the underlying driver-implemented objects ... but this requires those Finalizer instances. It could also mean that you are opening/closing (real) database connections more frequently than necessary, and that would be bad for performance.
So I suggest that you check your code for leaked ResultSet, Statement and/or Connection objects, and make sure that you close them in finally blocks.
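For illustration, a typical pre-Java-7 pattern for this with plain JDBC (the query itself is made up) looks like:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class JdbcCleanupExample {
    static void runQuery(Connection connection) throws SQLException {
        Statement statement = connection.createStatement();
        try {
            ResultSet rs = statement.executeQuery("SELECT 1");
            try {
                while (rs.next()) {
                    // ... process the row ...
                }
            } finally {
                rs.close();        // always release the ResultSet
            }
        } finally {
            statement.close();     // always release the Statement, even on exceptions
        }
    }
}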
Looking at the memory dump, I expect you are concerned about where the 898,527,228 bytes are going. The vast majority are retained by the Finalizer object whose id is 2aab07855e38. If you still have the dump file, take a look at what that Finalizer refers to. It looks more problematic than the Proxool objects.
It may be late, but I had a similar issue and figured out that we needed to tune the garbage collector. The Serial and Parallel GCs didn't work for us, and the G1 GC was also not working properly.
But when using the ConcurrentMarkSweep GC we were able to stop this queue from building up too large.
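For reference, on HotSpot JVMs of that generation the CMS collector is typically enabled with a startup flag like the following (your exact JVM options may differ):
-XX:+UseConcMarkSweepGC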
Effective Java says:
There is a severe performance penalty for using finalizers.
Why is it slower to destroy an object that uses finalizers?
Because of the way the garbage collector works. For performance, most Java GCs use a copying collector, where short-lived objects are allocated into an "eden" block of memory, and when it's time for that generation of objects to be collected, the GC just needs to copy the objects that are still "alive" to a more permanent storage space; then it can wipe (free) the entire "eden" memory block at once. This is efficient because most Java code creates many thousands of object instances (boxed primitives, temporary arrays, etc.) with lifetimes of only a few seconds.
When you have finalizers in the mix, though, the GC can't simply wipe an entire generation at once. Instead, it needs to figure out all the objects in that generation that need to be finalized, and queue them on a thread that actually executes the finalizers. In the meantime, the GC can't finish cleaning up the objects efficiently. So it either has to keep them alive longer than they should be, or it has to delay collecting other objects, or both. Plus you have the arbitrary wait time of actually executing the finalizers.
All these factors add up to a significant runtime penalty, which is why deterministic finalization (using a close() method or similar to explicitly finalize the object's state) is usually preferred.
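As a sketch of what "deterministic finalization" can look like in practice (a made-up resource class, assuming Java 7+ for try-with-resources):

class NativeHandle implements AutoCloseable {
    private boolean closed;

    // Explicitly release the underlying resource; no finalizer involved.
    @Override public void close() {
        if (!closed) {
            closed = true;
            // ... release the file handle / socket / native memory here ...
        }
    }
}

// Usage: the resource is released as soon as the block exits.
// try (NativeHandle handle = new NativeHandle()) {
//     ... use handle ...
// }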
Having actually run into one such problem:
In the Sun HotSpot JVM, finalizers are processed on a thread that is given a fixed, low priority. In a high-load application, it's easy to create finalization-required objects faster than the low-priority finalization thread can process them. Meanwhile, the space on the heap used by the finalization-pending objects is unavailable for other uses. Eventually, your application may spend all of its time garbage collecting, because all of the available memory is in use by objects pending finalization.
This is, of course, in addition to the many other reasons not to use finalizers that are described in Effective Java.
I just picked up my copy of Effective Java off my desk to see what he's referring to.
If you read Chapter 2, Section 6, he goes into good detail about the various performance hits.
You can't know when the finalizer will run, or even if it will run at all. Because those resources may never be reclaimed, you will have to run with fewer resources available.
I would recommend reading the entirety of the section - it explains things much better than I can parrot here.
If you read the documentation of finalize() closely, you will notice that finalizers enable an object to prevent itself from being collected by the GC.
If no finalizer is present, the object can simply be removed and needs no further attention. But if there is a finalizer, it has to be checked afterwards whether the object has become "visible" (reachable) again.
Without knowing exactly how current Java garbage collection is implemented (and since there are different Java implementations out there, there are also different GCs), you can assume that the GC has to do some additional work for any object with a finalizer, because of this feature.
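As an illustration of how a finalizer can make an object "visible" again (a deliberately pathological, made-up example - not something to do in real code):

class Resurrecting {
    static Resurrecting zombie; // strong reference set from the finalizer

    @Override protected void finalize() throws Throwable {
        // The object becomes reachable again, so the GC cannot reclaim it in this cycle.
        zombie = this;
    }
}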
My thought is this:
Java is a garbage-collected language, which deallocates memory based on its own internal algorithms. Every so often, the GC scans the heap, determines which objects are no longer referenced, and deallocates the memory.
A finalizer gets in the way of this: the object cannot simply be reclaimed in the normal GC cycle, because the finalizer has to run first, which potentially causes inefficiencies.
I think best practice is to use finalizers only when absolutely necessary, for example for freeing file handles or closing DB connections, which really should be done deterministically anyway.
One reason I can think of is that explicit memory cleanup is unnecessary if your resources are all Java Objects, and not native code.
I have an application where the memory profile looks something like this:
[graph: memory usage climbs slowly, then drops back to the same base level at each GC; source: kupio.com]
The slow upwards crawl of memory usage is caused by the allocation of lots and lots of small, simple, transient objects. In low-memory situations (This is a mobile app) the GC overhead is noticeable when compared to less restrictive memory amounts.
Since we know, due to the nature of the app, that these spikes will just keep on coming, I was considering some sort of pool of multitudinous transient objects (awesome name). These objects would live for the lifetime of the app and be re-used wherever possible (where the lifetime of the object is short and highly predictable).
Hopefully this would mitigate the effects of GC by reducing the number of objects collected, and improve performance.
Obviously this would also have its own performance limits, since "allocation" would be more expensive and there would be an overhead in maintaining the cache itself.
Since this would be a rather large and intrusive change to a large amount of code, I was wondering if anyone had tried something similar and whether it was a benefit, or whether there are other known ways of mitigating GC in this sort of situation. Ideas for efficient ways to manage a cache of re-usable objects are also welcome.
This is similar to the flyweight pattern detailed in the GoF patterns book (see edit below). Object pools have gone out of favour on "normal" virtual machines due to the advances made in reducing object-creation, synchronization and GC overhead. However, they have been around for a long time and it's certainly fine to try them to see if they help!
Certainly, object pools are still in use for objects whose creation overhead is very expensive compared with the pooling overheads mentioned above (database connections being one obvious example).
Only a test will tell you whether the pooling approach works for you on your target platforms!
EDIT - I took the OP's "re-used wherever possible" to mean that the objects were immutable. Of course this might not be the case; the flyweight pattern is really about immutable objects being shared (enums being one example of a flyweight). A mutable (read: unshareable) object is not a candidate for the flyweight pattern, but it is (of course) a candidate for an object pool.
Normally I'd say this was a job for tuning the GC parameters of the VM to reduce the spikiness, but for mobile apps that isn't really an option. So if the JVMs you are using cannot have their GC behaviour modified, then old-fashioned object pooling may be the best solution.
The Apache Commons Pool library is good for that, although if this is a mobile app you may not want the library dependency overhead.
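For reference, a minimal sketch of what using that library might look like (this uses the commons-pool2 API; the version available to a mobile app, if any, may differ, and MyObject is just a placeholder):

import org.apache.commons.pool2.BasePooledObjectFactory;
import org.apache.commons.pool2.PooledObject;
import org.apache.commons.pool2.impl.DefaultPooledObject;
import org.apache.commons.pool2.impl.GenericObjectPool;

class MyObject { /* the transient object being pooled */ }

class MyObjectFactory extends BasePooledObjectFactory<MyObject> {
    @Override public MyObject create() {
        return new MyObject();
    }
    @Override public PooledObject<MyObject> wrap(MyObject obj) {
        return new DefaultPooledObject<MyObject>(obj);
    }
}

// Usage: borrow an instance, use it, and always return it.
// GenericObjectPool<MyObject> pool = new GenericObjectPool<MyObject>(new MyObjectFactory());
// MyObject obj = pool.borrowObject();
// try {
//     ... use obj ...
// } finally {
//     pool.returnObject(obj);
// }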
Actually, that graph looks pretty healthy to me. The GC is reclaiming lots of objects and the memory is then returning to the same base level. Empirically, this means that the GC is working efficiently.
The problem with object pooling is that it makes your app slower, more complicated and potentially more buggy. What is more, it can actually make each GC run take longer. (All of the "idle" objects in the pool are non-garbage and need to be marked, etc by the GC.)
Does J2ME have a generational garbage collector? If so, it does many small, fast collections, which keeps the pauses short. You could try reducing the eden space (the small, young-generation space) to increase the frequency of collections and reduce their individual latency, and thus reduce the pauses.
Although, come to think of it, my guess is that you can't adjust gc behaviour because everything probably runs in the same VM (just a guess here).
You could check out this link describing enhancements to the Concurrent Mark Sweep collector, although I'm not sure it's available for J2ME. In particular note:
"The concurrent mark sweep collector, also known as the concurrent collector or CMS, is targeted at applications that are sensitive to garbage collection pauses."
... "In JDK 6, the CMS collector can optionally perform these collections concurrently, to avoid a lengthy pause in response to a System.gc() or Runtime.getRuntime().gc() call. To enable this feature, add the option"
-XX:+ExplicitGCInvokesConcurrent
Check out this link. In particular:
Just to list a few of the problems object pools create: first, an unused object takes up memory space for no reason; the GC must process the unused objects as well, detaining it on useless objects for no reason; and in order to fetch an object from the object pool a synchronization is usually required which is much slower than the asynchronous allocation available natively.
You're talking about a pool of reusable object instances.
import java.util.LinkedList;
import java.util.List;

class MyObjectPool {
    private final List<MyObject> free = new LinkedList<MyObject>();
    private final List<MyObject> inuse = new LinkedList<MyObject>();

    public MyObjectPool(int poolsize) {
        // Pre-populate the pool with the requested number of instances.
        for (int i = 0; i != poolsize; ++i) {
            free.add(new MyObject());
        }
    }

    // Hands out a pooled instance, growing the pool if it is empty.
    public MyObject makeNewObject() {
        if (free.isEmpty()) {
            free.add(new MyObject());
        }
        MyObject next = free.remove(0);
        inuse.add(next);
        return next;
    }

    // Returns an instance to the pool so it can be handed out again.
    public void freeObject(MyObject obj) {
        inuse.remove(obj);
        free.add(obj);
    }
}
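One possible way to use it, pairing every makeNewObject with a freeObject so instances actually get recycled:

MyObjectPool pool = new MyObjectPool(10);
MyObject obj = pool.makeNewObject();
try {
    // ... use obj ...
} finally {
    pool.freeObject(obj); // return it to the pool even if an exception is thrown
}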
Given that this answer suggests there is not much scope for tweaking garbage collection itself in J2ME, if GC is an issue the only other option is to look at how you can change your application to improve its performance/memory usage. Maybe some of the suggestions in the answer referenced would apply to your application.
As oxbow_lakes says, what you suggest is a standard design pattern. However, as with any optimisation, the only way to really know how much it will improve your particular application is to implement it and profile.