I have always used MemoryUtil to store float buffers, but people seem to use BufferUtils for it:
private IntBuffer convertToIntBuffer(int[] data) {
    IntBuffer buffer = BufferUtils.createIntBuffer(data.length);
    // Note: chaining flip() like this needs Java 9+, where the Buffer
    // subclasses override flip() covariantly; on Java 8 a cast is required.
    return buffer.put(data).flip();
}

private FloatBuffer convertToFloatBuffer(float[] data) {
    FloatBuffer buffer = MemoryUtil.memAllocFloat(data.length);
    return buffer.put(data).flip();
}
LWJGL 3's org.lwjgl.BufferUtils class is only a small facade over Java's java.nio.ByteBuffer.allocateDirect() method, allowing you to use the JVM's memory allocator to allocate off-heap memory and return a NIO ByteBuffer (or typed view thereof), including making sure that the ByteOrder is nativeOrder().
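To make that concrete, here is roughly what BufferUtils.createFloatBuffer(n) boils down to (a sketch of the equivalent plain-NIO calls, not LWJGL's literal source):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

int n = 16; // number of floats, as an example
FloatBuffer fb = ByteBuffer.allocateDirect(n * Float.BYTES) // off-heap, but still GC-managed
                           .order(ByteOrder.nativeOrder())  // match the platform's byte order
                           .asFloatBuffer();                // typed view of the same memory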
The NIO Buffer allocated by ByteBuffer.allocateDirect() is managed by the JRE internally and the native memory is freed implicitly as part of the garbage collection cycle once it becomes unreachable.
There are a lot of downsides to allocating off-heap memory with this approach, such as (quote from "Memory management in LWJGL 3"):
- It is slow, much slower than the raw malloc() call. A lot of overhead on top of a function that is already slow.
- It scales badly under contention.
- It arbitrarily limits the amount of allocated memory (-XX:MaxDirectMemorySize).
- Like Java arrays, the allocated memory is always zeroed-out. This is not necessarily bad, but having the option would be better.
- There's no way to deallocate the allocated memory on demand (without JDK-specific reflection hacks). Instead, a reference queue is used that usually requires two GC cycles to free the native memory. This quite often leads to OOM errors under pressure.
LWJGL 3's org.lwjgl.system.MemoryUtil class on the other hand allows you to use other native/off-heap memory allocators instead of the JVM's ByteBuffer allocator to allocate off-heap native memory, including the option of just giving you the raw virtual memory address as a long, avoiding the NIO Buffer instance.
LWJGL supports the system allocator of the C standard library (malloc) as well as, currently, jemalloc and rpmalloc. All of these provide a much faster alternative to Java's ByteBuffer.allocateDirect(), alleviating the above-mentioned drawbacks.
Because the native memory is not managed by the JVM anymore, you have to free the memory yourself, for which there is the org.lwjgl.system.MemoryUtil.memFree() method.
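For example, a typical allocate/use/free pattern with MemoryUtil looks like this (uploadMatrix and matrix are placeholders for your own code):

import org.lwjgl.system.MemoryUtil;
import java.nio.FloatBuffer;

// Explicitly managed off-heap memory: allocation and deallocation are ours.
FloatBuffer fb = MemoryUtil.memAllocFloat(16);
try {
    fb.put(matrix).flip();   // 'matrix' is a float[16] from elsewhere (placeholder)
    uploadMatrix(fb);        // placeholder for whatever consumes the buffer
} finally {
    MemoryUtil.memFree(fb);  // freed immediately; no GC cycles involved
}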
Before you continue, you should however read the mentioned LWJGL blog post in its entirety, as there are more options, such as org.lwjgl.system.MemoryStack, for allocating native off-heap memory in particular situations (such as short-lived memory), which is even faster than all the other alternatives above mentioned.
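As a taste of that option, a short-lived 16-float buffer allocated on the MemoryStack looks like this (again, uploadMatrix and matrix stand in for your own code):

import org.lwjgl.system.MemoryStack;
import java.nio.FloatBuffer;

// Stack allocation: the memory is reclaimed automatically when the
// try-with-resources block pops the stack frame.
try (MemoryStack stack = MemoryStack.stackPush()) {
    FloatBuffer fb = stack.mallocFloat(16);
    fb.put(matrix).flip();   // 'matrix' is a float[16] placeholder
    uploadMatrix(fb);        // placeholder consumer
} // frame popped here; fb must not be used past this point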
I searched a lot and checked multiple answers, but none of them is clear to me.
Java has ByteBuffer. It comes in two flavors: direct and non-direct.
A direct buffer is good for I/O.
But what is the need for a non-direct ByteBuffer when we already have byte[]? When should it be used?
Non-direct ByteBuffers are stored on the heap and are backed by an underlying byte array. They are typically used when you need a buffer that is readable and writable by the Java application but doesn't need the same level of performance as a direct ByteBuffer.
So why not always use direct ByteBuffer?
Garbage Collection: Non-direct ByteBuffers live entirely on the heap and are reclaimed like any other object. The native memory behind a direct ByteBuffer is only released after the buffer object itself is collected, which typically takes longer (often two GC cycles), unless you resort to JDK-specific hacks to free it eagerly.
Concurrency: ByteBuffers, direct or not, are not thread-safe and require explicit synchronization to be safely accessed by multiple threads; with direct buffers that are shared with native code, that coordination can add complexity and overhead to your code.
Complexity: Direct ByteBuffers often require more manual handling and can involve working with native code, which can make them more complex and harder to work with than non-direct ByteBuffers.
Increased Latency: Direct ByteBuffers can have increased latency compared to non-direct ones: allocation is more expensive, and if your data originates in Java arrays it must first be copied from the heap into the native memory (and back).
Performance Variation: Performance with direct ByteBuffers can vary depending on the underlying system and hardware, making it harder to predict and guarantee performance.
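To make the distinction concrete, here is a minimal sketch contrasting the two kinds of buffers:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class BufferKinds {
    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(1024);        // non-direct: backed by a byte[]
        ByteBuffer direct = ByteBuffer.allocateDirect(1024) // direct: off-heap memory
                                      .order(ByteOrder.nativeOrder());

        System.out.println(heap.hasArray());   // true -> heap.array() exposes the backing byte[]
        System.out.println(direct.isDirect()); // true -> usable for native I/O without an extra copy
    }
}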
I'm writing some stuff that uses ByteBuffers. In the docs of the API it says:

There is no way to free a buffer explicitly (without JVM specific reflection). Buffer objects are subject to GC and it usually takes two GC cycles to free the off-heap memory after the buffer object becomes unreachable.
However, in an SO post's accepted answer I read:

BigMemory uses the memory address space of the JVM process, via direct ByteBuffers that are not subject to GC unlike other native Java objects.
Now what should I do: shall I free the created buffer? Or do I misunderstand something in the docs or the answer?
It depends on how you create the buffer; there are many possible use cases. A buffer from a regular ByteBuffer.allocate() is created on the heap and will be collected by the GC. Other options, e.g. native memory, might not be.
Terracotta BigMemory is a type of native off-heap memory which is not governed by the JVM GC. If you allocate a buffer in this type of memory you have to clear it yourself.
It might be a good idea to clear the buffer even if it's allocated in heap memory. The GC will take care of collecting an unused buffer, but this will take some time.
As the documentation of BufferUtils in LWJGL also says: there is no way to explicitly free a ByteBuffer.
The ByteBuffer objects that are allocated with the standard mechanism (namely, by directly or indirectly calling ByteBuffer#allocateDirect) are subject to GC, and will be cleaned up eventually.
The answer that you linked to seems to refer to the BigMemory library in particular. Using JNI, you can create a (direct) ByteBuffer that is not handled by the GC, and where it is up to you to actually free the underlying data.
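For what it's worth, the "JVM specific reflection" hack alluded to above looks roughly like this on Java 8 (unsupported internal JDK API; freeDirectBuffer is my own name, and this is shown only to illustrate why it is called a hack):

import java.nio.ByteBuffer;

// Java 8 only: eagerly frees the native memory behind a direct buffer.
// sun.nio.ch.DirectBuffer and sun.misc.Cleaner are internal JDK classes;
// on Java 9+ this fails without extra --add-exports/--add-opens flags.
static void freeDirectBuffer(ByteBuffer buffer) {
    if (buffer.isDirect()) {
        sun.misc.Cleaner cleaner = ((sun.nio.ch.DirectBuffer) buffer).cleaner();
        if (cleaner != null) { // null for slices/duplicates; free the original instead
            cleaner.clean();
        }
        // The buffer object still exists, but any access to it is now undefined.
    }
}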
However, one short piece of advice: When dealing with LWJGL and other libraries that rely on (direct) ByteBuffer objects for data transfer to the native side, you should think about the usage pattern of these buffers. Particularly for OpenGL binding libraries, you'll frequently need a ByteBuffer that only has space for 16 float values, for example (e.g. containing a matrix that is sent to OpenGL). And in many cases, the methods that do the data transfer with these buffers will be called frequently.
In such a case, it is usually not a good idea to allocate these small, short-lived buffers repeatedly:
class Renderer {
    void renderMethodThatIsCalledThousandsOfTimesPerSecond() {
        // A fresh direct buffer on every call: allocation cost plus GC pressure
        ByteBuffer bb = ByteBuffer.allocateDirect(16 * 4);
        fill(bb);
        passToOpenGL(bb);
    }
}
The creation of these buffers and the GC can significantly reduce performance, and, distressingly, in the form of GC pauses that can cause lags in a game.
For such cases, it can be beneficial to pull out the allocation, and re-use the buffer:
class Renderer {
    // Allocated once and reused; for OpenGL, the ByteOrder should also
    // be set to nativeOrder()
    private final ByteBuffer MATRIX_BUFFER_4x4 = ByteBuffer.allocateDirect(16 * 4);

    void renderMethodThatIsCalledThousandsOfTimesPerSecond() {
        fill(MATRIX_BUFFER_4x4); // fill() must reset position/limit before writing
        passToOpenGL(MATRIX_BUFFER_4x4);
    }
}
I am in a position where I want to pass byte[] to a native method via JNA. All the examples I've found about this sort of thing either use a Memory instance or use a directly-allocated ByteBuffer and then get a Pointer from that.
However, when I read the docs, they say that the underlying native memory these Java objects consume (which, as I understand it, is allocated "off the books", outside of the JVM-managed heap) only gets freed when the objects' finalize() method is called.
But when that finalizer gets called has nothing to do with when the objects go out of scope. They could hang around for a long time before the garbage collector actually finalizes them, so any native memory they've allocated will stay allocated for an arbitrarily long time after they go out of scope. If they hold a lot of memory, and/or if there are lots of these objects, you effectively have a memory leak, or at least a steady-state memory consumption potentially much higher than it needs to be. In other words, behavior similar to what's described in JNA/ByteBuffer not getting freed and causing C heap to run out of memory.
Is there any way around this problem with JNA? Or will I need to give up on JNA for this and use JNI instead so that I can use JNIEnv::GetByteArrayElements() so there's no need for any "off the books" memory allocations that can persist arbitrarily long? Is it acceptable to subclass Memory in order to get access to the dispose() method and use that to free up the underlying native memory on my timeline instead of the GC's timeline? Or will that cause problems when the finalizer does run?
JNA provides Memory.disposeAll() and Memory.dispose() to explicitly free memory (the latter requires you to subclass Memory), so if you do ever encounter memory pressure for which regular GC is not sufficient, you have some additional control available.
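A minimal sketch of the subclassing approach, assuming a JNA version where Memory.dispose() is protected (as the answer above states); OwnedMemory and nativeCall are illustrative names, not part of JNA:

import com.sun.jna.Memory;

// Widens the protected dispose() so the native allocation can be freed
// deterministically instead of waiting for finalization.
class OwnedMemory extends Memory {
    OwnedMemory(long size) {
        super(size);
    }

    @Override
    public void dispose() {
        super.dispose(); // frees the native block and zeroes the internal pointer
    }
}

Used like this, the memory is released on your timeline, and because dispose() zeroes the internal pointer, the eventual finalizer becomes a no-op rather than a double free:

OwnedMemory buf = new OwnedMemory(1024);
try {
    nativeCall(buf);   // hypothetical JNA-mapped function taking a Pointer
} finally {
    buf.dispose();
}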
I just want to understand more about currently popular garbage collection, malloc/free, and reference counting.
From my understanding, GC is the most popular because it relieves developers of the burden of managing memory manually, and it is also more bulletproof; with malloc/free it is easy to make mistakes and cause memory leaks.
From http://ocaml.org/learn/tutorials/garbage_collection.html:
Why would garbage collection be faster than explicit memory allocation as in C? It's often assumed that calling free costs nothing. In fact free is an expensive operation which involves navigating over the complex data structures used by the memory allocator. If your program calls free intermittently, then all of that code and data needs to be loaded into the cache, displacing your program code and data, each time you free a single memory allocation. A collection strategy which frees multiple memory areas in one go (such as either a pool allocator or a GC) pays this penalty only once for multiple allocations (thus the cost per allocation is much reduced).
Is it true that GC is faster than malloc/free?
Also, what about the reference-counting style of memory management (which Objective-C uses)?
I hope someone can summarize the comparisons with deeper insight.
Is it true that GC is faster than malloc/free?
It can be. It depends on the memory usage patterns. It also depends on how you measure "faster". (For example, are you measuring overall memory management efficiency, individual calls to malloc / free, or ... pause times.)
But conversely, malloc / free typically makes better use of memory than a modern copying GC ... provided that you don't run into heap fragmentation problems. And malloc / free "works" when the programming language doesn't provide enough information to allow a GC to distinguish heap pointers from other values.
Also, what if the counter style memory management (objective-c is using it) joins the party?
The overheads of reference counting make pointer assignment more expensive, and you have to somehow deal with reference cycles.
On the other hand, reference counting does offer a way to control memory management pauses ... which can be a significant issue for interactive games / apps. And memory usage is also better; see above.
FWIW, the points made in the source that you quoted are true. But it is not the whole picture.
The problem is that the whole picture is ... too complicated to be covered properly in a StackOverflow answer.
In the case of Java there is no contention for any lock when the object is small enough to fit into the Thread-Local Allocation Buffer (TLAB). This is an internal design and it has proven to work really well. From my understanding, allocating a new object inside a TLAB is just a pointer bump, which is pretty fast.
I read somewhere that Java can allocate memory for an object in about 12 machine instructions. That is quite impressive to me. As far as I understand, one of the tricks the JVM uses is preallocating memory in chunks. This helps to minimize the number of requests to the operating system, which are quite expensive, I guess. But even CAS operations can cost up to 150 cycles on modern processors.
So, could anyone explain the real cost of memory allocation in Java and which tricks the JVM uses to speed up allocation?
The JVM pre-allocates an area of memory for each thread (TLA or Thread Local Area).
When a thread needs to allocate memory, it will use "bump the pointer allocation" within that area. (If the "free pointer" points to address 10, and the object to be allocated has size 50, then we just bump the free pointer to 60 and tell the thread that it can use the memory between 10 and 59 for the object.)
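Here is a toy model of that bump-the-pointer scheme (illustrative only; the names and the byte-array stand-in are mine, not HotSpot's actual implementation):

// Toy model of bump-the-pointer allocation inside a thread-local chunk.
final class ToyTlab {
    private final byte[] chunk; // stands in for the thread's reserved memory region
    private int free = 0;       // the "free pointer": offset of the next unused byte

    ToyTlab(int size) {
        chunk = new byte[size];
    }

    // Returns the start offset of the new allocation, or -1 if the chunk
    // is exhausted (the real JVM would grab a new TLAB or trigger a GC).
    int allocate(int objectSize) {
        if (free + objectSize > chunk.length) {
            return -1;
        }
        int start = free;
        free += objectSize; // the entire allocation is this single addition
        return start;
    }
}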
The best trick is the generational garbage collector. It keeps the heap unfragmented, so allocating memory amounts to incrementing the pointer to free space and returning its old value. If memory runs out, the garbage collector copies the live objects and in this way creates a new unfragmented heap.
Since different threads would have to synchronize over the pointer to free memory when incrementing it, they preallocate chunks; each thread can then allocate new memory from its own chunk without taking a lock.
All of this is explained in more detail here: http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html
There is no single memory allocator for the JVM. IIRC Sun's and IBM's JVMs managed memory differently. However, generally the way a JVM operates is that it initially allocates one piece of memory; this segment will be small enough to live in the processor's cache, making all access to it extremely fast.
As the application creates objects, the objects take memory from within this segment. Object allocation within the segment is simply pointer arithmetic.
Initially the offset address into the freshly minted segment will be zero. The first object allocated will have an "address" (actually an offset into the segment) of zero. When you allocate an object, the memory manager knows how big the object is, allocates that much space within the segment (16 bytes, say), and then increments its "offset address" by that amount. This means that memory allocation is blindingly fast; it's just pointer arithmetic.
Sun has a whitepaper, Memory Management in the Java HotSpot™ Virtual Machine, and IBM used to have a bunch of material on ibm.com/developerworks.