I searched a lot and checked multiple answers; none of them is clear to me.
Java has ByteBuffer. It comes in two flavors: direct and non-direct.
The direct buffer is good for I/O.
What is the need for a non-direct ByteBuffer when we have byte[]? When should it be used?
Non-direct ByteBuffers are stored on the heap and are backed by an underlying byte array. They are typically used when you need a buffer that is readable and writable by the Java application but doesn't need the same level of I/O performance as a direct ByteBuffer.
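As a small illustration (my sketch, not from the answer), a non-direct buffer is just a structured, typed view over a heap byte[], which is exactly what a plain array lacks:

import java.nio.ByteBuffer;

public class HeapVsDirect {
    public static void main(String[] args) {
        byte[] raw = new byte[16];
        ByteBuffer heapBuf = ByteBuffer.wrap(raw);  // non-direct, backed by 'raw'
        heapBuf.putInt(0xCAFEBABE);                 // typed write goes through to the array
        System.out.println(heapBuf.hasArray());     // true: array() returns 'raw'

        ByteBuffer directBuf = ByteBuffer.allocateDirect(16); // off-heap memory
        System.out.println(directBuf.hasArray());   // false: no backing array
    }
}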
So why not always use a direct ByteBuffer?
Garbage Collection: Non-direct ByteBuffers live on the heap and are reclaimed by the garbage collector like any other object. With direct ByteBuffers, the off-heap memory is only released once the buffer object itself is collected, and there is no standard way to free it explicitly.
Concurrency: ByteBuffers, direct or not, are not thread-safe and require explicit synchronization in order to be safely accessed by multiple threads, which can add complexity and overhead to your code.
Complexity: Direct ByteBuffers often require more manual handling and can involve working with native code, which can make them more complex and harder to work with than non-direct ByteBuffers.
Increased Latency: Direct ByteBuffers can have higher allocation and deallocation costs than non-direct ones, since the memory is allocated outside the Java heap, and element-by-element access from Java code can be slower than access to a heap-backed array.
Performance Variation: Performance with direct ByteBuffers can vary depending on the underlying system and hardware, making it harder to predict and guarantee performance.
I have always used MemoryUtil to create float buffers, but people seem to use BufferUtils for it:
private IntBuffer convertToIntBuffer(int[] data) {
    IntBuffer buffer = BufferUtils.createIntBuffer(data.length);
    buffer.put(data).flip(); // flip() returns Buffer before Java 9, so don't chain it into the return
    return buffer;
}
private FloatBuffer convertToFloatBuffer(float[] data) {
    FloatBuffer buffer = MemoryUtil.memAllocFloat(data.length);
    buffer.put(data).flip(); // same flip() note applies here
    return buffer;
}
LWJGL 3's org.lwjgl.BufferUtils class is only a small facade over Java's java.nio.ByteBuffer.allocateDirect() method, allowing you to use the JVM's memory allocator to allocate off-heap memory and return a NIO ByteBuffer (or typed view thereof), including making sure that the ByteOrder is nativeOrder().
The NIO Buffer allocated by ByteBuffer.allocateDirect() is managed by the JRE internally and the native memory is freed implicitly as part of the garbage collection cycle once it becomes unreachable.
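Based on that description, a rough equivalent of what such a facade does might look like this (a sketch under those assumptions, not LWJGL's actual source; the class name is mine):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

class Buffers {
    // What a BufferUtils.createFloatBuffer(size) facade boils down to,
    // per the description above: JVM off-heap allocation in native byte order.
    static FloatBuffer createFloatBuffer(int size) {
        return ByteBuffer.allocateDirect(size * Float.BYTES)
                         .order(ByteOrder.nativeOrder())
                         .asFloatBuffer();
    }
}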
There are a lot of downsides to allocating off-heap memory with this approach, such as (quoting from "Memory management in LWJGL 3"):
- It is slow, much slower than the raw malloc() call. A lot of overhead on top of a function that is already slow.
- It scales badly under contention.
- It arbitrarily limits the amount of allocated memory (-XX:MaxDirectMemorySize).
- Like Java arrays, the allocated memory is always zeroed-out. This is not necessarily bad, but having the option would be better.
- There's no way to deallocate the allocated memory on demand (without JDK-specific reflection hacks). Instead, a reference queue is used that usually requires two GC cycles to free the native memory. This quite often leads to OOM errors under pressure.
LWJGL 3's org.lwjgl.system.MemoryUtil class on the other hand allows you to use other native/off-heap memory allocators instead of the JVM's ByteBuffer allocator to allocate off-heap native memory, including the option of just giving you the raw virtual memory address as a long, avoiding the NIO Buffer instance.
LWJGL supports the system allocator of the C standard library (malloc) as well as, currently, jemalloc and rpmalloc. All of these provide a much faster alternative to Java's ByteBuffer.allocateDirect(), alleviating the drawbacks mentioned above.
Because the native memory is not managed by the JVM anymore, you have to free the memory yourself, for which there is the org.lwjgl.system.MemoryUtil.memFree() method.
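For example, a typical allocate/use/free cycle with MemoryUtil looks like this (a minimal sketch; the method name and the 16-float size are my assumptions, not from the post):

import java.nio.FloatBuffer;
import org.lwjgl.system.MemoryUtil;

class MemoryUtilExample {
    static void uploadMatrix() {
        FloatBuffer buffer = MemoryUtil.memAllocFloat(16); // off-heap, not zeroed
        try {
            buffer.put(0, 1.0f);
            // ... fill the rest and pass it to a native call ...
        } finally {
            MemoryUtil.memFree(buffer); // must be freed manually; the GC will not do it
        }
    }
}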
However, before you continue, you should read the mentioned LWJGL blog post in its entirety, as there are more options, such as org.lwjgl.system.MemoryStack, for allocating native off-heap memory in particular situations (such as short-lived memory), which is even faster than all the alternatives mentioned above.
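As a taste of that MemoryStack option (again a sketch with assumed names, for the short-lived case):

import java.nio.FloatBuffer;
import org.lwjgl.system.MemoryStack;

class MemoryStackExample {
    static void uploadMatrix() {
        try (MemoryStack stack = MemoryStack.stackPush()) {
            FloatBuffer matrix = stack.mallocFloat(16); // allocated on a thread-local stack
            // ... fill and pass to a native call ...
        } // popping the stack frame reclaims the memory instantly, no GC involved
    }
}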
I'm writing some code that uses ByteBuffers. The docs of the API say:
There is no way to free a buffer explicitly (without JVM specific
reflection). Buffer objects are subject to GC and it usually takes two
GC cycles to free the off-heap memory after the buffer object becomes
unreachable.
However, in the accepted answer of an SO post I read:
BigMemory uses the memory address space of the JVM process, via direct
ByteBuffers that are not subject to GC unlike other native Java
objects.
Now what should I do: should I free the created buffer? Or do I misunderstand something in the docs or the answer?
It depends on how you create the buffer; there are many possible use cases. A regular ByteBuffer.allocate() creates the buffer on the heap, and it will be collected by the GC. Other options, e.g. native memory, might not be.
Terracotta BigMemory is a type of native off-heap memory which is not governed by the JVM GC. If you allocate a buffer in this type of memory you have to clear it yourself.
It might be a good idea to clear the buffer even if it's allocated in heap memory. The GC will take care of collecting an unused buffer, but this will take some time.
As the documentation of the BufferUtils in LWJGL also says: there is no way to explicitly free a ByteBuffer.
The ByteBuffer objects that are allocated with the standard mechanism (namely, by directly or indirectly calling ByteBuffer#allocateDirect) are subject to GC, and will be cleaned up eventually.
The answer that you linked to seems to refer to the BigMemory library in particular. Using JNI, you can create a (direct) ByteBuffer that is not handled by the GC, and where it is up to you to actually free the underlying data.
However, one short piece of advice: when dealing with LWJGL and other libraries that rely on (direct) ByteBuffer objects for the data transfer to the native side, you should think about the usage pattern of these buffers. Particularly for OpenGL binding libraries, you'll frequently need a ByteBuffer that only has space for 16 float values (e.g. containing a matrix that is sent to OpenGL). And in many cases, the methods that do the data transfer with these buffers will be called frequently.
In such a case, it is usually not a good idea to allocate these small, short-lived buffers repeatedly:
class Renderer {
    void renderMethodThatIsCalledThousandsOfTimesPerSecond() {
        ByteBuffer bb = ByteBuffer.allocateDirect(16 * 4);
        fill(bb);
        passToOpenGL(bb);
    }
}
The creation of these buffers and the GC can significantly reduce performance, and annoyingly so in the form of GC pauses that can cause lag in a game.
For such cases, it can be beneficial to pull out the allocation, and re-use the buffer:
class Renderer {
    private final ByteBuffer MATRIX_BUFFER_4x4 = ByteBuffer.allocateDirect(16 * 4);

    void renderMethodThatIsCalledThousandsOfTimesPerSecond() {
        fill(MATRIX_BUFFER_4x4);
        passToOpenGL(MATRIX_BUFFER_4x4);
    }
}
I am trying to use a ByteBuffer as internal storage for a class. I want to hide flip() and the ByteBuffer manipulation from the caller, but I also do not want to use slice(), as it creates additional garbage.
Are there any alternatives or design suggestions?
Assuming you're running on HotSpot, and as long as the slice is very short-lived (e.g. it is used immediately in the method creating it or by its caller), escape analysis should be able to eliminate that allocation.
That is a JVM optimization, so there's no guarantee that it happens, but it generally is good enough to not worry about those things.
Also, young GCs are very efficient. The cost of such short-lived objects is very low, even if EA does not kick in.
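For illustration (a hypothetical sketch, not from the question), this is the kind of short-lived view that escape analysis can usually scalar-replace, because it never leaves the method:

import java.nio.ByteBuffer;

class Storage {
    private final ByteBuffer data = ByteBuffer.allocate(1024);

    int readIntAt(int index) {
        ByteBuffer view = data.slice(); // temporary view; never escapes this method
        return view.getInt(index);      // absolute read, leaves 'data' untouched
    }
}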
Also, you should avoid premature optimizations. Worry about such things once you measured performance and figured out where the actual bottlenecks are.
I wanted to understand what data structures the heap managers (in Java, or the OS/runtime in the case of C or C++) use to keep track of the memory locations used by threads and processes. One way is to keep a map from objects to memory addresses and a reverse map from a block's starting address to the size of the object in memory.
But then it won't be able to serve new memory requests in O(1) time. Is there any better data structure to do this?
Note that in unmanaged languages, memory is ultimately allocated and freed through system calls; the language itself generally doesn't manage it. Still, regardless of the level of abstraction (from the OS up to the runtime), something has to deal with this:
One method is called buddy block allocation, described well with an example on Wikipedia. It essentially keeps track of the usage of memory blocks of varying sizes (typically powers of 2). This can be done with a number of arrays and clever indexing, or perhaps more intuitively with a binary tree, where each node tells whether a certain block is free, and all nodes on one level represent blocks of the same size. A sketch of this scheme follows the next paragraph.
This suffers from fragmentation: requests are rounded up to the next block size (internal fragmentation), and as things come and go you might end up with your data scattered rather than efficiently consolidated, making it harder to fit large data (external fragmentation). This could be countered by a more complicated, dynamic system, but buddy blocks have the advantage of simplicity.
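Here is a toy buddy allocator (my sketch, not from the answer): it manages a 1024-byte arena in power-of-two blocks with one free list per block size; integer offsets stand in for real addresses, and error handling is minimal.

import java.util.ArrayDeque;
import java.util.Deque;

class BuddyAllocator {
    private static final int TOTAL = 1024;  // arena size, a power of two
    private static final int ORDERS = 11;   // block sizes 1..1024 bytes
    @SuppressWarnings("unchecked")
    private final Deque<Integer>[] freeLists = new Deque[ORDERS];

    BuddyAllocator() {
        for (int i = 0; i < ORDERS; i++) freeLists[i] = new ArrayDeque<>();
        freeLists[ORDERS - 1].add(0);       // one free block spanning the whole arena
    }

    /** Returns the offset of a block of at least 'size' bytes, or -1 if out of memory. */
    int alloc(int size) {
        int order = order(size);
        int o = order;
        while (o < ORDERS && freeLists[o].isEmpty()) o++; // find a big-enough free block
        if (o == ORDERS) return -1;
        int offset = freeLists[o].pop();
        while (o > order) {                 // split down to the requested size,
            o--;                            // releasing the upper "buddy" half each time
            freeLists[o].push(offset + (1 << o));
        }
        return offset;
    }

    /** Frees a block, coalescing with its buddy whenever the buddy is also free. */
    void free(int offset, int size) {
        int o = order(size);
        while (o < ORDERS - 1) {
            int buddy = offset ^ (1 << o);  // the buddy's address differs in exactly one bit
            if (!freeLists[o].remove(Integer.valueOf(buddy))) break;
            offset = Math.min(offset, buddy); // merge into the lower half
            o++;
        }
        freeLists[o].push(offset);
    }

    private static int order(int size) {    // smallest power-of-two order >= size
        int o = 0;
        while ((1 << o) < size) o++;
        return o;
    }
}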
The OS keeps track of the process's memory allocation in an overall view - 4KB pages or bigger "lumps" are stored in some form of list.
In the typical Windows implementation (Microsoft's C runtime library), at least in recent versions, all memory allocations are done through the HeapAlloc() API call. So every single heap allocation goes through to the OS. Whether the OS actually tracks every single allocation or just keeps a map of "what is free, what is used" is another matter. It is my understanding that the heap management code has no list of "current allocations", just a list of freed memory lumps.
In Linux/Unix, the C library will typically avoid calling the OS for every little allocation, and instead uses a large lump of memory, and splits that up into smaller pieces per allocation. Again, no tracking of allocated memory inside the heap management.
This is done at a process level. I'm not aware of an operating system that differentiates memory allocations on a per-thread level (other than TLS - thread local storage, but that is typically a very small region, outside of the typical heap code management).
So, in summary: the OS and/or C/C++ runtime doesn't actually keep a list of all the used allocations; it keeps a list of freed memory [and when another lump is freed, it will typically "join" it with the previous and next consecutive free lumps to reduce fragmentation]. When the allocator is first started, it's given a large lump, which is registered as a single free block. When a request is made, the lump is split, and the free list becomes the remainder. When that lump is not sufficient, another big lump is carved off using the underlying OS allocations.
There is a small amount of metadata stored with each allocation, which contains things like "how much memory is allocated", and this metadata is used when freeing the memory. In the typical case, this data is stored immediately before the allocated memory. But there is no way to find the allocation metadata without knowing about the allocations in some other way.
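To make the metadata point concrete, here is an illustrative sketch (mine, not the answer's) of a toy bump allocator inside a ByteBuffer that hides a 4-byte size header immediately before each block, the way C runtimes typically do:

import java.nio.ByteBuffer;

class HeaderedArena {
    private final ByteBuffer arena = ByteBuffer.allocate(4096);
    private int top = 0; // simple bump pointer; no free list in this sketch

    /** Returns the offset of 'size' usable bytes, preceded by a hidden size header. */
    int alloc(int size) {
        int headerAt = top;
        arena.putInt(headerAt, size); // metadata lives just before the block
        top += 4 + size;
        return headerAt + 4;          // the caller only sees the payload offset
    }

    /** Recovers the size the way free() does: by reading the hidden header. */
    int sizeOf(int offset) {
        return arena.getInt(offset - 4);
    }
}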
There is no automatic garbage collection in C++. You need to call free/delete for malloc/new heap memory allocations. That's where tools like valgrind (to check for memory leaks) come in handy. There are also concepts like auto_ptr (nowadays superseded by std::unique_ptr), which automatically frees heap memory, that you can refer to.
I don't understand what these Buffer classes are for. Aren't they for buffering? I would think a buffer object should allow both reading and writing simultaneously and independently. Nevertheless it is not so: a buffer has only one position, shared between reading and writing. This means that if I write something into the buffer with relative put(), then I can't read anything sensible with relative get(). Also, if I call put() and get() interchangeably, I will get nonsense.
So are there any usage patterns (samples) for buffers that make it evident that those buffers are somehow better than conventional arrays?
ByteBuffers are used for reading and writing data; you can get/put many primitive types and control the endianness. They can be a wrapper for direct memory (off-heap) and for memory-mapped files (also off-heap).
They can be used for performance (as they can access a long or double natively without assembling bytes together), and direct byte buffers can read/write data without an additional copy into "Java" memory. Memory-mapped files can be extended up to the size of your disk space, allowing you to use a lot of memory without impacting your GC times.
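A minimal sketch of the canonical write/flip/read pattern (my example, not from the answer) shows how the single position works in practice:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BufferPattern {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);

        buf.putInt(42);       // relative writes advance the position
        buf.putDouble(3.14);

        buf.flip();           // limit = position, position = 0: switch to reading

        int i = buf.getInt(); // relative reads now walk the same bytes back
        double d = buf.getDouble();
        System.out.println(i + " " + d);

        buf.clear();          // ready for the next write cycle
    }
}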