I am having some memory problems with a project of mine. After tracing memory usage at several points in my program, I traced the problem to this line:
(FloatBuffer)lightBuffer.asFloatBuffer().put(lightAmbient).flip()
I am using the resultant buffer in a function straight away, but it seems that the float buffer is not emptied after it has been used.
So how do I properly empty/dereference a buffer in Java?
PS: I tried the clear() method, but according to the Java documentation that only resets the buffer; it does not remove the data from it.
Assuming your lightBuffer is a ByteBuffer (it seems no other class has a method asFloatBuffer), your FloatBuffer object is only a wrapper around the same underlying byte[] or native memory.
This FloatBuffer object does not consume any additional memory (it uses the same memory the lightBuffer uses), but it could hinder garbage collection of the ByteBuffer. It seems you are reusing that one anyway, aren't you?
So there seems to be no problem so far (apart from you not knowing that there are several Buffers using the same memory).
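A minimal sketch of that sharing (the lightBuffer/lightAmbient names are borrowed from the question; the sizes are made up):

import java.nio.ByteBuffer;
import java.nio.FloatBuffer;

public class ViewBufferDemo {
    public static void main(String[] args) {
        ByteBuffer lightBuffer = ByteBuffer.allocateDirect(4 * Float.BYTES);
        float[] lightAmbient = {0.5f, 0.5f, 0.5f, 1.0f};

        // The FloatBuffer is only a view; no second copy of the data exists.
        FloatBuffer view = (FloatBuffer) lightBuffer.asFloatBuffer().put(lightAmbient).flip();

        // A write through the view is visible through a fresh view of the same bytes.
        view.put(0, 9.0f);
        System.out.println(lightBuffer.asFloatBuffer().get(0)); // prints 9.0
    }
}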
Have you stored a reference to this buffer somewhere? I don't think that you can explicitly "remove the data" from the buffer and System.gc() won't help you here.
The garbage collector should handle this automatically for you unless you are maintaining a reference to this buffer.
Er, you don't.
The NIO buffers don't waste time with such things; they simply overwrite data in the buffer and move the marks.
If you want to get rid of it, you have to get rid of all references to it and make it available to the GC so it can be collected.
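To see that concretely, a tiny sketch: clear() only resets position and limit, and the bytes stay where they are.

import java.nio.ByteBuffer;

public class ClearDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(4);
        buf.put((byte) 42);
        buf.clear(); // resets position/limit only; nothing is erased
        System.out.println(buf.get(0)); // still prints 42
    }
}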
Related
When I use glMapBuffer I get a float (byte) buffer as return type, which I can use to modify the data on the server side.
But is there any performance advantages of doing so?
I have an example:
Approach 1:
I create a float buffer with vertex data and pass it to glBufferData directly.
Approach 2:
I allocate space using glBufferData and I pass no data...
I get the reference to a float buffer...
I write the float values to it to... and I unmap the buffer.
What are the pros and cons of the two approaches?
Am I doing the same thing in both?
I think that the second approach avoids duplicate buffers.
There are two very related aspects to this:
Reducing memory usage.
Avoiding unnecessary copying of data, which can hurt performance.
Calling glBufferData() with data passed in involves the following:
You allocate buffer memory to store your data.
You store your data in this buffer you allocated.
When you call glBufferData(), the OpenGL implementation allocates memory for the data.
The OpenGL implementation copies the data from your buffer into its own allocation.
Compare this with what happens when you do the same thing with buffer mapping:
When you call glBufferData(), the OpenGL implementation allocates memory for the data.
When you call glMapBuffer(), the OpenGL implementation returns a pointer to its memory.
You store your data in this memory.
You unmap the buffer.
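For illustration, a minimal sketch of the two sequences using LWJGL-style Java bindings (a current GL context is assumed; the vertex data is made up for the example):

import static org.lwjgl.opengl.GL15.*;
import java.nio.ByteBuffer;
import java.nio.FloatBuffer;
import org.lwjgl.BufferUtils;

public class UploadSketch {
    static void uploadBothWays() {
        float[] triangle = {0, 0, 0,  1, 0, 0,  0, 1, 0};

        // Approach 1: my own buffer plus the GL's allocation, plus a copy between them.
        FloatBuffer vertexData = BufferUtils.createFloatBuffer(triangle.length);
        vertexData.put(triangle).flip();
        int vbo1 = glGenBuffers();
        glBindBuffer(GL_ARRAY_BUFFER, vbo1);
        glBufferData(GL_ARRAY_BUFFER, vertexData, GL_STATIC_DRAW);

        // Approach 2: allocate only, then write straight into the GL's memory.
        int vbo2 = glGenBuffers();
        glBindBuffer(GL_ARRAY_BUFFER, vbo2);
        glBufferData(GL_ARRAY_BUFFER, (long) triangle.length * Float.BYTES, GL_STATIC_DRAW);
        ByteBuffer mapped = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
        mapped.asFloatBuffer().put(triangle);
        glUnmapBuffer(GL_ARRAY_BUFFER);
    }
}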
If you compare the two sequences, you have an extra memory allocation in the first one, which means that it requires about twice the memory in total. And the OpenGL implementation has to copy the buffer data in the first one, which is not the case in the second.
In reality, things can get a bit more complicated. Particularly on systems that have dedicated graphics memory (VRAM), there might be more copies of the data. But the principle remains: you reduce extra memory allocations and copying.
Another aspect to keep in mind is what happens beyond the initial use of the buffer, if you want to modify the content of the buffer after it was already used. Again, glMapBuffer() will generally reduce the amount of extra data copying, but it might come at the price of undesired synchronization. So it could be more efficient to pay the price for an extra copy needed for glBufferData() or glBufferSubData() to avoid synchronization points.
If you have these more complex cases where you frequently modify buffer data, you really need to start benchmarking, and you have to expect differences between vendors. You can also look into schemes where you use buffer mapping, but use a pool of buffers you cycle through instead of a single buffer, to reduce/avoid the performance penalty from synchronization.
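A hedged sketch of such a cycling scheme, again with LWJGL-style bindings (the pool size and names are illustrative, not a recommendation):

import static org.lwjgl.opengl.GL15.*;
import java.nio.ByteBuffer;
import java.nio.FloatBuffer;

public class BufferPoolSketch {
    static final int POOL_SIZE = 3;               // illustrative; tune by benchmarking
    static final int[] pool = new int[POOL_SIZE]; // fill once at startup with glGenBuffers(pool)
    static int frame = 0;

    static void uploadNextFrame(FloatBuffer data, long sizeBytes) {
        // Use a different buffer object each frame, so the one we map is
        // unlikely to still be read by the GPU from an earlier frame.
        glBindBuffer(GL_ARRAY_BUFFER, pool[frame++ % POOL_SIZE]);
        glBufferData(GL_ARRAY_BUFFER, sizeBytes, GL_STREAM_DRAW); // re-specify storage
        ByteBuffer mapped = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
        mapped.asFloatBuffer().put(data);
        glUnmapBuffer(GL_ARRAY_BUFFER);
    }
}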
On top of this, if you work on devices where power/thermal considerations come into play, you may want to measure power usage in addition to just execution speed. Because the fastest solution might not necessarily be the most power efficient.
I'm trying to parse a ByteBuf that is not on the JVM heap into a Google Protocol Buffer object. In fact, it's the direct-memory byte buffer that Netty passed to me.
This is what I am currently doing:
ByteBuf buf = ...;
ByteBufInputStream stream = new ByteBufInputStream(buf);
Message msg = MyPbMessage.getDefaultInstance().getParserForType().parseFrom(stream);
This can work. However, I found that this type of parsing introduces a new byte array per message and causes a lot of GC.
So is there a way to avoid creating these on-heap byte arrays? i.e., parse the Google Protocol Buffer bytes directly from native memory.
You can do it the way the Guava guys do: store a small buffer (1024 bytes) in a ThreadLocal, use it if it suffices, and never put a bigger buffer in the TL.
This'll work fine as long as most requests can be served by it. If the average/median size is too big, you could go for a soft/weak reference; however, without some real testing it's hard to tell whether it helps.
You could combine the approaches, i.e., use a strongly referenced small buffer and a weakly referenced big buffer in the TL. You could pool your buffers, you could...
... but note that it all has its dark side: wasted memory, and prolonged buffer lifetimes that promote buffers into the old generation, where garbage collection is much more expensive.
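A minimal sketch of the ThreadLocal approach described above (MyPbMessage is the generated class from the question; the 1024-byte threshold is an assumption):

import com.google.protobuf.InvalidProtocolBufferException;
import io.netty.buffer.ByteBuf;

public class PbParseSketch {
    // One small, strongly referenced scratch buffer per thread.
    private static final ThreadLocal<byte[]> SCRATCH =
            ThreadLocal.withInitial(() -> new byte[1024]);

    static MyPbMessage parse(ByteBuf buf) throws InvalidProtocolBufferException {
        int len = buf.readableBytes();
        // Reuse the per-thread buffer when it suffices; big messages still allocate.
        byte[] tmp = (len <= 1024) ? SCRATCH.get() : new byte[len];
        buf.readBytes(tmp, 0, len);
        return MyPbMessage.getDefaultInstance().getParserForType().parseFrom(tmp, 0, len);
    }
}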
I am creating two arrays in C++ which will be read on the Java side:
env->NewDirectByteBuffer
env->NewByteArray
Do these functions copy the buffer I send it?
Do I need to create the buffer on the heap on the C++ side, or is it OK to create it on the stack because the JVM will copy it?
For example, will this code run OK?
std::string stam = "12345";
const char *buff = stam.c_str();
jobject directBuff = env->NewDirectByteBuffer((void*)buff, (jlong) stam.length() );
Another example:
std::string md5 = "12345";
jbyteArray md5ByteArray = env->NewByteArray((jsize) (md5.length()));
env->SetByteArrayRegion(md5ByteArray, 0, (jsize) (md5.length()), (jbyte*) md5.c_str());
The string is created on the stack. Will this code always work, or do I need to create those strings on the heap and be responsible for deleting them after Java finishes using them?
Your use of DirectByteBuffer will almost certainly fail in spectacular, core-dumping, and unpredictable ways. And its behavior may vary between JVM implementations and operating systems. The problem is that your direct memory must remain valid for the lifetime of the DirectByteBuffer. Since your string is on the stack, it will go out of scope rather quickly. Meanwhile the Java code may or may not continue to use the DirectByteBuffer, depending on what it is. Are you writing the Java code too? Can you guarantee that its use of the DirectByteBuffer will be complete before the string goes out of scope?
Even if you can guarantee that, realize that Java's GC is non-deterministic. It is all too easy to think that your DirectByteBuffer isn't being used any more, but meanwhile it is wandering around in unreclaimed objects, which eventually get hoovered up by the GC, which may call some finalize() method that accidentally touches the DirectByteBuffer, and -- kablooey! In practice, it is very difficult to make these guarantees except for blocks of "shared memory" that never go away for the life of your application.
NewDirectByteBuffer is also not that fast (at least not in Windows), despite the intuitive assumption that performance is what it is all about. I've found experimentally that it is faster to copy 1000 bytes than it is to create a single DirectByteBuffer. It is usually much faster to have your Java pass a byte[] into the C++ and have the C++ copy bytes into it (ahem, assuming they fit). Overall, I make these recommendations:
1. Call NewByteArray() and SetByteArrayRegion(), return the resulting jbyteArray to Java, and have no worries.
2. If performance is a requirement, pass the byte[] from Java to C++ and have C++ fill it in (a hypothetical Java-side shape for this is sketched after this list). You might need two C++ calls, one to get the size and the next to get the data.
3. If the data is huge, use NewDirectByteBuffer and make sure that the C++ data stays around "forever", or until you are darn certain that the DirectByteBuffer has been disposed.
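For recommendation 2, the Java side might look roughly like this (a hypothetical sketch; the class, method, and library names are all made up):

public final class NativeReader {
    static { System.loadLibrary("nativereader"); } // hypothetical native library

    // C++ implements these: the first call reports the size,
    // the second fills the array via SetByteArrayRegion().
    public static native int dataSize();
    public static native void readData(byte[] dst);

    public static byte[] fetch() {
        byte[] out = new byte[dataSize()];
        readData(out);
        return out;
    }
}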
I've also read that both C++ and Java can memory-map the same file, and that this works very well for large data.
NewDirectByteBuffer: "Allocates and returns a direct java.nio.ByteBuffer referring to the block of memory starting at the memory address address and extending capacity bytes.
"Native code that calls this function and returns the resulting byte-buffer object to Java-level code should ensure that the buffer refers to a valid region of memory that is accessible for reading and, if appropriate, writing. An attempt to access an invalid memory location from Java code will either return an arbitrary value, have no visible effect, or cause an unspecified exception to be thrown.".
No copying there.
New<Primitive>Array: only arguments are JNIEnv * and length, so there is nothing to copy.
Set<Primitive>Array: "A family of functions that copies back a region of a primitive array from a buffer."
I need a byte buffer class in Java for single-threaded use. The buffer should resize when it's full, rather than throw an exception or something. Very important issue for me is performance.
What would you recommend?
ADDED:
At the moment I use ByteBuffer, but it cannot resize. I need one that can resize.
Any reason not to use the boring normal ByteArrayOutputStream?
As mentioned by miku above, Evan Jones gives a review of different types and shows that it is very application dependent. So without knowing further details it is hard to speculate.
I would start with ByteArrayOutputStream, and only if profiling shows it is your performance bottleneck move to something else. Often when you believe the buffer code is the bottleneck, it will actually be network or other IO - wait until profiling shows you need an optimisation before wasting time finding a replacement.
If you are moving to something else, then other factors you will need to think about:
You have said this is single-threaded use, so BAOS's synchronization is unnecessary overhead
what is the buffer being filled by and fed into? If either end is already wired to use Java NIO, then using a direct ByteBuffer is very efficient.
Are you using a circular buffer or a plain linear buffer? If you need a circular one, the Ostermiller Utils are pretty efficient, and GPL'd
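To make the boring option concrete, a minimal sketch of ByteArrayOutputStream as a self-resizing byte buffer:

import java.io.ByteArrayOutputStream;

public class GrowingBufferDemo {
    public static void main(String[] args) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream(16); // grows as needed
        for (int i = 0; i < 1000; i++) {
            buf.write(i); // appends one byte, resizing the backing array when full
        }
        byte[] snapshot = buf.toByteArray(); // copies out the current contents
        System.out.println(snapshot.length); // prints 1000
        buf.reset(); // reuse the same backing array for the next round
    }
}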
You can use a direct ByteBuffer. Direct memory is backed by virtual memory, which is only allocated to the application as it is actually used; i.e. the amount of main memory it consumes resizes automagically.
Create a direct ByteBuffer larger than you need and it will only consume what you use.
You can also write manual code that checks the buffer continuously and, when it is full, makes a new buffer of greater size and shifts all the data into that new buffer, as sketched below.
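A minimal sketch of that manual grow-and-copy approach (the doubling policy is an arbitrary choice):

import java.nio.ByteBuffer;

public class GrowSketch {
    // Returns a buffer with at least 'needed' writable bytes remaining,
    // copying the already-written content into a bigger buffer if necessary.
    static ByteBuffer ensureRoom(ByteBuffer buf, int needed) {
        if (buf.remaining() >= needed) return buf;
        int newCapacity = Math.max(buf.capacity() * 2, buf.position() + needed);
        ByteBuffer bigger = ByteBuffer.allocate(newCapacity);
        buf.flip();       // switch the old buffer to reading mode
        bigger.put(buf);  // shift all the data into the new buffer
        return bigger;
    }
}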
I'm wondering how I'd code up a ByteBuffer recycling class that can get me a ByteBuffer at least as big as the specified length, and that can lock ByteBuffer objects while my code is using them to prevent them from being handed out again. This would avoid re-constructing DirectByteBuffers and such over and over, instead reusing existing ones. Is there an existing Java library that can do this very effectively? I know Javolution can work with object recycling, but does that extend to the ByteBuffer class in this context, with the requirements set out?
It would be more to the point to be more conservative in your usage patterns in the first place. For example there is lots of code out there that shows allocation of a new ByteBuffer on every OP_READ. This is insane. You only need two ByteBuffers at most per connection, one for input and one for output, and depending on what you're doing you can get away with exactly one. In extremely simple cases like an echo server you can get away with one BB for the entire application.
I would look into that rather than paper over the cracks with yet another layer of software.
This is just advice, not an answer. If you do implement some caching for DirectByteBuffer, then be sure to read about the GC implications, because the memory consumed by DirectByteBuffer is not tracked by the garbage collector.
Some references:
A thread - featuring Stack Overflow's tackline
A blog post on the same subject
And the followup
Typically, you would use a combination of ThreadLocal and a SoftReference wrapper: the former to simplify synchronization (essentially eliminating the need for it), and the latter to make the buffer reclaimable if there's not enough memory (keeping in mind the other comments wrt. GC issues with direct buffers). It's actually quite simple: check whether the SoftReference holds a buffer with a big enough size; if not, allocate one; if yes, clear the reference. Once you are done with the buffer, re-set the reference to point to it.
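A minimal sketch of that combination (capacity handling and ownership discipline are simplified):

import java.lang.ref.SoftReference;
import java.nio.ByteBuffer;

public class BufferRecycler {
    private static final ThreadLocal<SoftReference<ByteBuffer>> CACHE = new ThreadLocal<>();

    // Check whether the SoftReference holds a big enough buffer; if not, allocate.
    static ByteBuffer acquire(int minCapacity) {
        SoftReference<ByteBuffer> ref = CACHE.get();
        ByteBuffer buf = (ref == null) ? null : ref.get();
        if (buf == null || buf.capacity() < minCapacity) {
            buf = ByteBuffer.allocateDirect(minCapacity);
        }
        CACHE.remove(); // clear the reference: the caller now owns the buffer
        buf.clear();
        return buf;
    }

    // Once done with the buffer, re-set the reference so it can be recycled.
    static void release(ByteBuffer buf) {
        CACHE.set(new SoftReference<>(buf));
    }
}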
Another question is whether a ByteBuffer is needed at all, compared to a regular byte[]. Many developers assume ByteBuffers are better performance-wise, but that assumption is not usually backed by actual data (i.e. by testing whether there is a performance difference, and in which direction). The reason byte[] may often be faster is that the code accessing it can be simpler, and easier for HotSpot to JIT efficiently.