When I use glMapBuffer I get a float (byte) buffer as return type, which I can use to modify the data on the server side.
But is there any performance advantages of doing so?
I have an example
Approach 1:
I create a float buffer with vertex data and pass it to glBufferData directly.
Approach 2:
I allocate space using glBufferData and I pass no data...
I get the reference to a float buffer...
I write the float values to it to... and I unmap the buffer.
What are the pros and cons of two approaches
Am I doing the same on both?
I think that second approach avoids duplicates of buffers.
There are two very related aspects to this:
Reducing memory usage.
Avoiding unnecessary copying of data, which can hurt performance.
Calling glBufferData() with data being passed in includes the following:
You allocate buffer memory to store your data.
You store your data in this buffer you allocated.
When you call glBufferData(), the OpenGL implementation allocates memory for the data.
The OpenGL implementation copies the data from your buffer into its own allocation.
Compare this with what happens when you do the same thing with buffer mapping:
When you call glBufferData(), the OpenGL implementation allocates memory for the data.
When you call glMapBuffer(), the OpenGL implementation returns a pointer to its memory.
You store your data in this memory.
You unmap the buffer.
If you compare the two sequences, you have an extra memory allocation in the first one, which means that it requires about twice the memory in total. And the OpenGL implementation has to copy the buffer data in the first one, which is not the case in the second.
In reality, things can get a bit more complicated. Particularly on systems that have dedicated graphics memory (VRAM), there might be more copies of the data. But the principle remains, you reduce extra memory allocations and copying.
Another aspect to keep in mind is what happens beyond the initial use of the buffer, if you want to modify the content of the buffer after it was already used. Again, glMapBuffer() will generally reduce the amount of extra data copying, but it might come at the price of undesired synchronization. So it could be more efficient to pay the price for an extra copy needed for glBufferData() or glBufferSubData() to avoid synchronization points.
If you have these more complex cases where you frequently modify buffer data, you really need to start benchmarking, and you have to expect differences between vendors. You can also look into schemes where you use buffer mapping, but use a pool of buffers you cycle through instead of a single buffer, to reduce/avoid the performance penalty from synchronization.
On top of this, if you work on devices where power/thermal considerations come into play, you may want to measure power usage in addition to just execution speed. Because the fastest solution might not necessarily be the most power efficient.
Related
Let’s say I’ve mapped a memory region [0, 1000] and now I have MappedByteBuffer.
Can I read and write to this buffer from multiple threads at the same time without locking, assuming that each thread accesses different part of the buffer for exp. T1 [0, 500), T2 [500, 1000]?
If the above is true, is it possible to determine whether it’s better to create one big buffer for multiple threads, or smaller buffer for each thread?
Detailed Intro:
If you wanna learn how to answer those questions yourself, check their implementation source codes:
MappedByteBuffer: https://github.com/himnay/java7-sourcecode/blob/master/java/nio/MappedByteBuffer.java (notice it's still abstract, so you cannot instantiate it directly)
extends ByteBuffer: https://github.com/himnay/java7-sourcecode/blob/master/java/nio/ByteBuffer.java
extends Buffer: https://github.com/himnay/java7-sourcecode/blob/329bbb33cbe8620aee3cee533eec346b4b56facd/java/nio/Buffer.java (which only does index checks, and does not grant an actual access to any buffer memory)
Now it gets a bit more complicated:
When you wanna allocate a MappedByteBuffer, you will get either a
HeapByteBuffer: https://github.com/himnay/java7-sourcecode/blob/329bbb33cbe8620aee3cee533eec346b4b56facd/java/nio/HeapByteBuffer.java
or a DirectByteBuffer: https://github.com/himnay/java7-sourcecode/blob/329bbb33cbe8620aee3cee533eec346b4b56facd/java/nio/DirectByteBuffer.java
Instead of having to browse internet pages, you could also simply download the source code packages for your Java version and attach them in your IDE so you can see the code in development AND debug modes. A lot easier.
Short (incomplete) answer:
Neither of them does secure against multithreading.
So if you ever needed to resize the MappedByteBuffer, you might get stale or even bad (ArrayIndexOutOfBoundsException) access
If the size is constant, you can rely on either Implementation to be "thread safe", as far as your requirements are concerned
On a side note, here also lies an implementation failure creep in the Java implementation:
MappedByteBuffer extends ByteBuffer
ByteBuffer has the heap byte[] called "hb"
DirectByteBuffer extends MappedByteBuffer extends ByteBuffer
So DirectByteBuffer still has ByteBuffer's byte[] hb buffer,
but does not use it
and instead creates and manages its own Buffer
This design flaw comes from the step-by-step development of those classes (they were no all planned and implemented at the same time), AND the topic of package visibility, resulting in inversion of dependency/hierarchy of the implementation.
Now to the true answer:
If you wanna do proper object-oriented programming, you should NOT share resource unless utterly needed.
This ESPECIALLY means that each Thread should have its very own Buffer.
Advantage of having one global buffer: the only "advantage" is to reduce the additional memory consumption for additional object references. But this impact is SO MINIMAL (not even 1:10000 change in your app RAM consumption) that you will NEVER notice it. There's so many other objects allocated for any number of weird (Java) reasons everywhere that this is the least of your concerns. Plus you would have to introduce additional data (index boundaries) which lessens the 'advantage' even more.
The big Advantages of having separate buffers:
You will never have to take care of the pointer/index arithmetics
especially when it comes to you needing more threads at any given time
You can freely allocate new threads at any time without having to rearrange any data or do more pointer arithmetics
you can freely reallocate/resize each individual buffer when needed (without worrying about all the other threads' indexing requirement)
Debugging: You can locate problems so much easier that result from "writing out of boundaries", because if they tried, the bad thread would crash, and not other threads that would have to deal with corrupted data
Java ALWAYS checks each array access (on normal heap arrays like byte[]) before it accesses it, exactly to prevent side effects
think back: once upon a time there was the big step in operating systems to introduce linear address space so programs would NOT have to care about where in the hardware RAM they're loaded.
Your one-buffer-design would be the exact step backwards.
Conclusion:
If you wanna have a really bad design choice - which WILL make life a lot harder later on - you go with one global Buffer.
If you wanna do it the proper OO way, separate those buffers. No convoluted dependencies and side effect problems.
I have been working on a Java program that generates fractal orbits for quite some time now. Much like photographs, the larger the image, the better it will be when scaled down. The program uses a 2D object (Point) array, which is written to when a point's value is calculated. That is to say the Point is stored in it's corresponding value, I.e.:
Point p = new Point(25,30);
histogram[25][30] = p;
Of course, this is edited for simplicity. I could just write the point values to a CSV, and apply them to the raster later, but using similar methods has yielded undesirable results. I tried for quite some time because I enjoyed being able to make larger images with the space freed by not having this array. It just won't work. For clarity I'd like to add that the Point object also stores color data.
The next problem is the WriteableRaster, which will have the same dimensions as the array. Combined the two take up a great deal of memory. I have come to accept this, after trying to change the way it is done several times, each with lower quality results.
After trying to optimize for memory and time, I've come to the conclusion that I'm really limited by RAM. This is what I would like to change. I am aware of the -Xmx switch (set to 10GB). Is there any way to use Windows' virtual memory to store the raster and/or the array? I am well aware of the significant performance hit this will cause, but in lieu of lowering quality, there really doesn't seem to be much choice.
The OS is already making hard drive space into RAM for you and every process of course -- no magic needed. This will be more of a performance disaster than you think; it will be so slow as to effectively not work.
Are you looking for memory-mapped files?
http://docs.oracle.com/javase/6/docs/api/java/nio/MappedByteBuffer.html
If this is really to be done in memory, I would bet that you could dramatically lower your memory usage with some optimization. For example, your Point object is mostly overhead and not data. Count up the bytes needed for the reference, then for the Object overhead, compared to two ints.
You could reduce the overhead to nothing with two big parallel int arrays for your x and y coordinates. Of course you'd have to encapsulate this for access in your code. But it could halve your memory usage for this data structure. Millions fewer objects also speeds up GC runs.
Instead of putting a WritableRaster in memory, consider writing out the image file in some simple image format directly, yourself. BMP can be very simple. Then perhaps using an external tool to efficiently convert it.
Try -XX:+UseCompressedOops to reduce object overhead too. Also try -XX:NewRatio=20 or higher to make the JVM reserve almost all its heap for long-lived objects. This can actually let you use more heap.
It is not recommended to configure your JVM memory parameters (Xmx) in order to make the operating system to allocate from it's swap memory. apparently the garbage collection mechanism needs to have random access to heap memory and if doesn't, the program will thrash for a long time and possibly lock up. please check the answer given already to my question (last paragraph):
does large value for -Xmx postpone Garbage Collection
I don't understand, what that Buffer classes are for. Aren't they for buffering? I think this should mean that one buffer object should allow both read and write it simultaneously and independently. Nevertheless it is not so: buffer allows only one position, single one for reading and writing. This means that if I wrote something into the buffer with relative put() then I can't read anything sensitive with relative get(). Also if I will call put() and get() interchangeably I will get a delirium.
So are there any usage patterns (samples) for buffers? So that it would be evident that those buffers are somehow better than conventional arrays?
ByteBuffer are used for read and writing data, you can get/put many primitive type and control the endianess. They can be a wrapper for direct memory (off heap) and memory mapped files (also off heap)
They can be used for performance (as they can access a long or double natively without assembling bytes together), direct byte buffers can read/write data without an additional copy into "Java" memory. memory mapped files can be extended to the size of your disk space, allowing you to use lots of memory without impacting your GC times.
I need a byte buffer class in Java for single-threaded use. The buffer should resize when it's full, rather than throw an exception or something. Very important issue for me is performance.
What would you recommend?
ADDED:
At the momement I use ByteBuffer but it cannot resize. I need one that can resize.
Any reason not to use the boring normal ByteArrayOutputStream?
As mentioned by miku above, Evan Jones gives a review of different types and shows that it is very application dependent. So without knowing further details it is hard to speculate.
I would start with ByteArrayOutputStream, and only if profiling shows it is your performance bottleneck move to something else. Often when you believe the buffer code is the bottleneck, it will actually be network or other IO - wait until profiling shows you need an optimisation before wasting time finding a replacement.
If you are moving to something else, then other factors you will need to think about:
You have said you are using single threaded use, so BAOS's synchronization is not needed
what is the buffer being filled by and fed into? If either end is already wired to use Java NIO, then using a direct ByteBuffer is very efficient.
Are you using a circular buffer or a plain linear buffer? If you are then the Ostermiller Utils are pretty efficient, and GPL'd
You can use a direct ByteBuffer. Direct memory uses virtual memory to start with is only allocated to the application when it is used. i.e. the amount of main memory it uses re-sizes automagically.
Create a direct ByteBuffer larger than you need and it will only consume what you use.
you can also write manual code for checking the buffer content continously and if its full then make a new buffer of greater size and shift all the data in that new buffer.
I'm wondering how I'd code up a ByteBuffer recycling class that can get me a ByteBuffer which is at least as big as the specified length, and which can lock up ByteBuffer objects in use to prevent their use while they are being used by my code. This would prevent re-construction of DirectByteBuffers and such over and over, instead using existing ones. Is there an existing Java library which can do this very effectively? I know Javolution can work with object recycling, but does that extend to the ByteBuffer class in this context with the requirements set out?
It would be more to the point to be more conservative in your usage patterns in the first place. For example there is lots of code out there that shows allocation of a new ByteBuffer on every OP_READ. This is insane. You only need two ByteBuffers at most per connection, one for input and one for output, and depending on what you're doing you can get away with exactly one. In extremely simple cases like an echo server you can get away with one BB for the entire application.
I would look into that rather than paper over the cracks with yet another layer of software.
This is just advice, not an answer. If you do implement some caching for DirectByteBuffer, then be sure to read about the GC implications, because the memory consumed by DirectByteBuffer is not tracked by the garbage collector.
Some references:
A thread - featuring Stack Overflow's tackline
A blog post on the same subject
And the followup
Typically, you would use combination of ThreadLocal and SoftReference wrapper. Former to simplify synchronization (eliminate need for it, essentially); and latter to make buffer recycleable if there's not enough memory (keeping in mind other comments wrt. GC issues with direct buffers). It's actually quite simple: check if SoftReference has buffer with big enough size; if not, allocate; if yes, clear reference. Once you are done with it, re-set reference to point to buffer.
Another question is whether ByteBuffer is needed, compared to regular byte[]. Many developers assume ByteBuffers are better performance-wise, but that assumption is not usually backed by actual data (i.e. testing to see if there is performance difference, and to what direction). Reason why byte[] may often be faster is that code accessing it can be simpler, easier for HotSpot to efficiently JIT.