I am developing a tile-based physics game like Falling Sand Game. I am currently using a static VBO for the vertices and a dynamic VBO for the colors associated with each block type. With this type of game the data in the color VBO changes very frequently (on every block change). Currently I am calling glBufferSubDataARB for each block change. I have found this to work, yet it doesn't scale well with resolution (it gets much slower with each increase in resolution). I was hoping that I could get double my current playable resolution (256x256).
Should I call BufferSubData very frequently or BufferData once a frame? Should I drop the VBO and go with vertex array?
What can be done about video cards that do not support VBOs?
(Note: Each block is larger than one pixel)
First of all, you should stop using both functions. Buffer objects have been core OpenGL functionality since OpenGL 1.5 (2003); there is no reason to use the extension form of them. You should be using glBufferData and glBufferSubData, not the ARB versions.
Second, if you want high-performance buffer object streaming, tips can be found on the OpenGL wiki. But in general, calling glBufferSubData many times per frame on the same memory isn't helpful. It would likely be better to map the buffer and modify it directly.
To your last question, I would say this: why should you care? As previously stated, buffer objects are old. It's like asking what you should do for hardware that only supports D3D 5.0.
Ignore it; nobody will care.
You should preferably keep the frequently changing color information in your own copy in RAM and hand the data to the GL in one operation, once per frame, preferably at the end of the frame, just before swapping buffers (this means you need to do it once out of line for the very first frame).
glBufferSubData can be faster than glBufferData since it does not reallocate the memory on the server, and since it possibly transfers less data. In your case, however, it is likely slower, because it needs to be synced with the data that is still being drawn. Also, since data could change in any random location, the gains from only uploading a subrange won't be great, and uploading the whole buffer once per frame should be no trouble bandwidth-wise.
The best strategy would be to call glDraw(Elements|Arrays|Whatever) followed by glBufferData(...NULL). This tells OpenGL that you don't care about the buffer any more, and that it can throw the contents away as soon as it's done drawing. When you map this buffer or copy into it now, OpenGL will secretly use a new buffer without telling you. That way, you can work on the new buffer while the old one has not finished drawing, which avoids a stall.
Now you run your physics simulation, and modify your color data any way you want. Once you are done, either glMapBuffer, memcpy, and glUnmapBuffer, or simply use glBufferData (mapping is sometimes better, but in this case it should make little or no difference). This is the data you will draw the next frame. Finally, swap buffers.
That way, the driver has time to do the transfer while the card is still processing the last draw call. Also, if vsync is enabled and your application blocks waiting for vsync, this time is available to the driver for data transfers. You are thus practically guaranteed that whenever a draw call is made (the next frame), the data is ready.
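The frame order described above can be sketched without any real GL binding. In the sketch below, `BufferBackend` is a hypothetical stand-in for the handful of GL calls involved (it is not part of JOGL, LWJGL, or any other library), and `ColorGrid`, `FrameLoop`, and `RecordingBackend` are likewise names invented for illustration. It assumes 4 bytes of RGBA color per block.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the GL calls discussed above; not a real binding. */
interface BufferBackend {
    void draw();                 // e.g. glDrawArrays / glDrawElements
    void orphan(int sizeBytes);  // e.g. glBufferData(target, size, NULL, usage)
    void upload(byte[] data);    // e.g. glBufferData, or map + memcpy + unmap
    void swapBuffers();
}

/** CPU-side copy of the per-block colors; block changes only touch RAM. */
final class ColorGrid {
    private final int width;
    private final byte[] rgba;   // assumption: 4 bytes (RGBA) per block

    ColorGrid(int width, int height) {
        this.width = width;
        this.rgba = new byte[width * height * 4];
    }

    void setBlock(int x, int y, int r, int g, int b) {
        int i = (y * width + x) * 4;
        rgba[i] = (byte) r;
        rgba[i + 1] = (byte) g;
        rgba[i + 2] = (byte) b;
        rgba[i + 3] = (byte) 255;
    }

    byte[] data() {
        return rgba;
    }
}

final class FrameLoop {
    /** One frame: draw old data, orphan, simulate in RAM, upload once, swap. */
    static void runFrame(BufferBackend gl, ColorGrid grid, Runnable simulate) {
        gl.draw();                      // draws the colors uploaded last frame
        gl.orphan(grid.data().length);  // tell the driver the old contents are disposable
        simulate.run();                 // any number of setBlock calls, no GL traffic
        gl.upload(grid.data());         // a single transfer for the whole frame
        gl.swapBuffers();
    }
}

/** Fake backend that only records the order of calls. */
final class RecordingBackend implements BufferBackend {
    final List<String> calls = new ArrayList<>();

    public void draw() { calls.add("draw"); }
    public void orphan(int sizeBytes) { calls.add("orphan"); }
    public void upload(byte[] data) { calls.add("upload"); }
    public void swapBuffers() { calls.add("swap"); }
}
```

However many blocks change during `simulate`, the backend sees exactly one upload per frame. For the very first frame there is nothing to draw yet, so the initial upload has to happen once before the loop starts.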
About cards that do not support VBOs, this does not really exist (well, it does, but not really). VBO is more a programming model, rather than a hardware feature. If you use plain normal vertex arrays, the driver still has to somehow transfer a block of data to the card, eventually. The only difference is that you own a vertex array, but the driver owns a VBO.
This means that in the case of a VBO, the driver need not ask you when to do what. In the case of vertex arrays, it can only rely on the data being valid at the exact time you call glDrawElements. In the case of a VBO, it always knows the data is valid, because you can only modify it via an interface controlled by the driver. This means it can manage memory and transfers much more efficiently, and can better pipeline drawing.
There do of course exist implementations that don't support VBOs, but those would need to be truly old (like 10+ years old) drivers. It's not something to worry about, realistically.
I’m trying to find a way to minimize memory allocation and garbage collection in a Java OpenGL (JOGL) application. I am porting some C/C++/C# OpenGL projects to Java as a learning exercise. One thing I am running into is the lack of pointers in Java and the allocation of objects and GC of them as the application runs. In C/C++/C# I can get the applications to start up and simply run without allocating any additional memory or objects by passing references around, but in Java it seems my designs are incompatible.
As these designs have evolved they have come to use higher-level objects. In C they were structs for Vectors and Matrices, and in C++/C# classes. These all essentially boil down to arrays of bytes in memory, which are then cast in one way or another to float[] for OpenGL calls, or to object arrays internally within the application so I can use object-based operations like operator overloading, add and multiply, or property access, for example. Anyone working with OpenGL probably sees what I am doing. This way I allocate everything on load and simply pass the data around.
Java has thrown me for some loops. It appears I cannot cast data back and forth, so I keep creating lots of data and the GC comes by and does its work. This is noticeable as resources being consumed and cleaned up and as a stutter while the application runs. I have alleviated some of this by creating FloatBuffers in addition to VectorXf arrays for my geometry data and passing the FloatBuffer down to OpenGL. But when I need to update Vector data I have to recopy the data back to the FloatBuffer. This also means I am storing double the data and incurring the overhead of the FloatBuffer fills.
I’d like to hear how others are dealing with these issues. I’d like to keep the higher order objects for the functionality built in but be able to pass the data to OpenGL. Are my designs simply incompatible with Java? Do I need to move to FloatBuffers exclusively? How does one pass component data into a higher order object without the penalty of object creation. So many OpenGL applications exist that I suspect there is some ‘magic’ to utilize either the same buffer for float[] and Object[] or allocate contiguous block for object data and pass a reference to OpenGL.
The driving force in managing your OpenGL data is that you don't want to be responsible for the memory containing the geometry or textures. The use of float[] or even FloatBuffers should only be for the purpose of transferring geometry data into OpenGL buffer objects. Once you've created an OpenGL buffer and copied the data to it, you no longer need to keep a copy in your JVM. On virtually all modern hardware this will cause the data to be retained on the video card itself, completely outside of the JVM.
Ideally, if most of your geometry is static, you can copy it to OpenGL buffers at startup time and never have to manage it directly again. If you're dealing with a lot of dynamic geometry, then you're still going to be having to transfer data back and forth to the OpenGL driver. In this case, you probably want to maintain a pool of FloatBuffers that can act as ferries for moving the data between your code generating or discovering the changing geometry, and the driver. FloatBuffers are unavoidable because OpenGL expects data in a given format, which is going to be different than the internal representation of the data in the JVM, but at the very least you don't need to be keeping a separate FloatBuffer around for every set of data you have.
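A pool of reusable FloatBuffers, as suggested above, might look roughly like the following. This is a minimal sketch: `FloatBufferPool` and its methods are hypothetical names, all buffers are assumed to share one fixed capacity, and the pool is assumed to be used from a single thread.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.util.ArrayDeque;

/** Hypothetical fixed-capacity pool of direct FloatBuffers (single-threaded use). */
final class FloatBufferPool {
    private final ArrayDeque<FloatBuffer> free = new ArrayDeque<>();
    private final int capacityFloats;
    private int allocated = 0;

    FloatBufferPool(int capacityFloats, int preallocate) {
        this.capacityFloats = capacityFloats;
        for (int i = 0; i < preallocate; i++) {
            free.push(allocate());
        }
    }

    private FloatBuffer allocate() {
        allocated++;
        // Direct, native-order buffers are what OpenGL bindings generally expect.
        return ByteBuffer.allocateDirect(capacityFloats * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    /** Returns an empty buffer, reusing a released one when available. */
    FloatBuffer acquire() {
        FloatBuffer buffer = free.poll();
        if (buffer == null) {
            buffer = allocate();
        }
        buffer.clear();
        return buffer;
    }

    /** Hands a buffer back once its contents have been given to the driver. */
    void release(FloatBuffer buffer) {
        free.push(buffer);
    }

    int allocatedCount() {
        return allocated;
    }
}
```

In steady state every `acquire` is served from the free list, so no new buffers are created while the application runs and the garbage collector has nothing to clean up on this path.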
My experience:
I was using FloatBuffers only for transferring data, but I found that this really killed performance for dynamic meshes because I had to convert my Vec arrays to FloatBuffers every time I changed my meshes. Now I have got rid of my Vec arrays and use only FloatBuffers, kept persistently in my mesh classes. They are less elegant to handle but much faster. So I would advise you to keep and update all your geometry data in FloatBuffers.
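One way to keep object-style vector operations while storing everything in a FloatBuffer is a reusable "view" that reads and writes the buffer in place. The sketch below is an illustration only: `Vec3View` is a hypothetical class, and it assumes tightly packed x, y, z floats per vertex.

```java
import java.nio.FloatBuffer;

/** Hypothetical reusable view over one vertex (x, y, z) inside a shared FloatBuffer. */
final class Vec3View {
    private final FloatBuffer data;
    private int base;  // index of the x component of the current vertex

    Vec3View(FloatBuffer data) {
        this.data = data;
    }

    /** Repositions this view on another vertex; no object is allocated. */
    Vec3View at(int vertexIndex) {
        base = vertexIndex * 3;
        return this;
    }

    float x() { return data.get(base); }
    float y() { return data.get(base + 1); }
    float z() { return data.get(base + 2); }

    /** Absolute puts write straight into the buffer and leave its position untouched. */
    Vec3View set(float x, float y, float z) {
        data.put(base, x);
        data.put(base + 1, y);
        data.put(base + 2, z);
        return this;
    }

    Vec3View add(float dx, float dy, float dz) {
        return set(x() + dx, y() + dy, z() + dz);
    }

    Vec3View scale(float s) {
        return set(x() * s, y() * s, z() * s);
    }
}
```

A single view instance can be moved across a whole mesh, for example `view.at(i).add(0f, 1f, 0f)` in a loop, so updates happen in the same memory that is later handed to OpenGL and no second copy of the geometry is kept.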
Thank you for the help in advance!
I am making a 2D game in Java with LWJGL, and I am separating the renderer and game logic into separate threads.
To do so, I have to put the world data in the view into a buffer, which then gets passed to the renderer thread.
The data is made up from the world, which is static and can be passed by reference, but the entities are too dynamic to do so. The maximum number of entities would be a couple hundred to a few thousand.
Since the renderer only draws sprites, I want to fill up the buffer with a data structure of the sprites, and the coordinates to draw them to, which the renderer can read from. This is at 60 FPS.
I can use a LinkedList or ArrayList, but the varying data count and the creation and deletion of elements may cause too much overhead. I also saw other buffer types used in other code, though I didn't understand them, so I suspect there are other options, not to mention that I'm not too experienced with the performance limitations of the basic ones either.
What would be a good way to build my buffer?
I am rendering a rather heavy object consisting of about 500k triangles. I use an OpenGL display list and in the render method just call glCallList. I thought that once the graphics primitives are compiled into a display list, the CPU work is done and it just tells the GPU to draw. But now one CPU core is loaded up to 100%.
Could you give me some clues as to why this happens?
UPDATE: I have checked how long does it take to run glCallList, it's fast, it takes about 30 milliseconds to run it
Most likely you are hitting the limits on the list length, which are at 64k vertices per list. Try to split your 500k triangles (1500k vertices?) into smaller chunks and see what you get.
By the way, which graphics chip are you using? If the vertices are processed on the CPU, that also might be a problem.
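Splitting the mesh as suggested could be planned like this. The sketch is hypothetical: `TriangleChunker` is an invented name, "64k" is taken to mean 65,536 vertices, and the triangles are assumed to be non-indexed (three vertices each), which matches the 1500k-vertex estimate above. Each chunk would then go into its own display list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that splits a triangle range into vertex-limited chunks. */
final class TriangleChunker {
    /**
     * Returns {firstTriangle, triangleCount} pairs such that no chunk
     * needs more than maxVerticesPerChunk vertices (3 per triangle).
     */
    static List<int[]> split(int triangleCount, int maxVerticesPerChunk) {
        int maxTriangles = maxVerticesPerChunk / 3;  // never cut a triangle in half
        if (maxTriangles < 1) {
            throw new IllegalArgumentException("chunk too small for one triangle");
        }
        List<int[]> chunks = new ArrayList<>();
        for (int first = 0; first < triangleCount; first += maxTriangles) {
            int count = Math.min(maxTriangles, triangleCount - first);
            chunks.add(new int[] { first, count });
        }
        return chunks;
    }
}
```

With these assumptions, `TriangleChunker.split(500_000, 65_536)` yields 23 chunks: 22 full chunks of 21,845 triangles and a final chunk of 19,410.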
It's a bit of a myth that display lists magically offload everything to the GPU. If that was really the case, texture objects and vertex buffers wouldn't have needed to be added to OpenGL. All the display list really is, is a convenient way of replaying a sequence of OpenGL calls and hopefully saving some of the function call/data conversion overhead (see here). None of the PC HW implementations I've used seem to have done anything more than that so far as I can tell; maybe it was different back in the days of SGI workstations, but these days buffer objects are the way to go. (And modern OpenGL books like OpenGL Distilled give glBegin/glEnd etc the briefest of mentions only before getting stuck into the new stuff).
The one place I have seen display lists make a huge difference is the GLX/X11 case where your app is running remotely to your display (X11 "server"); in that case using a display list really does push all the display-list state to the display side just once, whereas a non-display-list immediate-mode app needs to send a bunch of stuff again each frame using lots more bandwidth.
However, display lists aside, you should be aware of some issues around vsync and busy waiting (or the illusion of it)... see this question/answer.
In an OpenGL ES 1.x Android application, I generate a circle (from triangles) and then translate it about one hundred times to form a level. Everything works except when a certain event occurs that causes about 15 objects to be immediately added to the arraylist that stores the circles' coordinates. When this event happens 2+ times quickly, all the circles in the list disappear for about 1/5th of a second. Besides this, the circles animate smoothly.
The program runs well as a Java SE app using the same synchronization techniques, and I have tried half a dozen or so other synchronization techniques to no avail, so I feel the problem is the OpenGL implementation. Any suggestions?
Do you really have to store the vertex data in client memory? If you don't modify it, I suggest you use a VBO instead. Just upload it into graphics memory once, then draw from there. It will be much faster (not requiring you to send all the vertex data for each draw), and I'm pretty sure you won't run into the problem you described.
Transformations can be done as much as you like, then you only have to give the draw command for each instance of your circle.
So the list is being modified under your nose? It sounds like you need to do any modification to this list on the OpenGL thread. Try GLSurfaceView.queueEvent(Runnable), where the Runnable contains your own list-modifying code; it is executed on the rendering thread. Possibly.
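The idea of funnelling every list modification through the render thread can be illustrated in plain Java. On Android, GLSurfaceView.queueEvent(Runnable) provides this kind of hand-off; the classes below (`RenderThreadQueue`, `CircleScene`) are hypothetical stand-ins that only show the pattern, not Android code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical queue of tasks that must run on the render thread. */
final class RenderThreadQueue {
    private final ConcurrentLinkedQueue<Runnable> pending = new ConcurrentLinkedQueue<>();

    /** Safe to call from any thread. */
    void post(Runnable task) {
        pending.add(task);
    }

    /** Call on the render thread at the start of each frame; returns tasks run. */
    int drain() {
        int ran = 0;
        Runnable task;
        while ((task = pending.poll()) != null) {
            task.run();
            ran++;
        }
        return ran;
    }
}

final class CircleScene {
    // Only ever read or written on the render thread.
    final List<float[]> circles = new ArrayList<>();
    final RenderThreadQueue queue = new RenderThreadQueue();

    /** Called by game logic on any thread; the list itself is not touched here. */
    void addCircleLater(float x, float y) {
        queue.post(() -> circles.add(new float[] { x, y }));
    }

    /** Called on the render thread once per frame; returns circles to draw. */
    int onDrawFrame() {
        queue.drain();          // apply all pending additions first
        return circles.size();  // then iterate and draw (drawing omitted)
    }
}
```

Because additions are applied only between frames, the drawing loop never iterates over a list that another thread is changing at the same moment.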
So I need to understand how Swing allocates memory for buffering screen rendering. Obviously there are duplicates if you have double/triple/etc. buffering. However, I need to know when Swing allocates memory and how much of it. It would be very helpful to know, if I have multiple windows open (launched from the same JVM), how much memory is needed depending on whether the windows are maximized to one screen, multiple screens (I need it to go up to 6 screens), etc.
Does anyone know of any good reading, or have answers, on how Java Swing/AWT allocates memory for rendering buffers?
End of the day, I am looking for a definitive formula so that if I have a number of windows opened, number of buffers in each window, location of windows, and size of each window I can get an exact byte count required to render the application (just the buffering part, the rest of the memory is another problem)
I was assuming it was (single buffered) x by y of each window = 1 buffer, add those together and you have all memory requirements, but profiling the data this appears to be far from the truth, some buffers are weak/soft references, some strong, and I cannot determine the way to calculate (yet :)).
Edit: I am using JFrame objects (for better or worse) to do my top-level stuff.
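For reference, the simple per-window estimate assumed in the question can be written down as follows. This is only the naive arithmetic, with a hypothetical `BufferEstimate` class and an assumed 4 bytes per pixel (32-bit ARGB); as the question itself reports, profiling shows that real usage deviates from it.

```java
/** Naive estimate: width x height x bytes per pixel x number of buffers. */
final class BufferEstimate {
    static long bytesForWindow(int width, int height, int bytesPerPixel, int bufferCount) {
        // Widen to long before multiplying so large totals cannot overflow an int.
        return (long) width * height * bytesPerPixel * bufferCount;
    }

    /** Sums the estimate over windows given as {width, height, bufferCount}. */
    static long totalBytes(int[][] windows, int bytesPerPixel) {
        long total = 0;
        for (int[] w : windows) {
            total += bytesForWindow(w[0], w[1], bytesPerPixel, w[2]);
        }
        return total;
    }
}
```

Under these assumptions a double-buffered 1920x1080 window comes to 16,588,800 bytes, and six such windows to 99,532,800 bytes. The figure ignores platform-dependent memory outside the JVM, so it is at best a rough lower bound.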
Double buffering is a convenient feature of JPanel, but there will always be a significant platform-dependent contribution: Every visible JComponent belongs to a heavyweight peer whose memory is otherwise inaccessible to the JVM.
If you're trying to avoid running out of memory, pick a reasonable value for the startup parameters and instruct the user how to change them. ImageJ is a good example.
I'd recommend JConsole and the Swing source code for this kind of precision.
I assume you realize this will be extremely tedious to calculate since you have to consider every object created somewhere in the process, which of course will depend on the controls involved in the UI.
I am not aware of any automatic screen buffering support in Swing. If you need double buffering, you need to implement it yourself in which case you will know better how to calculate memory requirements :-)
See this answer: Java: how to do double-buffering in Swing? for more information and good pointers.