I am rendering a rather heavy object consisting of about 500k triangles. I use an OpenGL display list, and in the render method I just call glCallList. I thought that once the graphics primitives are compiled into a display list, the CPU's work is done and it just tells the GPU to draw. But now one CPU core is loaded up to 100%.
Could you give me some clues as to why this happens?
UPDATE: I have checked how long glCallList takes to run; it's fast, taking only about 30 milliseconds.
Most likely you are hitting the limits on the list length, which are at around 64k vertices per list. Try to split your 500k triangles (1500k vertices?) into smaller chunks and see what you get.
By the way, which graphics chip are you using? If the vertices are processed on the CPU, that might also be a problem.
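To illustrate the chunking idea, here is a minimal sketch in Java/JOGL (an assumption, since the question doesn't say which language or binding is in use; the class name, the CHUNK_VERTS constant, and the flat vertex array are all made up for the example):

    import com.jogamp.opengl.GL2;

    public class ChunkedLists {
        private static final int CHUNK_VERTS = 60000; // multiple of 3, below a 64k-per-list limit
        private int firstList;
        private int listCount;

        // Compile one big triangle soup (x,y,z triples) into several display lists.
        public void compile(GL2 gl, float[] verts) {
            int total = verts.length / 3;
            listCount = (total + CHUNK_VERTS - 1) / CHUNK_VERTS;
            firstList = gl.glGenLists(listCount);
            for (int i = 0; i < listCount; i++) {
                gl.glNewList(firstList + i, GL2.GL_COMPILE);
                gl.glBegin(GL2.GL_TRIANGLES);
                int start = i * CHUNK_VERTS;
                int end = Math.min(start + CHUNK_VERTS, total);
                for (int v = start; v < end; v++) {
                    gl.glVertex3f(verts[3 * v], verts[3 * v + 1], verts[3 * v + 2]);
                }
                gl.glEnd();
                gl.glEndList();
            }
        }

        // Replay all chunks each frame.
        public void draw(GL2 gl) {
            for (int i = 0; i < listCount; i++) {
                gl.glCallList(firstList + i);
            }
        }
    }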
It's a bit of a myth that display lists magically offload everything to the GPU. If that were really the case, texture objects and vertex buffers wouldn't have needed to be added to OpenGL. All a display list really is, is a convenient way of replaying a sequence of OpenGL calls and hopefully saving some of the function-call/data-conversion overhead (see here). None of the PC hardware implementations I've used seem to have done anything more than that, so far as I can tell; maybe it was different back in the days of SGI workstations, but these days buffer objects are the way to go. (And modern OpenGL books like OpenGL Distilled give glBegin/glEnd etc. only the briefest of mentions before getting stuck into the new stuff.)
The one place I have seen display lists make a huge difference is the GLX/X11 case where your app runs on a different machine from your display (the X11 "server"); in that case, using a display list really does push all the display-list state to the display side just once, whereas a non-display-list immediate-mode app needs to resend a bunch of data each frame, using far more bandwidth.
However, display lists aside, you should be aware of some issues around vsync and busy waiting (or the illusion of it)... see this question/answer.
I'm developing a software package which makes heavy use of arrays (ArrayLists). Instructions to be processed are put into an array queue, then deleted from the array once they have been used. The same goes for drawing on a plot: data is placed into an array queue, which is read to plot the data, and the oldest data is eventually deleted as new data comes in. We are talking about thousands of instructions over an hour, and at any time maybe 200,000 points plotted, with the array continually growing and shrinking.
After some time, the software begins to slow down and instructions are processed more slowly. Nothing really changes in terms of processing: the system is stable as to how much data is plotted and which instructions are being processed, just working off similar incoming data time after time.
Is there some memory issue going on with the "abuse" of these variable-sized (no defined size, add/delete as needed) arrays/queues that could be causing the eventual slowdown?
Is there a better way than a String ArrayList to act as a queue?
Thanks!
Yes, you are most likely using the wrong data structure for the job. An ArrayList is a list with a backing array, so get() is fast, but removing an element from the front is O(n): every remaining element has to be shifted down by one.
The Java runtime library has a very rich set of data structures, so you can get a well-written and debugged implementation with the characteristics you need out of the box. You most likely should be using one or more Queues instead.
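As a minimal sketch, an ArrayDeque (assuming String elements, as in the question) gives constant-time operations at both ends:

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class QueueDemo {
        public static void main(String[] args) {
            // ArrayDeque: O(1) add at the tail, O(1) remove at the head.
            // ArrayList.remove(0) is O(n), shifting every remaining element.
            Deque<String> queue = new ArrayDeque<>();
            queue.addLast("instruction 1"); // enqueue
            queue.addLast("instruction 2");
            String next = queue.pollFirst(); // dequeue oldest; null if empty
            System.out.println(next);        // prints "instruction 1"
        }
    }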
My guess is that you forgot to null out values in your ArrayList, so the JVM has to keep all of them around. This is a memory leak.
To confirm, use a profiler to see where your memory and CPU go. VisualVM is a nice standalone one; NetBeans includes one as well.
The use of VisualVM helped. It showed heavy use of a "message" form that I was dumping incoming data into and had forgotten existed; because I never limited its size, it was dealing with a million characters by the time the sluggishness became apparent.
I am implementing the sliding window technique to develop photo OCR, i.e., a rectangle of a specific size is cut from the picture and checked to see whether it contains text. The rectangle is then shifted by some number of pixels. But this sliding window technique is taking a lot of time. For example, processing a 1366x768 picture takes 6 hours with a step size of 2 and a window size of 20x25. Is there any other technique that could be helpful, or how can I speed up the process?
I am coding in Java.
It is hard to give a specific recommendation without knowing any details of your algorithm/code. There are several potential performance improvements you could consider:
Minimize disk I/O and cache misses. You stated that a rectangle is "cut from the picture". If each "cut" is a separate read from disk, it is very inefficient and would contribute significantly to execution time. When you shift your window (by 2 pixels, it appears), most of the data in the new window is the same so try to avoid re-reading that data as much as possible.
Decrease your window size or increase your step size. This obviously affects your result but depending on the size of the characters you are trying to OCR, it might be an option.
If you are applying a convolution filter to do OCR, consider doing fast convolution via a 2D FFT of the image data.
Multithread your application, if it isn't already. While your problem is not embarrassingly parallel, it could be fairly easily multithreaded.
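For the multithreading point, here is a minimal sketch of splitting the window positions across a thread pool. The WindowClassifier interface and the int[][] image representation are assumptions standing in for whatever your classifier and image types actually are:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class SlidingWindowParallel {

        // Hypothetical stand-in for whatever decides "text or not" per window.
        interface WindowClassifier {
            boolean containsText(int[][] image, int x, int y, int w, int h);
        }

        // Splits the rows of window positions across a thread pool; each task
        // scans one horizontal band, so tasks only ever read shared data.
        static List<int[]> findTextWindows(int[][] image, int winW, int winH, int step,
                                           WindowClassifier clf)
                throws InterruptedException, ExecutionException {
            int cores = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(cores);
            List<Future<List<int[]>>> futures = new ArrayList<>();
            int maxY = image.length - winH;
            int maxX = image[0].length - winW;
            for (int startY = 0; startY <= maxY; startY += step) {
                final int y = startY;
                futures.add(pool.submit(() -> {
                    List<int[]> hits = new ArrayList<>();
                    for (int x = 0; x <= maxX; x += step) {
                        if (clf.containsText(image, x, y, winW, winH)) {
                            hits.add(new int[]{x, y});
                        }
                    }
                    return hits;
                }));
            }
            List<int[]> all = new ArrayList<>();
            for (Future<List<int[]>> f : futures) {
                all.addAll(f.get()); // blocks until that band is done
            }
            pool.shutdown();
            return all;
        }
    }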
Sliding window approaches are brute force and, therefore, terribly slow by their nature. Perhaps you should take a look at salience-based techniques that use filters to prioritize which areas of the image to process.
Here is a paper I am somewhat familiar with: B. Draper and A. Lionelle, "Evaluation of Selective Attention under Similarity Transformations," Computer Vision and Image Understanding, 100:152-171, 2005.
Finally, what ANN library are you using? Make sure your ANN code is doing matrix/vector operations and that they are as optimized as possible!
So I've run into a bit of a pickle. I'm writing a library using JOGL to display 3D models (and consequently, 2D models) on a GLCanvas. Everything was running smoothly until I decided to have a thread call the draw method of the individual polygons of a Strixa3DElement, to speed it up a bit. Before, everything drew perfectly to the screen, but VERY slowly. Now, as far as speed goes, it couldn't be better. But it's not drawing anything. Ignoring everything but what the draw method deals with, is there any reason that
https://github.com/NicholasRoge/StrixaGL/blob/master/src/com/strixa/gl/Strixa3DElement.java
shouldn't work?
Edit: Also, for the sake of avoiding concurrency issues in the thread, let's say any given element has no more than 100000 polygons.
It's better to leave render tasks on the GL thread for now.
You aren't even using display lists, so of course it will be very slow.
Even then, rendering is not the speed problem: you can prepare the data for rendering in another thread, leaving the render loop clean and fast (moving out this._performGameLogic, etc.).
You can use VBOs, shaders (moving data and render logic from the CPU to the GPU), offscreen buffers, etc. to improve performance.
If you continue, you should:
check the GLArrayDataServer class for use with VBOs, plus its unit tests and demos, while writing your code;
not pass GL2 as an argument, but use GLContext.getCurrentGL().getGL2() instead;
try GL2ES2: fixed-function calls are deprecated, and GL2ES2 lets the same code run on mobile platforms;
join the Jabber conference.
Some answers about JOGL and threads: Resources: Parallelism in Java for OpenGL realtime applications
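To make a couple of those points concrete, here is a minimal sketch of a GLEventListener that fetches the context-current GL on the GL thread instead of passing a GL2 reference through the scene graph (StrixaRenderer is a made-up name, and the JOGL 2 com.jogamp.opengl packages are assumed):

    import com.jogamp.opengl.GL;
    import com.jogamp.opengl.GL2ES2;
    import com.jogamp.opengl.GLAutoDrawable;
    import com.jogamp.opengl.GLContext;
    import com.jogamp.opengl.GLEventListener;

    public class StrixaRenderer implements GLEventListener {

        @Override
        public void display(GLAutoDrawable drawable) {
            // Fetch the context-current GL on the GL thread instead of
            // passing a GL2 reference through the scene graph.
            GL2ES2 gl = GLContext.getCurrentGL().getGL2ES2();
            gl.glClear(GL.GL_COLOR_BUFFER_BIT | GL.GL_DEPTH_BUFFER_BIT);
            // ...issue draw calls here; game logic stays on its own thread...
        }

        @Override public void init(GLAutoDrawable drawable) {}
        @Override public void dispose(GLAutoDrawable drawable) {}
        @Override public void reshape(GLAutoDrawable drawable, int x, int y, int w, int h) {}
    }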
I am developing a tile-based physics game like Falling Sand Game. I am currently using a static VBO for the vertices and a dynamic VBO for the colors associated with each block type. With this type of game, the data in the color VBO changes very frequently (every block change). Currently I am calling glBufferSubDataARB for each block change. I have found this to work, yet it doesn't scale well with resolution (it gets much slower with each increase in resolution). I was hoping that I could double my current playable resolution (256x256).
Should I call BufferSubData very frequently or BufferData once per frame? Should I drop the VBO and go with vertex arrays?
What can be done about video cards that do not support VBOs?
(Note: Each block is larger than one pixel)
First of all, you should stop using both functions. Buffer objects have been core OpenGL functionality since around 2002; there is no reason to use the extension form of them. You should be using glBufferData and glBufferSubData, not the ARB versions.
Second, if you want high-performance buffer object streaming, tips can be found on the OpenGL wiki. But in general, calling glBufferSubData many times per frame on the same memory isn't helpful. It would likely be better to map the buffer and modify it directly.
To your last question, I would say this: why should you care? As previously stated, buffer objects are old. It's like asking what you should do for hardware that only supports D3D 5.0.
Ignore it; nobody will care.
You should preferably keep the frequently changing color information updated in your own copy in RAM, and hand the data to the GL in one operation, once per frame, preferably at the end of the frame just before swapping buffers (this means you need to do it once out of line for the very first frame).
glBufferSubData can be faster than glBufferData since it does not reallocate the memory on the server, and since it possibly transfers less data. In your case, however, it is likely slower, because it needs to be synced with the data that is still being drawn. Also, since the data could change in any random location, the gains from uploading only a subrange won't be great, and uploading the whole buffer once per frame should be no trouble bandwidth-wise.
The best strategy would be to call glDraw(Elements|Arrays|Whatever) followed by glBufferData(...NULL). This tells OpenGL that you don't care about the buffer any more, and it can throw the contents away as soon as it's done drawing. (When you map this buffer or copy into it now, OpenGL will secretly use a new buffer without telling you. That way, you can work on the new buffer while the old one has not finished drawing, which avoids a stall.)
Now you run your physics simulation, and modify your color data any way you want. Once you are done, either glMapBuffer, memcpy, and glUnmapBuffer, or simply use glBufferData (mapping is sometimes better, but in this case it should make little or no difference). This is the data you will draw the next frame. Finally, swap buffers.
That way, the driver has time to do the transfer while the card is still processing the last draw call. Also, if vsync is enabled and your application blocks waiting for vsync, this time is available to the driver for data transfers. You are thus practically guaranteed that whenever a draw call is made (the next frame), the data is ready.
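A minimal sketch of one frame of that strategy, assuming a Java/JOGL renderer (an assumption; the question doesn't name a binding) with a hypothetical, already-created colorVbo holding four color bytes per vertex, whose vertex/color pointers are set up elsewhere:

    import com.jogamp.opengl.GL;
    import com.jogamp.opengl.GL2;
    import java.nio.ByteBuffer;

    public class ColorStreamer {
        // One frame of the orphaning strategy; colorVbo is a hypothetical,
        // already-created buffer object name.
        void drawAndUpdate(GL2 gl, int colorVbo, ByteBuffer newColors, int vertexCount) {
            int sizeBytes = vertexCount * 4; // 4 color bytes per vertex
            gl.glBindBuffer(GL.GL_ARRAY_BUFFER, colorVbo);

            // 1. Draw with last frame's colors (pointers set up elsewhere).
            gl.glDrawArrays(GL.GL_TRIANGLES, 0, vertexCount);

            // 2. Orphan: tell the driver the old contents may be discarded
            //    as soon as the draw call is done with them.
            gl.glBufferData(GL.GL_ARRAY_BUFFER, sizeBytes, null, GL2.GL_STREAM_DRAW);

            // 3. ...run the physics simulation and fill newColors in CPU memory...

            // 4. Upload next frame's colors into the fresh storage.
            gl.glBufferData(GL.GL_ARRAY_BUFFER, sizeBytes, newColors, GL2.GL_STREAM_DRAW);
        }
    }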
About cards that do not support VBOs: these do not really exist (well, they do, but not really). VBO is more a programming model than a hardware feature. If you use plain normal vertex arrays, the driver still has to somehow transfer a block of data to the card eventually. The only difference is that you own a vertex array, whereas the driver owns a VBO.
Which means that in the case of a VBO, the driver need not ask you when to do what. In the case of vertex arrays, it can only rely on the data being valid at the exact time you call glDrawElements. In the case of a VBO, it always knows the data is valid, because you can only modify it via an interface controlled by the driver. This means it can manage memory and transfers much more optimally, and can better pipeline drawing.
There do of course exist implementations that don't support VBOs, but those would need to be truly old (like 10+ years old) drivers. It's not something to worry about, realistically.
In an OpenGL ES 1.x Android application, I generate a circle (from triangles) and then translate it about one hundred times to form a level. Everything works except when a certain event occurs that causes about 15 objects to be immediately added to the ArrayList that stores the circles' coordinates. When this event happens two or more times in quick succession, all the circles in the list disappear for about 1/5th of a second. Other than this, the circles animate smoothly.
The program runs well as a Java SE app using the same synchronization techniques, and I have tried half a dozen or so other synchronization techniques to no avail, so I feel the problem is in the OpenGL implementation. Any suggestions?
Do you really have to store the vertex data in client memory? If you don't modify it, I suggest you use a VBO instead. Just upload it into graphics memory once, then draw from there. It will be much faster (not requiring you to send all the vertex data for each draw), and I'm pretty sure you won't run into the problem you described.
Transformations can be done as much as you like; then you only have to issue the draw command for each instance of your circle.
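As a minimal sketch of that approach under OpenGL ES 1.1 (VBOs live in the GL11 interface on Android; the CircleVbo class and its flat x,y vertex array are made up for the example):

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.nio.FloatBuffer;
    import javax.microedition.khronos.opengles.GL10;
    import javax.microedition.khronos.opengles.GL11;

    public class CircleVbo {
        private int vboId;
        private int vertexCount;

        // Upload the circle's triangle-fan vertices once, at surface creation.
        public void upload(GL11 gl, float[] verts) { // x,y pairs
            vertexCount = verts.length / 2;
            FloatBuffer fb = ByteBuffer.allocateDirect(verts.length * 4)
                    .order(ByteOrder.nativeOrder()).asFloatBuffer();
            fb.put(verts).position(0);
            int[] ids = new int[1];
            gl.glGenBuffers(1, ids, 0);
            vboId = ids[0];
            gl.glBindBuffer(GL11.GL_ARRAY_BUFFER, vboId);
            gl.glBufferData(GL11.GL_ARRAY_BUFFER, verts.length * 4, fb, GL11.GL_STATIC_DRAW);
        }

        // Draw one instance per position; no vertex data crosses the bus per call.
        public void drawAt(GL11 gl, float x, float y) {
            gl.glBindBuffer(GL11.GL_ARRAY_BUFFER, vboId);
            gl.glEnableClientState(GL10.GL_VERTEX_ARRAY);
            gl.glVertexPointer(2, GL10.GL_FLOAT, 0, 0); // offset into the bound VBO
            gl.glPushMatrix();
            gl.glTranslatef(x, y, 0f);
            gl.glDrawArrays(GL10.GL_TRIANGLE_FAN, 0, vertexCount);
            gl.glPopMatrix();
            gl.glBindBuffer(GL11.GL_ARRAY_BUFFER, 0);
        }
    }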
So the list is being modified under your nose? It sounds like you need to do any modification to this list on the OpenGL thread. Try GLSurfaceView.queueEvent(Runnable), which runs your Runnable on the rendering thread, where the Runnable wraps your own modification code. Possibly.