I am writing a 3D viewer which loads some 3D file and displays it simply on a GLSurfaceView.
I originally implemented the viewer in OpenGL ES 1.0; however, since this is a fixed-function API, I was not able to use shaders, and I have since moved to OpenGL ES 2.0.
A few questions here:
When I load similar models with OpenGL ES 1.0 on my HTC Desire, things are quick, touch events behave as expected mathematically, and the model rotates/translates/zooms easily.
However, when I use OpenGL ES 2.0, my touch events make everything lag badly. I know the touch events are the cause because on an onFling event I rotate my model with a damping factor, and that animation is smooth in all cases.
So :
1) Why, in OpenGL ES 2.0, do I need to worry about vsync, double buffering and a Choreographer?
2) How do I implement double buffering or swap buffers with OpenGL if the buffers are not available to me?
3) Is this the only reason for the performance difference?
4) Finally, what can I do to make the two perform the same? An upgrade from OpenGL ES 1.0 to OpenGL ES 2.0 isn't really a great update if my UI is laggy.
Following up on my own question here :
I've decided to use RENDERMODE_CONTINUOUSLY for my rendermode; this allows opengl to swap buffers anytime it can and re-draw.
I also moved my logic for applying rotations/translation to matrices outside the drawing loop.
And lastly, to get all this to play nicely, I introduced a mutex to synchronize on, so that updating rotations/translations is thread-safe with respect to the OpenGL ES thread.
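For anyone trying the same thing, here is a minimal sketch of the mutex idea in plain Java. The class and method names (SharedTransform, rotateBy, zoomBy, snapshot) are made up for illustration and are not from my viewer; the point is only that the touch handlers and the GL thread go through the same lock, and the GL thread copies the values out once per frame instead of holding the lock while drawing.

```java
// Hypothetical holder for the model transform, shared between the UI thread
// (touch/fling handlers) and the GL thread (the per-frame draw callback).
final class SharedTransform {
    private final Object lock = new Object();
    private float angleDeg;      // rotation around a single axis, in degrees
    private float zoom = 1f;     // uniform scale factor

    // Called from the UI thread, e.g. from a touch or fling handler.
    void rotateBy(float deltaDeg) {
        synchronized (lock) {
            angleDeg = (angleDeg + deltaDeg) % 360f;
        }
    }

    // Called from the UI thread, e.g. from a pinch handler.
    void zoomBy(float factor) {
        synchronized (lock) {
            zoom *= factor;
        }
    }

    // Called from the GL thread at the start of each frame. Returns a copy
    // {angleDeg, zoom} so the lock is released before any drawing happens.
    float[] snapshot() {
        synchronized (lock) {
            return new float[] { angleDeg, zoom };
        }
    }
}
```

Keeping the critical sections this short means a burst of touch events should not be able to stall a frame for long.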
Related
I am creating a voxel engine. I have created chunk generation in addition to some simple simplex noise integration, but it is extremely laggy because every face of every voxel is being drawn, even the ones you can't see.
To my understanding this is commonly dealt with using ray casting, of which I understand the basic theory: you cast several rays from the camera and check for collisions; if no collision is found, then the face is not within view and therefore should not be rendered. Even though I understand the theory, I haven't yet been able to implement it, due to a lack of prior knowledge and because what I found on the internet is lacking, i.e. it gives the code but not the knowledge.
The steps I could imagine I need to take are as follows:
Learn OpenCL (though I haven't used it before to my understanding it allows you to better make use of your graphics card by the use of 'kernels' which I mentally associate with OpenGL 'shaders').
Learn the theory and math behind ray casting. I have also heard of ray tracing, which I believe has a different use.
Learn how to use this information to not render hidden faces. Assuming I get a working implementation how would I go about telling OpenGL not to render the hidden faces? The cube is one object and to the best of my knowledge there is no way to manipulate the faces of an object in OpenGL only the vertices. Also how would OpenCL communicate with OpenGL? OpenCL isn't a graphics api so it isn't capable of drawing the rays.
Could anyone point me in the right direction? I also believe that there are pure OpenGL implementations as well but I would like to keep the OpenCL aspect as this is a learning experience.
I wouldn't recommend working with OpenCL or OpenGL when developing your first game; both will slow you down extraordinarily because each requires a different mindset.
Well done though on getting as far as you have.
You mentioned that you are currently rendering all quads all the time and want to remove the hidden ones. I have written a voxel engine for practice too, ran into this issue, and spent a lot of time thinking about how to fix it. My solution was to not draw faces that are facing another voxel.
Imagine two voxels next to each other: the two faces that are touching can't be seen and don't need to be rendered.
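A rough sketch of that neighbour test in Java, in case it helps. The boolean[x][y][z] grid layout and the names VoxelFaces, isSolid and countVisibleFaces are my own assumptions for illustration, not code from my engine. It only counts the faces that would need a quad; a real mesher would emit vertices at the same spot.

```java
// Counts the faces that actually need a quad in a small voxel grid.
// solid[x][y][z] == true means the cell is filled (layout is an assumption).
final class VoxelFaces {
    // The six axis-aligned neighbour directions: +x, -x, +y, -y, +z, -z.
    private static final int[][] DIRS = {
        {1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}
    };

    // Cells outside the grid count as empty, so boundary faces stay visible.
    static boolean isSolid(boolean[][][] solid, int x, int y, int z) {
        return x >= 0 && y >= 0 && z >= 0
            && x < solid.length && y < solid[0].length && z < solid[0][0].length
            && solid[x][y][z];
    }

    static int countVisibleFaces(boolean[][][] solid) {
        int visible = 0;
        for (int x = 0; x < solid.length; x++) {
            for (int y = 0; y < solid[0].length; y++) {
                for (int z = 0; z < solid[0][0].length; z++) {
                    if (!solid[x][y][z]) continue;
                    for (int[] d : DIRS) {
                        // Emit a face only if the neighbour on that side is empty.
                        if (!isSolid(solid, x + d[0], y + d[1], z + d[2])) visible++;
                    }
                }
            }
        }
        return visible;
    }
}
```

For two touching voxels this gives 10 faces instead of 12, and for a solid 2x2x2 block 24 instead of 48; the saving grows quickly with larger solid regions.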
However, this will not make any difference if your method of talking to the GPU is the bottleneck. You will have to use buffered methods; I used display lists, but it is also possible (though harder) to use VBOs.
I'd also recommend grouping large numbers of voxels into chunks for many reasons. Then you only need to recalculate the visible quads on the chunk that changed.
Regarding ray casting: if you adopt the chunk system I just described, working out which entire chunks are visible will be easier. E.g. chunks behind the player don't need to be rendered, and that can be calculated with just one dot product per chunk.
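The dot product check could look something like the sketch below. The names (ChunkCulling, isPotentiallyVisible) and the bounding-radius parameter are assumptions on my part; the radius is there because testing only the chunk centre would wrongly drop a chunk that the player is standing in or right next to. The forward vector is assumed to be unit length.

```java
final class ChunkCulling {
    // Returns false only when the whole chunk lies behind the camera plane.
    // camPos, forwardUnit and chunkCenter are {x, y, z}; forwardUnit must be
    // normalised; chunkRadius is the radius of a sphere enclosing the chunk.
    static boolean isPotentiallyVisible(float[] camPos, float[] forwardUnit,
                                        float[] chunkCenter, float chunkRadius) {
        float dx = chunkCenter[0] - camPos[0];
        float dy = chunkCenter[1] - camPos[1];
        float dz = chunkCenter[2] - camPos[2];
        // Signed distance of the chunk centre along the view direction.
        float along = dx * forwardUnit[0] + dy * forwardUnit[1] + dz * forwardUnit[2];
        return along > -chunkRadius;
    }
}
```

This only rejects chunks behind the player; chunks off to the sides still pass, so it is a cheap first filter rather than a full frustum test.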
Learn OpenCL (though I haven't used it before to my understanding it
allows you to better make use of your graphics card by the use of
'kernels' which I mentally associate with OpenGL 'shaders').
The AMD APP SDK has many examples/samples, from sorting numbers to doing 3D fluid calculations on a teapot. You can also use the CPU with OpenCL, but multiple CPUs can be seen as a single device. Nvidia, JOCL and LWJGL also have samples waiting to be reverse-engineered.
Learn the theory and math behind ray casting. I have also heard
of ray tracing which I believe has a different use
I only know that ray casting becomes ray tracing if those rays cast new rays. It involves lots of vector algebra: cross products, dot products, normalization of direction vectors, 3x3 and 4x4 matrix multiplications and many more. Deep recursion is bad for the GPU, so try iterative versions.
Learn how to use this information to not render hidden faces.
You can sort the distances of the surface primitives that a ray intersects and take the one with the smallest distance. The others shouldn't be seen if there is no refraction on that surface. Using an acceleration structure (bounding volume hierarchy, ...) helps.
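As a CPU-side illustration of "take the smallest distance" (plain Java rather than an OpenCL kernel, and with axis-aligned boxes standing in for voxels), here is a small sketch using the standard slab test. The names RayCast, hitDistance and nearestHit are made up for this example, and it loops over every box instead of using an acceleration structure.

```java
final class RayCast {
    // Returns the ray parameter t at which the ray enters the box, or -1 if it
    // misses. t equals the distance when dir has unit length. dir must not be
    // the zero vector. All arrays are {x, y, z}.
    static double hitDistance(double[] origin, double[] dir,
                              double[] boxMin, double[] boxMax) {
        double tNear = 0.0;                          // ignore hits behind the origin
        double tFar = Double.POSITIVE_INFINITY;
        for (int i = 0; i < 3; i++) {
            if (dir[i] == 0.0) {
                // Parallel to this pair of planes: miss unless the origin is between them.
                if (origin[i] < boxMin[i] || origin[i] > boxMax[i]) return -1;
                continue;
            }
            double t1 = (boxMin[i] - origin[i]) / dir[i];
            double t2 = (boxMax[i] - origin[i]) / dir[i];
            tNear = Math.max(tNear, Math.min(t1, t2));
            tFar = Math.min(tFar, Math.max(t1, t2));
            if (tNear > tFar) return -1;             // the slab intervals do not overlap
        }
        return tNear;
    }

    // Index of the closest box hit by the ray, or -1 if no box is hit.
    static int nearestHit(double[] origin, double[] dir,
                          double[][] mins, double[][] maxs) {
        int best = -1;
        double bestT = Double.POSITIVE_INFINITY;
        for (int i = 0; i < mins.length; i++) {
            double t = hitDistance(origin, dir, mins[i], maxs[i]);
            if (t >= 0 && t < bestT) {
                bestT = t;
                best = i;
            }
        }
        return best;
    }
}
```

Only the box returned by nearestHit is visible along that ray; anything else the ray passes through is hidden behind it, assuming opaque surfaces.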
The cube is one object and to the best of my knowledge there is no way
to manipulate the faces of an object in OpenGL only the vertices.
Generate in opencl, pass it to opengl, faster than immediate mode.
Also how would OpenCL communicate with OpenGL? OpenCL isn't a graphics
api so it isn't capable of drawing the rays.
Create the context with "sharing" properties to be able to use GL-CL "interop". This lets OpenCL-OpenGL communication run as fast as GPU VRAM bandwidth (300 GB/s on high-end cards). Then use GL buffers as CL buffers in this context, with proper synchronization between CL and GL: glFinish(), compute(), clFinish(), drawArrays().
Without interop, communication will be as slow as the PCI-e bandwidth. In that case, generating from the CPU becomes faster if the compute-to-data ratio is low.
If there are multiple GPUs to play with, then you should pack your data as compactly as possible. Check endianness and the alignment of structures. Don't forget to define the OpenCL (device-side) structures if there are any on the host side; they must be 1-1 compatible.
So I've run into a bit of a pickle. I'm writing a library using JOGL to display 3D models (and consequently, 2D models) on a GLCanvas. Well, everything was running smoothly until I decided to move the calls to the draw method of the individual polygons of a Strixa3DElement into a thread to speed it up a bit. Before, everything drew perfectly to the screen, but VERY slowly. Now, as far as speed goes, it couldn't be better. But it's not drawing anything. Ignoring everything but what the draw method deals with, is there any reason that
https://github.com/NicholasRoge/StrixaGL/blob/master/src/com/strixa/gl/Strixa3DElement.java
shouldn't work?
Edit: Also, for the sake of avoiding concurrency issues in the thread, let's say any given element has no more than 100000 polygons.
It's better to leave render tasks in a gl thread for now.
You aren't even using display lists, so of course it will be very slow.
Even after that, rendering is not the speed problem: you can prepare data for rendering in another thread, leaving render loop clean and fast. (moving out this._performGameLogic etc)
You can use VBO, shaders (moving data and render logic from CPU to GPU), offscreen buffers etc etc to improve performance.
If you continue, you should:
check the GLArrayDataServer class for use with VBOs, and the unit tests and demos, while writing your code.
not pass GL2 around as an argument; fetch it with GLContext.getCurrentGL().getGL2() instead.
try GL2ES2: the fixed-function pipeline is deprecated, and GL2ES2 also lets the code run on mobile platforms.
join the Jabber conference.
Some answers about JOGL&threads: Resources: Parallelism in Java for OpenGL realtime applications
I am rendering a rather heavy object consisting of about 500k triangles. I use an OpenGL display list and in the render method just call glCallList. I thought that once the graphics primitives are compiled into a display list, the CPU's work is done and it just tells the GPU to draw. But now one CPU core is loaded up to 100%.
Could you give me some clues as to why this happens?
UPDATE: I have checked how long glCallList takes to run: it's fast, about 30 milliseconds.
Most likely you are hitting the limit on list length, which is 64k vertices per list. Try to split your 500k triangles (1500k vertices?) into smaller chunks and see what you get.
By the way, which graphics chip are you using? If the vertices are processed on the CPU, that might also be a problem.
It's a bit of a myth that display lists magically offload everything to the GPU. If that was really the case, texture objects and vertex buffers wouldn't have needed to be added to OpenGL. All the display list really is, is a convenient way of replaying a sequence of OpenGL calls and hopefully saving some of the function call/data conversion overhead (see here). None of the PC HW implementations I've used seem to have done anything more than that so far as I can tell; maybe it was different back in the days of SGI workstations, but these days buffer objects are the way to go. (And modern OpenGL books like OpenGL Distilled give glBegin/glEnd etc the briefest of mentions only before getting stuck into the new stuff).
The one place I have seen display lists make a huge difference is the GLX/X11 case where your app is running remotely to your display (X11 "server"); in that case using a display list really does push all the display-list state to the display side just once, whereas a non-display-list immediate-mode app needs to send a bunch of stuff again each frame using lots more bandwidth.
However, display lists aside, you should be aware of some issues around vsync and busy waiting (or the illusion of it)... see this question/answer.
In an OpenGL ES 1.x Android application, I generate a circle (from triangles) and then translate it about one hundred times to form a level. Everything works except when a certain event occurs that causes about 15 objects to be immediately added to the arraylist that stores the circles' coordinates. When this event happens 2+ times quickly, all the circles in the list disappear for about 1/5th of a second. Besides this, the circles animate smoothly.
The program runs well as a java SE app using the same synchronization techniques, and I have tried a half a dozen or so other synch techniques to no avail, so I feel the problem is the openGL implementation. Any suggestions?
Do you really have to store the vertex data in client memory? If you don't modify it, I suggest you use a VBO instead. Just upload it into graphics memory once, then draw from there. It will be much faster (not requiring you to send all the vertex data for each draw), and I'm pretty sure you won't run into the problem you described.
Transformations can be done as much as you like, then you only have to give the draw command for each instance of your circle.
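So the list is being modified under your nose? It sounds like you need to do any modification to this list on the OpenGL thread. If you are using a GLSurfaceView, try GLSurfaceView.queueEvent(Runnable), where the Runnable contains your own code; it is run on the rendering thread. (Activity.runOnUiThread(Runnable) would run it on the UI thread instead.) Possibly.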
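The idea of funnelling every list change onto the rendering thread can be sketched in plain Java, without any Android classes. CircleScene, addCircleLater and drawFrame are hypothetical names, and the queue here just stands in for whatever mechanism the framework offers for posting work to the GL thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical scene class: other threads never touch `circles` directly,
// they only enqueue small tasks that the render thread runs between frames.
final class CircleScene {
    private final ConcurrentLinkedQueue<Runnable> pending = new ConcurrentLinkedQueue<>();
    private final List<float[]> circles = new ArrayList<>(); // render thread only

    // Safe to call from any thread, e.g. game logic or a touch handler.
    void addCircleLater(float x, float y) {
        pending.add(() -> circles.add(new float[] { x, y }));
    }

    // Call at the top of the draw callback, on the render thread.
    // Returns the number of circles that would be drawn this frame.
    int drawFrame() {
        Runnable task;
        while ((task = pending.poll()) != null) {
            task.run();                  // apply queued changes before drawing
        }
        // ... issue one translate + draw call per entry in circles here ...
        return circles.size();
    }
}
```

Because the list only ever changes between frames, a burst of 15 additions can no longer land in the middle of a draw pass.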
I've written a speed limit app that loads data from a set of tiled XML files, each representing a map square 0.05 degrees on a side.
At the moment, the app checks whether I've moved into a new square (using onLocationChanged) and, if so, loads the data for it and the 8 surrounding tiles. (It has a bit of a sanity check and only loads data for new tiles, so it tends to load just another 3 tiles' worth of data.)
Anyway, it currently does this on the UI thread, so there is a noticeable pause when moving into a new square, and I'd like to shift it to the background using AsyncTask. (It also loads bitmap maps for display purposes, and I've already moved that code into an AsyncTask, so I know how to do that bit.)
My problem is to do with checking the arrays (actually ArrayLists) used for the speed limits while the AsyncTask is possibly adding to them (and, in a future version, removing from them) in the background.
I was wondering if there was a "professional" :) way of dealing with this sort of situation.
Synchronized access should deal with all the concurrency problems when accessing the ArrayList. Just use
synchronized(myArrayList) {
// update/read/modify
}
both in your AsyncTask and on your UI thread.
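A slightly fuller sketch of the same idea, with hypothetical names (SpeedLimitStore, addTile, snapshot) and Integer standing in for whatever your speed limit entries really are. The background side adds a whole tile in one locked step, and the reading side copies the list under the lock and then works on the copy, so it never iterates while the list is changing.

```java
import java.util.ArrayList;
import java.util.List;

final class SpeedLimitStore {
    private final List<Integer> limits = new ArrayList<>();

    // Background thread: add a whole tile's worth of entries in one locked step.
    void addTile(List<Integer> tileLimits) {
        synchronized (limits) {
            limits.addAll(tileLimits);
        }
    }

    // UI thread: copy under the lock, then iterate the copy without holding it.
    List<Integer> snapshot() {
        synchronized (limits) {
            return new ArrayList<>(limits);
        }
    }
}
```

Copying costs a little memory per check, but it keeps the locked section short so the UI thread is never blocked for the length of a tile load.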
A good resource is the Stack Overflow search.