Horrible performance loss when using OpenGL FBO - Java

I have successfully implemented a simple 2D game using LWJGL (OpenGL) where objects fade away as they get further from the player. This fading was initially implemented by computing the distance from the player to each object's origin and using that to scale the object's alpha/opacity.
However, with larger objects this approach looks too coarse. My solution was to scale the alpha/opacity per pixel of the object. Not only would this look better, it would also move the computation from the CPU to the GPU.
I figured I could implement it using an FBO and a temporary texture.
By drawing to the FBO and masking it with a precomputed distance map (a texture) using a special blend mode, I intended to achieve the effect.
The algorithm is like so:
0) Initialize opengl and setup FBO
1) Render background to standard buffer
2) Switch to custom FBO and clear it
3) Render objects (to FBO)
4) Mask FBO using distance-texture
5) Switch to standard buffer
6) Render FBO temporary texture (to standard buffer)
7) Render hud elements
A bit of extra info:
The temporary texture has the same size as the window (and thus standard buffer)
Step 4 uses a special blend mode to achieve the desired effect:
GL11.glBlendFunc( GL11.GL_ZERO, GL11.GL_SRC_ALPHA );
My temporary texture is created with min/mag filters: GL11.GL_NEAREST
The data is allocated using: org.lwjgl.BufferUtils.createByteBuffer(4 * width * height);
The texture is initialized using:
GL11.glTexImage2D( GL11.GL_TEXTURE_2D, 0, GL11.GL_RGBA, width, height, 0, GL11.GL_RGBA, GL11.GL_UNSIGNED_BYTE, dataBuffer);
There are no GL errors in my code.
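For context, steps 2-6 of the pass look roughly like the sketch below. This is a minimal outline assuming LWJGL's GL30 framebuffer bindings; fboId, fboTexture, maskTexture and the draw* methods are placeholders for my actual setup and drawing code.

    // Rough outline of the FBO pass (steps 2-6), assuming GL30-style framebuffers.
    // fboId, fboTexture and maskTexture are created once in step 0; draw* calls are placeholders.
    GL11.glEnable(GL11.GL_BLEND);

    // Step 2: switch to the custom FBO and clear it
    GL30.glBindFramebuffer(GL30.GL_FRAMEBUFFER, fboId);
    GL11.glClearColor(0f, 0f, 0f, 0f);
    GL11.glClear(GL11.GL_COLOR_BUFFER_BIT);

    // Step 3: render the objects into the FBO
    drawObjects();

    // Step 4: mask the FBO by drawing the distance texture over it;
    // dst = src * 0 + dst * srcAlpha, i.e. existing pixels are scaled by the mask's alpha
    GL11.glBlendFunc(GL11.GL_ZERO, GL11.GL_SRC_ALPHA);
    GL11.glBindTexture(GL11.GL_TEXTURE_2D, maskTexture);
    drawFullscreenQuad();

    // Step 5: switch back to the standard buffer
    GL30.glBindFramebuffer(GL30.GL_FRAMEBUFFER, 0);

    // Step 6: composite the FBO's color texture over the background
    GL11.glBlendFunc(GL11.GL_SRC_ALPHA, GL11.GL_ONE_MINUS_SRC_ALPHA);
    GL11.glBindTexture(GL11.GL_TEXTURE_2D, fboTexture);
    drawFullscreenQuad();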
This does indeed achieve the desired results.
However, when I did a bit of performance testing I found that my FBO approach cripples performance. I tested by requesting 1000 successive renders and measuring the time. The results were as follows:
In 512x512 resolution:
Normal: ~1.7s
FBO: ~2.5s
(FBO without step 6: ~1.7s)
(FBO without step 4: ~1.7s)
In 1680x1050 resolution:
Normal: ~1.7s
FBO: ~7s
(FBO without step 6: ~3.5s)
(FBO without step 4: ~6.0s)
As you can see, this scales really badly. To make it even worse, I intend to do a second pass of this type. The machine I tested on is high-end relative to my target audience, so I can expect players to get far below 60 fps with this approach, which is hardly acceptable for a game this simple.
What can I do to salvage my performance?

As suggested by Damon and sidewinderguy, I successfully implemented a similar solution using a fragment shader (and vertex shader). Performance is a little better than my initial CPU-run, object-based computation, which is MUCH faster than my FBO approach. At the same time it gives visual results much closer to the FBO approach (overlapping objects behave a bit differently).
For anyone interested: the fragment shader basically transforms gl_FragCoord.xy and does a texture lookup. I am not sure this gives the best performance, but with only one other texture active I do not expect performance to improve by omitting the lookup and computing the value directly. Also, I no longer have a performance bottleneck, so further optimization should wait until it is actually needed.
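For anyone who wants a starting point, a rough sketch of such a shader setup is below, assuming LWJGL's GL20 bindings; the GLSL source, uniform names, texture units and the width/height variables are illustrative placeholders, not my exact code.

    // Sketch only: per-pixel fade via a fragment shader (LWJGL GL20 bindings).
    // A matching vertex shader or the fixed-function pipeline supplies gl_TexCoord[0].
    String fragmentSource =
          "uniform sampler2D objectTex;\n"
        + "uniform sampler2D distanceMap;\n"
        + "uniform vec2 screenSize;\n"
        + "void main() {\n"
        + "    vec4 color = texture2D(objectTex, gl_TexCoord[0].st);\n"
        + "    vec2 maskCoord = gl_FragCoord.xy / screenSize; // window coords -> [0,1]\n"
        + "    color.a *= texture2D(distanceMap, maskCoord).a;\n"
        + "    gl_FragColor = color;\n"
        + "}\n";

    int shader = GL20.glCreateShader(GL20.GL_FRAGMENT_SHADER);
    GL20.glShaderSource(shader, fragmentSource);
    GL20.glCompileShader(shader);               // check GL_COMPILE_STATUS in real code

    int program = GL20.glCreateProgram();
    GL20.glAttachShader(program, shader);
    GL20.glLinkProgram(program);                // check GL_LINK_STATUS in real code

    GL20.glUseProgram(program);
    GL20.glUniform1i(GL20.glGetUniformLocation(program, "objectTex"), 0);   // texture unit 0
    GL20.glUniform1i(GL20.glGetUniformLocation(program, "distanceMap"), 1); // texture unit 1
    GL20.glUniform2f(GL20.glGetUniformLocation(program, "screenSize"), width, height);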
Also, I am very grateful for all the help, suggestions and comments I received :-)

Related

Changing 3D model texture based on resolution

As far as I know, you can get the screen size and density using Gdx.graphics.getDensity(), so you can load the right texture for e.g. 1x, 1.5x, etc.
But what about the texture that comes with the 3D model? For example, the texture is only intended for a maximum of 1280x800 px, while my Android device has a 3x dpi.
I don't want to scale it up too much because it can make the image blurry/faded/not sharp. Does anyone know a solution?
EDIT:
Let me explain in detail:
I have one ModelInstance with a texture atlas (2048x2048 px) attached.
When the game is opened on a 4K screen, I scale the model up almost three times, causing the texture to become blurry. That makes sense, because the difference between 240 dpi and 640 dpi is very large.
So in my opinion the solution is to make several texture atlases, for 240 dpi, 320 dpi, 480 dpi, etc. The problem is I don't know how to replace the texture atlas that is already integrated with the Model from the start, so that when scaling up the texture atlas is automatically replaced with a higher-resolution one. Thanks.
Usually in 3D graphics the camera or the model moves. There isn't a single best texture resolution, because the camera may be very near, very far away, or viewing the textured surface at a glancing angle.
The solution offered by graphics APIs is texture filtering settings. Under magnification (where a texel takes up more than a screen pixel) you can do linear filtering for soft edges or point filtering for hard edges. Minification is more complex: you can have linear or point filtering, but you can also have mipmaps, which are a precalculated chain of successively half-sized versions of your image, typically all the way down to 1x1. You can set texture filtering to pick the nearest mipmap, blend between mipmaps, or use anisotropic filtering for better sharpness at glancing angles. Generally, linear filtering for magnification and full mipmap chains with anisotropic filtering for minification produce very good quality and good enough performance to be a sensible default choice.
So, you won't be giving the GPU a single texture for your model, you'll be giving it a chain of textures, and letting the GPU worry about how to sample that chain to give the correct amount of blur/sharpness. For performance and compatibility with mipmaps, it is usually a good idea to use power-of-two textures (e.g. 1024x1024 rather than 1280x800).
So, just make a 1024x1024 or 2048x2048 texture with mipmaps and appropriate filtering settings, use it on every device regardless of its resolution, and quality-wise you're sorted.
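In libGDX terms (which the question seems to be using, given Gdx.graphics.getDensity()), that could look like the sketch below; the file name is a placeholder and assumes a power-of-two image.

    // Sketch: load a power-of-two texture with mipmaps and trilinear filtering (libGDX).
    Texture diffuse = new Texture(Gdx.files.internal("model_diffuse_2048.png"), true); // true = generate mipmaps
    diffuse.setFilter(Texture.TextureFilter.MipMapLinearLinear, // minification: blend between mipmaps
                      Texture.TextureFilter.Linear);            // magnification: linear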
If you're particularly worried about memory use or load times then there's an argument to reduce the texture size on lower resolution devices (basically, have a second asset with halved resolution for low-res devices, or just skip the highest resolution mip when loading for low-res devices), but I think that might be a premature optimization at this stage.

Face Features Detection Using OpenCV Haar-cascades

I am using Java with the OpenCV library to detect face, eyes and mouth using the laptop camera.
What I have done so far:
Capture Video Frames using VideoCapture object.
Detect Face using Haar-Cascades.
Divide the Face region into Top Region and Bottom Region.
Search for Eyes inside Top region.
Search for Mouth inside Bottom region.
Problem I am facing:
At first the video runs normally, then it suddenly becomes slower.
Main Questions:
Do higher camera resolutions work better for Haar cascades?
Do I have to capture video frames at a certain scale? For example (100px x 100px)?
Do Haar cascades work better on gray-scale images?
Do different lighting conditions make a difference?
What exactly does the method detectMultiScale(params) do?
If I want to go further and analyze eye blinking, eye closure duration, mouth yawning, head nodding and head orientation to detect fatigue (drowsiness) using a support vector machine, any advice?
Your help is appreciated!
The following article gives an overview of what is going on under the hood; I would highly recommend reading it.
Do higher camera resolutions work better for Haar cascades?
Not necessarily. cascade.detectMultiScale has params to adjust for various input width/height scenarios, like minSize and maxSize. These params are optional, but you can tweak them to get robust predictions if you have control over the input image size. If you set minSize to a small value and ignore maxSize, it will work for small and high-res images alike, but performance will suffer. And if you are wondering why there is no difference between high-res and low-res images, consider that cascade.detectMultiScale internally scales the image down to lower resolutions for a performance boost; that is why defining maxSize and minSize is important, to avoid unnecessary iterations.
Do I have to capture video frames at a certain scale? For example (100px x 100px)?
This mainly depends upon the params you pass to cascade.detectMultiScale. Personally, I think 100x100 would be too small to detect smaller faces in the frame, as some features would be completely lost when resizing the frame to smaller dimensions, and cascade.detectMultiScale is highly dependent on the gradients/features in the input image.
But if a face takes up most of the input frame and there are no smaller faces in the background, then you may use 100x100. I have tested some sample faces of size 100x100 and it worked pretty well. If that is not the case, then a width of 300-400 px should work well. However, you will need to tune the params to achieve accuracy.
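As a rough illustration (not the asker's code), the call with explicit size limits might look like this with the OpenCV Java bindings, where faceCascade and frame come from your own capture loop and the sizes are placeholders:

    // Sketch: limit the detector's search range with minSize/maxSize (OpenCV Java bindings).
    Mat gray = new Mat();
    Imgproc.cvtColor(frame, gray, Imgproc.COLOR_BGR2GRAY); // cascades operate on gray-scale
    Imgproc.equalizeHist(gray, gray);                      // optional: even out lighting a bit

    MatOfRect faces = new MatOfRect();
    faceCascade.detectMultiScale(
            gray,
            faces,
            1.1,               // scaleFactor: how much the search window grows each pass
            3,                 // minNeighbors: higher = fewer false positives
            0,                 // flags (unused in newer OpenCV versions)
            new Size(60, 60),  // minSize: ignore detections smaller than this
            new Size(400, 400) // maxSize: ignore detections larger than this
    );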
Do Haar cascades work better on gray-scale images?
They work only in gray-scale images.
If you read the first part of the article, you will see that face detection boils down to detecting many binary patterns in the image. This basically comes from the Viola-Jones paper, which is the basis of this algorithm.
Do different lighting conditions make a difference?
Maybe in some cases, but Haar features are largely lighting-invariant.
If by different lighting conditions you mean taking images under green or red light, then it may not affect the detection: the Haar features (being computed on gray-scale) are independent of the RGB color of the input image. The detection mainly depends on the gradients/features in the input image. So as long as there are enough gradient differences in the input image, such as the eyebrow having lower intensity than the forehead, it will work fine.
But consider a case where the input image is back-lit or has very low ambient light. In that case some prominent features may not be found, which may result in the face not being detected.
What exactly does the method detectMultiScale(params) do?
If you have read the article by this point, you will know this well.
If I want to go further and analyze eye blinking, eye closure duration, mouth yawning, head nodding and head orientation to detect fatigue (drowsiness) using a support vector machine, any advice?
No, I would not suggest doing this type of gesture detection with an SVM, as it would be extremely slow to run 10 different cascades to determine the current facial state. Instead I would recommend using a facial landmark detection framework such as Dlib. You may search for other frameworks as well, because Dlib's model is nearly 100 MB and may not suit your needs if you want to port to a mobile device. So the key is Facial Landmark Detection: once you have the full face labelled, you can draw conclusions such as whether the mouth is open or the eyes are blinking, and it works in real time, so your video processing won't suffer much.

Terrain flickers through water | Depth buffer precision issue?

I have an issue with the water in my terrain, which is currently just a quad with a transparent blue colour.
When close up to it, it looks like this:
As you can see, it's simple enough - A flat transparent quad representing water.
However, when I get further away, this happens:
For those who can't see the GIF (or can't make out what's going on): the terrain is glitching around the water. If you look at the water closely enough, you will see the terrain glitching above/below it.
Here is the code where I prepare the 3D view:
static void ready3D()
{
    glViewport(0, 0, Display.getWidth(), Display.getHeight());
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    GLU.gluPerspective(45, (float) Display.getWidth() / Display.getHeight(), 50f, 500f);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    glDepthFunc(GL_LEQUAL);
    glEnable(GL_DEPTH_TEST);
}
Does anyone know what the issue is, and how I could improve the precision of the depth buffer (presuming the depth buffer is the issue)?
It's difficult to tell from the GIF, but what I'm gathering is that you're experiencing an instance of Z-Fighting.
There are several solutions that tend to be most effective for dealing with Z-Fighting:
Increase the Resolution of the Z-Buffer.
Adjust the relative positions of the Water & Land to separate them further, thus reducing the chance of the two geometries colliding like this
Make use of the Stencil Buffer to ensure that objects are drawn correctly.
I've also seen solutions that try to disable Depth Testing when handling certain objects, but those are almost never good as a general solution.
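For the first point, it is worth checking how many depth bits the context actually provides. A minimal sketch assuming LWJGL 2's Display/PixelFormat API (which the question appears to use):

    // Sketch: explicitly request a 24-bit depth buffer (LWJGL 2 style).
    // Some default pixel formats fall back to a 16-bit depth buffer.
    Display.setDisplayMode(new DisplayMode(1680, 1050));
    Display.create(new PixelFormat().withDepthBits(24));
    System.out.println("Depth bits: " + GL11.glGetInteger(GL11.GL_DEPTH_BITS));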
Increasing the depth buffer resolution wouldn't help a lot, as having an open world means a big distance from zNear to zFar is mapped onto your depth buffer. One common trick is implementing a logarithmic depth buffer. This gives you the ability to put the precision where you need the detail, but in the current situation that need changes with the altitude of your vehicle+camera system. So a logarithmic buffer is not a solution here, and increasing the "linear" buffer wouldn't change much either. I recommend reading this and this awesome article about the depth buffer.
The second point is that your landscape seems procedurally generated. If so, vertices can spawn really close to the water plane height, and winning that floating-point comparison war is not simple with depth buffer manipulation alone (if possible at all).
The third point is that in general you don't want extra runtime work, which is why stencil buffer usage is questionable here (as JPhi1618 noticed).
So if you have an opportunity to modify your landscape generation algorithm, from my point of view it is worth it. In JPhi1618's words again:
if Y is less than the level of the water even a little bit, make it way less than the water.
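A minimal sketch of that idea, applied as a post-process over a hypothetical heightmap (the heights array, waterLevel and the thresholds are placeholders for your own terrain data):

    // Sketch: push near-water terrain well below the water plane to avoid z-fighting.
    static void sinkTerrainNearWater(float[][] heights, float waterLevel) {
        final float margin = 0.5f; // how close to the water counts as "too close"
        final float push   = 2.0f; // how far below the water to push those vertices
        for (int x = 0; x < heights.length; x++) {
            for (int z = 0; z < heights[x].length; z++) {
                float y = heights[x][z];
                if (y < waterLevel && y > waterLevel - margin) {
                    heights[x][z] = waterLevel - push;
                }
            }
        }
    }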

Ideal number of spritesheets for LWJGL/OpenGL

I am making a game using LWJGL, and so far I have decided to have 5 or 6 sprite sheets: one for the blocks, one for the items, one for objects, etc. That way, each sprite sheet contains sprites that are closely related and usually the same size (in the case of blocks and items). But is this the better way? Or should I just throw everything onto a single sprite sheet, with no organization whatsoever?
If I am doing it the right way, there is another problem. For example, when you are on the map, I need to draw the blocks. But over the blocks there can be electrical wires and other things, which are in a separate sprite sheet. This information, however, is stored in the same array. So normally I just iterate over it once and, each time, draw the block and then the wire over it, switching sprite sheets twice per iteration. But I thought switching might take some time, so maybe it would be better to iterate twice: first draw all the blocks, then iterate again to draw the wires? To change textures, I am using the SlickUtil Texture class, which has a bind method - really easy to use.
There is no "ideal"; there are simply the factors that matter for your needs.
Remember the reason why you use sprite sheets at all: because switching textures is too expensive to do per-object when dealing with 2D rendering. So as long as you're not switching textures for each sprite you render, you'll already be ahead of the game performance-wise.
The other considerations you need to take into account are:
Minimum user hardware specifications. Specifically, what is the minimum size of GL_MAX_TEXTURE_SIZE you want your code to work on? The larger your sprite sheets get, the greater your hardware requirements, since a single sprite sheet must be a texture.
This value is hardware-dependent, but there are some general requirements. OpenGL 3.3 requires 1024 at a minimum; pretty much every piece of GL 3.3 hardware gives 4096. OpenGL 4.3 requires a massive 16384, which is approaching the theoretical limits of floating-point texture coordinate capacity (assuming you want at least 8 bits of sub-pixel texture coordinate precision).
GL 2.1 has a minimum requirement of 64, but any actual 2.1 hardware people will have will offer between 512 and 2048. So pick your sprite sheet size based on this.
What you're rendering. You want to be able to render as much as possible from one sprite sheet. What you want to avoid is frequent texture switches. If your world is divided into layers, if you can fit your sprites for each layer onto their own sheet, you're doing fine. No hardware is going to choke on 20 texture changes; it's tens of thousands that are the problem.
The main thing is to render everything that uses a sheet all at once. Not necessarily in the same render call; you can switch meshes and shader uniforms/fixed-function state. But you shouldn't switch sheets between these renders until you've rendered everything needed for that sheet.
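Applied to the question's blocks-and-wires example, that batching could look roughly like the sketch below, using the SlickUtil Texture.bind() mentioned in the question; Tile, blockSheet, wireSheet and the draw helpers are hypothetical names:

    // Sketch: one texture bind per sheet per frame instead of two binds per tile.
    blockSheet.bind();               // SlickUtil Texture for the block sprites
    for (Tile tile : tiles) {
        drawBlock(tile);             // hypothetical: quads sampling the block sheet
    }

    wireSheet.bind();                // SlickUtil Texture for the wire sprites
    for (Tile tile : tiles) {
        if (tile.hasWire()) {
            drawWire(tile);          // hypothetical: quads sampling the wire sheet
        }
    }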

Resizing an image using OpenGL

I'd like to resize an image using OpenGL. I'd prefer it in Java, but C++ is OK, too.
I am really new to the whole thing so here's the process in words as I see it:
load the image as a texture into OGL
set some stuff, regarding state & filtering
draw the texture at a different size onto another texture
get the texture data into an array or something
Do you think it would be faster to use OpenGL and the GPU than a CPU-based blit library?
Thanks!
Instead of rendering a quad into the destination FBO, you can simply use the hardware blit functionality: glBlitFramebuffer. Its arguments are straightforward, but it requires careful preparation of your source and destination FBOs:
ensure FBO's are complete (glCheckFramebufferStatus)
set read FBO target and write FBO target independently (glBindFramebuffer)
set draw buffers and read buffers (glDraw/ReadBuffers)
call glBlitFramebuffer, setting GL_LINEAR filter in the argument
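Put together, the blit might look roughly like this with LWJGL's GL30 bindings (srcFbo, dstFbo and the sizes are placeholders):

    // Sketch: scaled copy from one FBO to another via glBlitFramebuffer (LWJGL GL30 bindings).
    GL30.glBindFramebuffer(GL30.GL_READ_FRAMEBUFFER, srcFbo);
    GL30.glBindFramebuffer(GL30.GL_DRAW_FRAMEBUFFER, dstFbo);
    if (GL30.glCheckFramebufferStatus(GL30.GL_READ_FRAMEBUFFER) != GL30.GL_FRAMEBUFFER_COMPLETE) {
        throw new IllegalStateException("read framebuffer incomplete");
    }

    GL30.glBlitFramebuffer(
            0, 0, srcWidth, srcHeight,   // source rectangle
            0, 0, dstWidth, dstHeight,   // destination rectangle (the new size)
            GL11.GL_COLOR_BUFFER_BIT,    // copy color only
            GL11.GL_LINEAR);             // filtering used while scaling

    // read the resized pixels back into a buffer if you need them on the CPU side
    GL30.glBindFramebuffer(GL30.GL_FRAMEBUFFER, dstFbo);
    ByteBuffer pixels = BufferUtils.createByteBuffer(4 * dstWidth * dstHeight);
    GL11.glReadPixels(0, 0, dstWidth, dstHeight, GL11.GL_RGBA, GL11.GL_UNSIGNED_BYTE, pixels);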
I bet it will be much faster on GPU, especially for large images.
It depends. If the images are big, it might be faster using OpenGL. But if you are only doing the resize and no further processing on the GPU side, then it's not worth it, as it is very likely going to be slower than the CPU.
But if you need to resize the image and you can implement the further processing in OpenGL, then it is a worthwhile idea.
