I am using Java with the OpenCV library to detect the face, eyes and mouth using a laptop camera.
What I have done so far:
Capture video frames using a VideoCapture object.
Detect the face using Haar cascades.
Divide the face region into a top region and a bottom region.
Search for the eyes inside the top region.
Search for the mouth inside the bottom region.
Problem I am facing:
At first the video runs normally, and then it suddenly becomes slower.
Main questions:
Do higher camera resolutions work better for Haar cascades?
Do I have to capture video frames at a certain scale, for example 100 x 100 px?
Do Haar cascades work better on grayscale images?
Do different lighting conditions make a difference?
What exactly does the method detectMultiScale(params) do?
If I want to go further and analyze eye blinking, eye closure duration, mouth yawning, head nodding and head orientation to detect fatigue (drowsiness) using a Support Vector Machine, do you have any advice?
Your help is appreciated!
The following article gives an overview of what goes on under the hood; I highly recommend reading it.
Do higher camera resolutions work better for Haar cascades?
Not necessarily. cascade.detectMultiScale has parameters to handle various input widths and heights, such as minSize and maxSize. These parameters are optional, but you can tweak them to get robust predictions if you have control over the input image size. If you set minSize to a small value and ignore maxSize, detection will work for small as well as high-resolution images, but performance will suffer. And if you wonder why there is no difference between high-resolution and low-resolution images, consider that cascade.detectMultiScale internally scales the image down to lower resolutions for performance; that is why defining minSize and maxSize is important to avoid unnecessary iterations.
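To make the minSize/maxSize argument concrete, here is a simplified model of the multi-scale loop. It is an assumption, not OpenCV's source: a square base window (24x24 is assumed, which is typical of the frontal-face cascades) grows by scaleFactor at each step, and only windows between minSize and maxSize are evaluated. OpenCV shrinks the image rather than growing the window, which is equivalent for counting scales.

```java
/**
 * Simplified model (not OpenCV code) of how many scales a Haar cascade
 * evaluates. Assumes a square base window and scaleFactor > 1.
 */
public class PyramidLevels {

    /** Counts the scales evaluated; maxSize <= 0 means "no upper bound". */
    static int countLevels(int imgW, int imgH, int baseWindow,
                           double scaleFactor, int minSize, int maxSize) {
        int levels = 0;
        for (double factor = 1.0; ; factor *= scaleFactor) {
            int window = (int) Math.round(baseWindow * factor);
            // The window no longer fits inside the frame: stop.
            if (window > imgW || window > imgH) break;
            // Larger than maxSize: every later window is larger too, so stop.
            if (maxSize > 0 && window > maxSize) break;
            // Smaller than minSize: skip this scale but keep growing the window.
            if (window < minSize) continue;
            levels++;
        }
        return levels;
    }

    public static void main(String[] args) {
        // 640x480 frame, 24 px base window, scaleFactor 1.1
        System.out.println("no limits:            " + countLevels(640, 480, 24, 1.1, 0, 0));
        System.out.println("minSize 100, max 300: " + countLevels(640, 480, 24, 1.1, 100, 300));
    }
}
```

With these assumed numbers the unbounded search visits 32 scales and the bounded one 12, and each scale means sliding the window over the whole frame, which is where the saved work comes from.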
Do I have to capture video frames at a certain scale, for example 100 x 100 px?
This mainly depends on the parameters you pass to cascade.detectMultiScale. Personally, I think 100 x 100 would be too small for detecting smaller faces in the frame, because some features would be completely lost when resizing the frame to such small dimensions, and cascade.detectMultiScale depends heavily on the gradients or features in the input image.
But if the face is the major part of the input frame and there are no other, smaller faces in the background, then you may use 100 x 100. I have tested some sample faces of size 100 x 100 and it worked pretty well. If this is not the case, a width of 300-400 px should work well. However, you would need to tune the parameters to achieve good accuracy.
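A common way to apply the 300-400 px advice is to detect on a downscaled copy of each frame and map the resulting rectangles back to the full frame. The sketch below shows only the size arithmetic; the helper names are hypothetical, and the actual resize would be done with OpenCV (for example Imgproc.resize in the Java bindings).

```java
import java.util.Arrays;

/** Hypothetical helpers for running detection on a downscaled frame. */
public class FrameDownscale {

    /** Returns {width, height} scaled to targetWidth, keeping the aspect ratio; never upscales. */
    static int[] scaledSize(int width, int height, int targetWidth) {
        if (width <= targetWidth) {
            return new int[] {width, height};
        }
        double scale = (double) targetWidth / width;
        return new int[] {targetWidth, (int) Math.round(height * scale)};
    }

    /** Maps a detection {x, y, w, h} found on the small frame back to original-frame coordinates. */
    static int[] toOriginal(int[] rect, int originalWidth, int smallWidth) {
        double inverse = (double) originalWidth / smallWidth;
        int[] out = new int[4];
        for (int i = 0; i < 4; i++) {
            out[i] = (int) Math.round(rect[i] * inverse);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] small = scaledSize(1280, 720, 400);
        System.out.println(small[0] + "x" + small[1]); // 400x225
        int[] face = toOriginal(new int[] {100, 50, 80, 80}, 1280, small[0]);
        System.out.println(Arrays.toString(face)); // [320, 160, 256, 256]
    }
}
```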
Do Haar cascades work better on grayscale images?
They work only on grayscale images, so convert each frame to grayscale before calling detectMultiScale.
If you read the first part of the article, you will learn that face detection consists of detecting many simple light/dark rectangular patterns (Haar-like features) in the image. This comes from the Viola-Jones paper, which is the basis of this algorithm.
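The two building blocks from Viola-Jones can be illustrated on a plain grayscale array: an integral image, which gives any rectangle sum in constant time, and a two-rectangle Haar-like feature (top half minus bottom half). This is a teaching sketch; a real cascade evaluates thousands of trained features with learned thresholds.

```java
/** Integral image plus a two-rectangle Haar-like feature (illustrative sketch). */
public class HaarFeatureDemo {

    /** ii[y][x] = sum of gray[0..y-1][0..x-1]; has one extra row and column of zeros. */
    static long[][] integralImage(int[][] gray) {
        int h = gray.length, w = gray[0].length;
        long[][] ii = new long[h + 1][w + 1];
        for (int y = 1; y <= h; y++) {
            for (int x = 1; x <= w; x++) {
                ii[y][x] = gray[y - 1][x - 1] + ii[y - 1][x] + ii[y][x - 1] - ii[y - 1][x - 1];
            }
        }
        return ii;
    }

    /** Sum of the rectangle with top-left (x, y), width w and height h, using four lookups. */
    static long rectSum(long[][] ii, int x, int y, int w, int h) {
        return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x];
    }

    /** Two-rectangle feature: sum of the top half minus sum of the bottom half. */
    static long twoRectVertical(long[][] ii, int x, int y, int w, int h) {
        int half = h / 2;
        return rectSum(ii, x, y, w, half) - rectSum(ii, x, y + half, w, half);
    }

    public static void main(String[] args) {
        // 4x4 patch: bright forehead-like rows above dark eyebrow-like rows
        int[][] patch = {
            {200, 200, 200, 200},
            {200, 200, 200, 200},
            { 50,  50,  50,  50},
            { 50,  50,  50,  50},
        };
        long[][] ii = integralImage(patch);
        System.out.println(rectSum(ii, 0, 0, 4, 4));         // 2000
        System.out.println(twoRectVertical(ii, 0, 0, 4, 4)); // 1200
    }
}
```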
Do different lighting conditions make a difference?
In some cases, yes, but Haar features are largely lighting-invariant.
If by different lighting conditions you mean taking images under green or red light, then it may not affect detection: Haar features are computed on grayscale, so they do not depend on the RGB color of the input image. Detection mainly depends on the gradients/features in the input image, so as long as there are enough intensity differences, such as the eyebrows being darker than the forehead, it will work fine.
But consider a case where the input image is back-lit or has very low ambient light. In that case some prominent features may not be found, and the face may not be detected.
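A toy illustration of both points (not OpenCV code; `addLight` is a hypothetical model of a lighting change): a difference between a bright and a dark region is unchanged when the whole patch gets uniformly brighter, but it disappears once strong back-light pushes both regions to the 8-bit ceiling of 255.

```java
/** Toy model of how uniform brightening and saturation affect a light/dark feature. */
public class LightingDemo {

    /** Mean of the top half minus mean of the bottom half of a grayscale patch. */
    static double topMinusBottom(int[][] gray) {
        int h = gray.length, w = gray[0].length, half = h / 2;
        double top = 0, bottom = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (y < half) top += gray[y][x]; else bottom += gray[y][x];
            }
        }
        return top / (half * w) - bottom / ((h - half) * w);
    }

    /** Adds a brightness offset to every pixel, clamped to the 8-bit range [0, 255]. */
    static int[][] addLight(int[][] gray, int offset) {
        int[][] out = new int[gray.length][gray[0].length];
        for (int y = 0; y < gray.length; y++) {
            for (int x = 0; x < gray[0].length; x++) {
                out[y][x] = Math.max(0, Math.min(255, gray[y][x] + offset));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] patch = {{180, 180}, {60, 60}}; // forehead above eyebrow
        System.out.println(topMinusBottom(patch));               // 120.0
        System.out.println(topMinusBottom(addLight(patch, 40)));  // 120.0: the offset cancels out
        System.out.println(topMinusBottom(addLight(patch, 250))); // 0.0: both halves clip to 255
    }
}
```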
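What exactly does the method detectMultiScale(params) do?
If you have read the article by now, you should already know it well. In short, it slides the cascade's detection window across the image at a series of scales (controlled by scaleFactor, minSize and maxSize), runs the cascade at each position, and merges overlapping hits (controlled by minNeighbors) into the rectangles it returns.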
If I want to go further and analyze eye blinking, eye closure duration, mouth yawning, head nodding and head orientation to detect fatigue (drowsiness) using a Support Vector Machine, do you have any advice?
No, I wouldn't suggest performing this kind of gesture detection with an SVM, as it would be extremely slow to run 10 different cascades to determine the current facial state. Instead, I would recommend a facial landmark detection framework such as Dlib. You may want to look at other frameworks as well, because Dlib's landmark model is nearly 100 MB, which may not suit your needs if you want to port it to a mobile device. So the key is facial landmark detection: once you have the whole face labelled, you can draw conclusions such as whether the mouth is open or the eyes are blinking, and it runs in real time, so your video processing won't suffer much.
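Once landmarks are available, blinking can be judged from simple geometry. The sketch below uses the eye aspect ratio (EAR) from Soukupová and Čech's 2016 blink-detection paper, which is a technique from the literature rather than something the answer prescribes. It assumes six landmarks per eye ordered p1..p6 as in the 68-point layout (p1/p4 are the corners, p2/p3 the upper lid, p6/p5 the lower lid); the 0.2 threshold and 2-frame minimum are assumptions to tune.

```java
/** Eye aspect ratio and a simple blink counter over per-frame EAR values. */
public class EyeAspectRatio {

    static double dist(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    /** EAR = (|p2-p6| + |p3-p5|) / (2 |p1-p4|); drops toward 0 as the eye closes. */
    static double ear(double[][] p) {
        return (dist(p[1], p[5]) + dist(p[2], p[4])) / (2.0 * dist(p[0], p[3]));
    }

    /** Counts runs of at least minFrames consecutive frames whose EAR is below threshold. */
    static int countBlinks(double[] earPerFrame, double threshold, int minFrames) {
        int blinks = 0, run = 0;
        for (double e : earPerFrame) {
            if (e < threshold) {
                run++;
            } else {
                if (run >= minFrames) blinks++;
                run = 0;
            }
        }
        if (run >= minFrames) blinks++; // eye still closed at the end of the sequence
        return blinks;
    }

    public static void main(String[] args) {
        double[][] open = {{0, 0}, {1, -1}, {2, -1}, {3, 0}, {2, 1}, {1, 1}};
        double[][] closed = {{0, 0}, {1, -0.1}, {2, -0.1}, {3, 0}, {2, 0.1}, {1, 0.1}};
        System.out.println(ear(open));   // about 0.667
        System.out.println(ear(closed)); // about 0.067
        double[] series = {0.30, 0.31, 0.12, 0.10, 0.29, 0.15, 0.30};
        System.out.println(countBlinks(series, 0.2, 2)); // 1: the single-frame dip is ignored
    }
}
```

Eye closure duration follows from the same run length divided by the camera's frame rate, and an analogous ratio on the mouth landmarks is one possible way to flag yawning.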
As far as I know, you can get the screen size and density using Gdx.graphics.getWidth()/getHeight() and Gdx.graphics.getDensity(), so you can load the right texture, e.g. 1x, 1.5x, etc.
But what about the texture that comes with a 3D model? For example, the texture is only intended for a maximum of 1280x800 px, while my Android device's density is 3x.
I don't want to scale it up too much, because the image can become blurry and lose its sharpness. Does anyone know a solution?
EDIT:
Let me explain in detail:
I have one ModelInstance with a texture atlas (2048x2048 px) attached.
When the game is opened on a 4K screen, I scale the model up almost three times, which makes the texture blurry. That makes sense, because 240 dpi and 640 dpi are very far apart.
So in my opinion the solution is to make several texture atlases, for 240 dpi, 320 dpi, 480 dpi, etc. The problem is that I don't know how to replace the texture atlas that has been integrated with the model from the beginning, so that when scaling up, the texture atlas is automatically replaced with a higher-resolution one. Thanks.
Usually in 3D graphics the camera or the model moves. There isn't a fixed best resolution for a texture, because the camera may be very near, very far away, or viewing the textured surface at a glancing angle.
The solution offered by graphics APIs is texture filtering settings. Under magnification (where a texel takes up more than one screen pixel), you can use linear filtering for soft edges or point filtering for hard edges. Minification is more complex: you can use linear or point filtering, but you can also use mipmaps, which are a precalculated chain of successively half-sized versions of your image, typically all the way down to 1x1. You can set texture filtering to pick the nearest mipmap, blend between mipmaps, or use anisotropic filtering for better sharpness at glancing angles. Generally, linear filtering for magnification and a full mipmap chain with anisotropic filtering for minification give very good quality with good enough performance, which makes them a sensible default.
So, you won't be giving the GPU a single texture for your model, you'll be giving it a chain of textures, and letting the GPU worry about how to sample that chain to give the correct amount of blur/sharpness. For performance and compatibility with mipmaps, it is usually a good idea to use power-of-two textures (e.g. 1024x1024 rather than 1280x800).
So just make a 1024x1024 or 2048x2048 texture with mipmaps and appropriate filtering settings, then use it on every device regardless of its resolution, and quality-wise you're sorted.
If you're particularly worried about memory use or load times then there's an argument to reduce the texture size on lower resolution devices (basically, have a second asset with halved resolution for low-res devices, or just skip the highest resolution mip when loading for low-res devices), but I think that might be a premature optimization at this stage.
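The mipmap and memory points above can be checked with a little arithmetic. This sketch assumes uncompressed RGBA8 textures (4 bytes per texel); it shows that a full chain down to 1x1 costs about one third more memory than the base level alone. In libGDX, mipmap generation is, as far as I know, requested with `new Texture(fileHandle, true)` and filtering set with `setFilter(TextureFilter.MipMapLinearLinear, TextureFilter.Linear)`; check the libGDX docs for your version.

```java
/** Mipmap chain arithmetic, assuming uncompressed RGBA8 (4 bytes per texel). */
public class MipChain {

    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    /** Number of levels from width x height down to 1x1, inclusive. */
    static int levelCount(int width, int height) {
        int levels = 1;
        while (width > 1 || height > 1) {
            width = Math.max(1, width / 2);
            height = Math.max(1, height / 2);
            levels++;
        }
        return levels;
    }

    /** Total bytes for the whole chain at 4 bytes per texel. */
    static long chainBytes(int width, int height) {
        long total = 0;
        while (true) {
            total += 4L * width * height;
            if (width == 1 && height == 1) break;
            width = Math.max(1, width / 2);
            height = Math.max(1, height / 2);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(levelCount(2048, 2048)); // 12
        System.out.println(chainBytes(2048, 2048)); // 22369620, vs 16777216 for level 0 alone
        System.out.println(isPowerOfTwo(1280));     // false
        System.out.println(levelCount(1280, 800));  // 11
    }
}
```

Skipping the top level on a low-resolution device would drop the 16 MB base level and keep the remaining chain of roughly 5.6 MB.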
I have asked this question elsewhere too, but since the topic there was a different one, it may not have been noticed. I have the eigenface algorithm for face recognition working using OpenCV in Java. I want to increase the accuracy of the code, as it is a well-known fact that eigenfaces rely heavily on light intensity.
What I have right now
I get perfect results when I test an image taken at the same place where the pictures in my database were taken, but the results get weird when I test images taken in different places.
I figured out that the reason is that my images differ in light intensity.
Hence, my question is:
Is there any way to standardize the images saved in the database, or the ones coming fresh into the system for a recognition check, so that I can improve the accuracy of my current face recognition system?
Any kind of positive solution to the problem would be really helpful.
Lighting intensity and pose are the most important factors in face recognition. Try histogram comparison between the training and test images (http://docs.opencv.org/doc/tutorials/imgproc/histograms/histogram_comparison/histogram_comparison.html). This helps you avoid the worst lighting situations. Preprocessing is also one of the key success factors in face recognition: gamma correction and DoG (difference of Gaussians) filtering can reduce lighting problems.
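Two of these ideas can be sketched on plain 8-bit grayscale arrays: gamma correction through a lookup table (gamma below 1 brightens shadows) and histogram comparison using the correlation formula, which is the same metric as OpenCV's HISTCMP_CORREL. The gamma value is an assumption to tune, and DoG filtering is not shown.

```java
import java.util.Arrays;

/** Gamma correction and histogram correlation on 8-bit grayscale pixels (sketch). */
public class LightingPreprocess {

    /** out = 255 * (in / 255)^gamma, applied through a 256-entry lookup table. */
    static int[] gammaCorrect(int[] pixels, double gamma) {
        int[] lut = new int[256];
        for (int i = 0; i < 256; i++) {
            lut[i] = (int) Math.round(255.0 * Math.pow(i / 255.0, gamma));
        }
        int[] out = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            out[i] = lut[pixels[i]];
        }
        return out;
    }

    /** 256-bin intensity histogram. */
    static double[] histogram(int[] pixels) {
        double[] h = new double[256];
        for (int p : pixels) h[p]++;
        return h;
    }

    /** Correlation between two histograms; values near 1 mean similar intensity distributions. */
    static double correlation(double[] h1, double[] h2) {
        double m1 = 0, m2 = 0;
        for (int i = 0; i < h1.length; i++) { m1 += h1[i]; m2 += h2[i]; }
        m1 /= h1.length;
        m2 /= h2.length;
        double num = 0, d1 = 0, d2 = 0;
        for (int i = 0; i < h1.length; i++) {
            double a = h1[i] - m1, b = h2[i] - m2;
            num += a * b;
            d1 += a * a;
            d2 += b * b;
        }
        return num / Math.sqrt(d1 * d2);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(gammaCorrect(new int[] {0, 64, 255}, 0.5))); // [0, 128, 255]
        double[] h = histogram(new int[] {10, 10, 20, 200});
        System.out.println(correlation(h, h)); // 1.0 (up to rounding) for identical histograms
    }
}
```

In practice you would compare a probe image's histogram against the enrolled images and treat a low correlation as a sign that the lighting differs a lot.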
You can also apply an elliptical mask that keeps only the face, removing the noise created by hair, neck, etc.
The OpenCV cookbook provides an excellent and simple tutorial on this.
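A minimal version of the elliptical mask, assuming the input is already a face crop: pixels outside the ellipse inscribed in the crop are replaced with a constant. The inscribed-ellipse proportions and the fill value are illustrative assumptions; tutorials typically use a slightly narrower ellipse.

```java
/** Elliptical face mask on a grayscale face crop (sketch). */
public class EllipseMask {

    /** Returns a copy in which pixels outside the inscribed ellipse are set to fill. */
    static int[][] applyEllipse(int[][] gray, int fill) {
        int h = gray.length, w = gray[0].length;
        double cx = w / 2.0, cy = h / 2.0;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // Normalised offset of the pixel centre from the ellipse centre.
                double dx = (x + 0.5 - cx) / cx;
                double dy = (y + 0.5 - cy) / cy;
                out[y][x] = (dx * dx + dy * dy <= 1.0) ? gray[y][x] : fill;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] face = new int[10][10];
        for (int[] row : face) java.util.Arrays.fill(row, 100);
        int[][] masked = applyEllipse(face, 128);
        System.out.println(masked[0][0] + " " + masked[5][5]); // 128 100
    }
}
```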
Below are some options which may help you boost your accuracy:
1] Image Normalization:
Scale your image pixel values to the range 0 to 1 to reduce the effect of lighting conditions.
2] Image Alignment (This is a very important step to achieve good performance):
Align all the training and test images so that the eyes, nose and mouth of all the faces in all the images have almost the same coordinates.
Check this post on face alignment (Highly recommended) : https://www.pyimagesearch.com/2017/05/22/face-alignment-with-opencv-and-python/
3] Data augmentation trick:
You can apply filters to your faces that simulate the same face under different lighting conditions.
So from one face you can create several images with different lighting.
4] Removing Noise:
Before performing step 3, apply a Gaussian blur to all the images.
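Steps 1 and 2 can be sketched on plain values. The sketch uses min-max normalization for step 1 (one way to map pixels to 0..1), and for step 2 computes the rotation angle and scale that would put both eye centres on a horizontal line at a fixed distance, following the idea in the linked alignment post. Eye positions are assumed to come from a landmark or eye detector, and the 256 px output width and 0.35 eye position are illustrative. In OpenCV's Java bindings the angle and scale could then be passed to Imgproc.getRotationMatrix2D and Imgproc.warpAffine; verify the angle's sign convention visually, since it depends on which eye you call "left".

```java
/** Min-max normalisation and eye-based alignment parameters (sketch). */
public class FacePreprocess {

    /** Stretches pixel values so the darkest becomes 0.0 and the brightest 1.0. */
    static double[] minMaxNormalize(int[] pixels) {
        int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
        for (int p : pixels) {
            min = Math.min(min, p);
            max = Math.max(max, p);
        }
        double[] out = new double[pixels.length];
        if (max == min) return out; // flat image: all zeros
        for (int i = 0; i < pixels.length; i++) {
            out[i] = (pixels[i] - min) / (double) (max - min);
        }
        return out;
    }

    /** Angle in degrees of the line from the left eye centre to the right eye centre. */
    static double eyeAngle(double lx, double ly, double rx, double ry) {
        return Math.toDegrees(Math.atan2(ry - ly, rx - lx));
    }

    /** Scale mapping the current eye distance to the desired one in the output crop. */
    static double eyeScale(double lx, double ly, double rx, double ry,
                           int outputWidth, double leftEyeX) {
        double desired = (1.0 - 2.0 * leftEyeX) * outputWidth;
        return desired / Math.hypot(rx - lx, ry - ly);
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(minMaxNormalize(new int[] {50, 100, 150}))); // [0.0, 0.5, 1.0]
        System.out.println(eyeAngle(100, 100, 200, 200));             // about 45 degrees
        System.out.println(eyeScale(100, 100, 200, 100, 256, 0.35)); // about 0.768
    }
}
```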
I am building a 2D top-down tile-based game in Java. Naturally you can pan around and zoom in on the game, currently across 10 different zoom levels, with tiles ranging from 10x10 pixels to 100x100 pixels accordingly. Currently, the tiles for each zoom level are stored in separate sprite sheets, which are read in at program startup and stored in a BufferedImage array. I am sure this can't be the best way to go about this.
I am looking for tips to improve efficiency in the long term. Would it be better to have only the 100x100 tiles and scale them dynamically in Java, somehow use vector graphics in Java (I'm not sure how, but I'm sure Google could help me), or something else?
Many thanks!
I'd go dynamic.
Normally in computer graphics you use matrices that, applied to the graphics context, modify everything you draw on it.
This is used to modify position, scale, rotation, etc. Rather than subtracting the camera position from every tile, you apply the translation once to the graphics context and then draw your tiles at their world positions. The graphics context takes care of placing the tiles in the correct screen space.
I suggest the following reading:
http://docs.oracle.com/javase/tutorial/2d/advanced/transforming.html
http://www.javalobby.org/java/forums/t19387.html
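A minimal Java2D sketch of the "transform the graphics context once" idea: tiles are drawn at world coordinates, and the camera offset and zoom live in the Graphics2D transform. The tile size, zoom and camera values are illustrative assumptions.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;

/** Draws tiles in world coordinates; the camera is a Graphics2D transform. */
public class CameraTransformDemo {

    static final int TILE = 10; // world units per tile (illustrative)

    static BufferedImage render(Color[][] map, int width, int height,
                                double camX, double camY, double zoom) {
        BufferedImage canvas = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = canvas.createGraphics();
        AffineTransform saved = g.getTransform();
        g.scale(zoom, zoom);       // applied second: world units -> pixels
        g.translate(-camX, -camY); // applied first: move the camera to the origin
        for (int ty = 0; ty < map.length; ty++) {
            for (int tx = 0; tx < map[ty].length; tx++) {
                g.setColor(map[ty][tx]);
                g.fillRect(tx * TILE, ty * TILE, TILE, TILE); // plain world position
            }
        }
        g.setTransform(saved);     // restore before drawing any screen-space UI
        g.dispose();
        return canvas;
    }

    public static void main(String[] args) {
        Color[][] map = {
            {Color.BLACK, Color.BLACK},
            {Color.BLACK, Color.RED},
        };
        BufferedImage img = render(map, 100, 100, 0, 0, 2.0);
        // Tile (1,1) covers world 10..20, i.e. screen pixels 20..40 at zoom 2.
        System.out.println(img.getRGB(30, 30) == Color.RED.getRGB()); // true
    }
}
```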
If you're doing fixed zooming (i.e. each zoom level is a fixed distance from the camera), as opposed to fluid zooming (the player can zoom in by 3.3x or 7.5x, not just 1x, 2x, 3x, etc.), then it's massively wasteful to solve this by simply applying a zoom transform. It's tempting because it's the least complicated approach and it's easy to understand from an implementation standpoint, but it means that at maximum zoom-out you're rendering an area that's 10x larger in the X direction and 10x larger in the Y direction, so the area of the world you have to render is 100x larger than at maximum zoom-in. I also doubt you'll like the way your textures get squished by the hardware as you zoom out. Computer graphics isn't the same as optics: subpixel rendering and the other things that happen in computer graphics won't make your textures look very good if you hand that task off to the software/hardware.
Even if you do fluid zooming, I would still use level-of-detail textures and swap them dynamically depending on the distance between the world being rendered and the camera.
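A sketch of level-of-detail selection for pre-rendered tile sheets: for the current on-screen tile size, pick the smallest sheet that is at least that large, so the renderer only ever scales down slightly. The sheet sizes are illustrative assumptions. The second helper reproduces the 100x figure: a 10x zoom-out shows 100 times as many tiles.

```java
/** Level-of-detail sheet selection and visible-tile arithmetic (sketch). */
public class TileLod {

    /** sheetSizes must be sorted ascending; returns the index of the chosen sheet. */
    static int pickSheet(int[] sheetSizes, double screenTileSize) {
        for (int i = 0; i < sheetSizes.length; i++) {
            if (sheetSizes[i] >= screenTileSize) return i;
        }
        return sheetSizes.length - 1; // zoomed in beyond the largest sheet: it gets upscaled
    }

    /** Number of tiles that cover a screen at the given on-screen tile size. */
    static long visibleTiles(int screenW, int screenH, double screenTileSize) {
        long cols = (long) Math.ceil(screenW / screenTileSize);
        long rows = (long) Math.ceil(screenH / screenTileSize);
        return cols * rows;
    }

    public static void main(String[] args) {
        int[] sheets = {16, 32, 64, 128};
        System.out.println(pickSheet(sheets, 40));        // 2 -> the 64 px sheet
        System.out.println(visibleTiles(800, 600, 100)); // 48
        System.out.println(visibleTiles(800, 600, 10));  // 4800
    }
}
```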
Also, 10 zoom levels? Are you sure you really need 10 zoom levels? Zoom is usually used in 2D games to allow you to perform different activities at different levels of detail because a particular zoom level is especially well suited for a certain set of activities. I don't remember any 2D game that needed 10 zoom levels to accomplish this. 3-5 is the most I've ever seen, and I've never felt that it wasn't enough. It also seems like a lot of art work to produce the images at every zoom level for 10 zoom levels.
You're also likely to find that applying an AffineTransform sounds like a good idea but is extremely computationally expensive, and if you need 60 fps performance, you're highly unlikely to achieve it this way. Don't take my word for it, though: go try it and see how badly it falls over.
Here’s my task, which I want to solve with as little effort as possible (preferably with Qt & C++ or Java): I want to use webcam video input to detect whether there are one or more crates in front of the camera lens. The scene can change from "clear" to "there is a crate in front of the lens" and back while the camera feeds its video signal to my application. For prototype testing/learning I have 2-3 images of the “empty” scene and 2-3 images with one or more crates.
Do you know a straightforward way to tackle this task? I found OpenCV, but isn't this framework too bulky for such a simple task? I'm new to the field of computer vision. Is this generally a hard task, or is it simple and robust to detect whether there's an obstacle in front of the camera in a live feed? Your expert opinion is deeply appreciated!
Here's an approach I've heard of, which may yield some success:
Perform edge detection on your image to turn it into a black-and-white image in which edges are shown as black pixels.
Now create a histogram recording the frequency of black pixels in each vertical column of pixels in the image. The theory here is that a high value in or around one bucket of the histogram indicates a vertical edge, which could be the edge of a crate.
You could also consider a second histogram counting black pixels on each row of the image.
Obviously this is a fairly simple approach and depends heavily on "simple" input, i.e. plain boxes with "hard" edges against a blank background (preferably a background that contrasts strongly with the box).
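The column-histogram idea can be sketched on a grayscale array. Edge detection is simplified here to a horizontal intensity difference above a threshold, which is an assumption; a real pipeline would use Sobel or Canny. Columns with many edge pixels suggest vertical edges such as the side of a crate, and a row histogram works the same way with the vertical difference.

```java
/** Per-column edge histogram on a grayscale image (sketch). */
public class EdgeColumnHistogram {

    /** Pixel (x, y) counts as an edge if |I(x+1, y) - I(x, y)| > threshold. */
    static int[] columnHistogram(int[][] gray, int threshold) {
        int h = gray.length, w = gray[0].length;
        int[] counts = new int[w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x + 1 < w; x++) {
                if (Math.abs(gray[y][x + 1] - gray[y][x]) > threshold) counts[x]++;
            }
        }
        return counts;
    }

    /** Index of the column with the most edge pixels. */
    static int peakColumn(int[] counts) {
        int best = 0;
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] > counts[best]) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        // 5 rows x 10 columns: plain background on the left, bright crate face from column 5.
        int[][] img = new int[5][10];
        for (int[] row : img) {
            for (int x = 5; x < 10; x++) row[x] = 200;
        }
        int[] counts = columnHistogram(img, 50);
        System.out.println(peakColumn(counts) + " " + counts[4]); // 4 5
    }
}
```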
You don't need a full-blown computer vision library to detect whether or not there is a crate in front of the camera. You can just take a snapshot and compute a color histogram, which is simple. To capture the snapshot, take a look here:
http://msdn.microsoft.com/en-us/library/dd742882%28VS.85%29.aspx
There are lots of variables here, including possible changes in ambient lighting and any other activity in the field of view. Look at implementing a Canny edge detector (which both OpenCV and Intel Integrated Performance Primitives provide) to look for the outline of the shape of interest. If you then roughly know where the box will be, you can sum the pixels in the region of interest. If the box can appear anywhere in the field of view, this is more challenging.
This is not something you should start in Java. When I had this kind of problem, I would start with Matlab (OpenCV library) or something similar, check whether the solution works there, and then port it to Java.
To answer your question, I did something similar by XOR-ing the 'reference' image (no crate, in your case) with the current image, then either working on the histogram (pixels clustered at the right mean a large difference) or just summing the visible pixels and comparing the sum with a threshold. XOR is not very precise, but it is fast.
My point is, it took me 2 hours to install Scilab and the toolkits and write a proof of concept. It would have taken me two days in Java, and if the first solution didn't work, each additional algorithm (already available in Matlab/Scilab) would have cost another few hours. IMHO you are approaching the problem from the wrong angle.
If Java/C++ really are just simple tools that don't matter to you, drop them and use Scilab or another Matlab clone; prototyping and fine-tuning would be much faster.
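The XOR approach from this answer can be sketched on grayscale arrays: binarise the reference (empty scene) and current frames, XOR them, and compare the fraction of differing pixels with a threshold. Both thresholds here are assumptions to tune for your lighting and camera.

```java
/** Reference-vs-current change detection with binarisation and XOR (sketch). */
public class XorChangeDetector {

    /** Fraction of pixels whose thresholded (binary) value differs between the frames. */
    static double differenceRatio(int[][] reference, int[][] current, int threshold) {
        int h = reference.length, w = reference[0].length;
        int differing = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                boolean ref = reference[y][x] > threshold;
                boolean cur = current[y][x] > threshold;
                if (ref ^ cur) differing++; // XOR: the pixel flipped between dark and bright
            }
        }
        return differing / (double) (h * w);
    }

    static boolean crateLikely(int[][] reference, int[][] current, int threshold, double maxRatio) {
        return differenceRatio(reference, current, threshold) > maxRatio;
    }

    public static void main(String[] args) {
        int[][] empty = new int[4][4]; // dark, empty scene
        int[][] withCrate = new int[4][4];
        for (int y = 0; y < 4; y++) {
            for (int x = 0; x < 2; x++) withCrate[y][x] = 200; // bright crate on the left half
        }
        System.out.println(differenceRatio(empty, withCrate, 100));  // 0.5
        System.out.println(crateLikely(empty, empty, 100, 0.1));     // false
        System.out.println(crateLikely(empty, withCrate, 100, 0.1)); // true
    }
}
```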
There are two parts involved in object detection: one is feature extraction, the other is similarity calculation. Some obvious features of the crate are geometry, edges, texture, etc.
So you can find algorithms to extract these features from your crate image, then compare these features with your training sample images.
I am developing a 2D platformer game for the android platform, so I don't really care about the screen DPI, but much more about the actual resolution in pixels. From what I've gathered on the net, there are a couple of different resolutions (and aspect ratios) present. According to my search, the two resolutions that are currently widespread are 480x320 (1.5) and 800x480 (1.666), is that right? I'd like to target these two resolutions to reach most customers.
Now, I can deal with the different aspect ratios by showing a black border of 40 pixels on each side of the bigger display, essentially reducing it to 720x480 pixels and a ratio of 1.5.
The problem with my game is that it is essential for gameplay that the players see the same amount of the world on each screen. Otherwise, some players would get an unfair advantage. Furthermore, I trigger some events depending on the visibility. For example, an enemy is only allowed to start shooting when the player starts seeing it. Otherwise, the enemies' bullets would seem to come from nowhere.
So I figured I need to either create my graphics for one resolution and scale them for the other, or create separate graphics for each resolution. Is that right? Unfortunately, both ways are suboptimal for pixel graphics.
On another note: how can I restrict my game to these resolutions only (especially on the Android Market)? I know about the "supports-screens" tag in the manifest, but that works on the effective screen size, not the size in pixels, or am I mistaken?
I am also interested in personal experiences of other android game developers when it comes to resolution independence.
Thanks!
My question would be: what would you do on a PC? For game development, Android should be looked at much more like a PC target than a console. You just intrinsically need to accept that there will be some diversity of screens that you can't totally predict up front.
So I think there are two main approaches to take:
(1) Use a constant "display size" as if you were setting a fixed video resolution on the PC and letting the user's monitor deal with it. On these devices of course there is no monitor, just one fixed display, so it doesn't make sense to modify the core resolution. Instead, you can set up the SurfaceView showing your game to have a fixed resolution, and let the platform's compositor take care of scaling it (in hardware) as it composites to the screen.
(2) Adjust more intelligently to the actual resolution of the screen you find yourself running on. Scale graphics up or down yourself to create the playing area you want. Maybe have several texture sizes and select the appropriate ones for the screen resolution.
You could probably also do a combination of these, where you have a couple fixed sizes you pick for the surface view depending on the total resolution available which the game can run well with. In either case, you can do letter-boxing as appropriate to keep your aspect ratio constant on different screens, if that is what you want.
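A sketch of letter-boxing a fixed virtual resolution onto a physical screen: a uniform scale keeps the visible world identical on every device, and borders fill the leftover axis. The 480x320 virtual size comes from the question; the optional whole-number scaling is an assumption aimed at keeping pixel art crisp. On Android, the fixed-size approach in (1) corresponds, as far as I know, to SurfaceHolder.setFixedSize, with the platform doing the scaling.

```java
import java.util.Arrays;

/** Letterbox viewport for a fixed virtual resolution (sketch). */
public class Letterbox {

    /** Returns {offsetX, offsetY, width, height} of the viewport in screen pixels. */
    static int[] viewport(int screenW, int screenH, int virtualW, int virtualH, boolean integerScale) {
        double scale = Math.min((double) screenW / virtualW, (double) screenH / virtualH);
        if (integerScale && scale >= 1.0) {
            scale = Math.floor(scale); // whole-number scaling avoids blurry pixel art
        }
        int w = (int) Math.round(virtualW * scale);
        int h = (int) Math.round(virtualH * scale);
        return new int[] {(screenW - w) / 2, (screenH - h) / 2, w, h};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(viewport(800, 480, 480, 320, false))); // [40, 0, 720, 480]
        System.out.println(Arrays.toString(viewport(800, 480, 480, 320, true)));  // [160, 80, 480, 320]
    }
}
```

The first call reproduces the 40 px bars from the question; the second shows the trade-off of integer scaling, which keeps pixels sharp at the cost of wider borders.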
There are three approaches to differences in aspect ratio:
Show opaque borders on some ratios ("letterboxing").
Show more of the game world on some ratios.
Don't work at all on some ratios.
With approach (1) you waste screen space on some devices. Not such a big deal for televisions, but miserable on handheld devices where screen space is limited. With approach (2) players on some devices get advantages (they can see more of the world) and disadvantages (sprites are smaller, so touch precision is harder). Approach (3) just sucks.
Obviously it depends on the details of your game which is better, but as a player I much prefer approach (2). The constituency who care if players on other devices get a bit of a hypothetical advantage is pretty small compared to the constituency who care if their screen is partly obscured by unnecessary black bars.
(Similar approaches and remarks apply to differences in resolution.)
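To put a number on the "bit of a hypothetical advantage" in approach (2), here is a sketch that keeps the visible world height fixed and lets the width follow the screen's aspect ratio. The 10-unit world height is an illustrative assumption; the two screen sizes are the ones from the question.

```java
/** Visible world width when the world height is fixed (sketch of approach 2). */
public class VisibleWorld {

    /** World units visible horizontally when worldHeight units always fill the screen height. */
    static double visibleWorldWidth(double worldHeight, int screenW, int screenH) {
        return worldHeight * screenW / screenH;
    }

    public static void main(String[] args) {
        double narrow = visibleWorldWidth(10, 480, 320); // 15.0
        double wide = visibleWorldWidth(10, 800, 480);   // about 16.67
        System.out.printf("extra world visible on 800x480: %.1f%%%n", (wide / narrow - 1) * 100); // 11.1%
    }
}
```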