Choosing the right features for OCR of digits - Java

I have been working on OCR for digits in pictures I take. Below is an example image:
I am implementing this application in Java with OpenCV. My first approach was to resize each image to 20x20, use the pixel values as features, and find the best match with the k-nearest neighbors (k-NN) algorithm. This sometimes worked well, but the overall success rate was still poor. I have been wondering whether dividing each image into four quadrants and computing the pixel values per quadrant might improve the results.
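To make the k-NN setup described above concrete, here is a minimal sketch in plain Java (no OpenCV calls) that combines the 20x20 pixel values with the four-quadrant idea as extra features. The class name `DigitKnn`, the `int[20][20]` input with gray values 0..255, and the choice of mean quadrant intensity as the per-quadrant measure are assumptions for illustration, not the original poster's code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: k-NN digit classifier on 20x20 grayscale samples (values 0..255).
// Features: the 400 raw pixels scaled to 0..1, plus the mean intensity of each of the 4 quadrants.
// Assumes the digits were already cropped and resized to 20x20 elsewhere.
final class DigitKnn {
    static final int SIZE = 20;

    private final List<double[]> samples = new ArrayList<>();
    private final List<Integer> labels = new ArrayList<>();

    static double[] features(int[][] img) {
        double[] f = new double[SIZE * SIZE + 4];
        int half = SIZE / 2;
        for (int y = 0; y < SIZE; y++) {
            for (int x = 0; x < SIZE; x++) {
                double v = img[y][x] / 255.0;
                f[y * SIZE + x] = v;
                // Quadrant index: 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.
                int quadrant = (y < half ? 0 : 2) + (x < half ? 0 : 1);
                f[SIZE * SIZE + quadrant] += v / (half * half);
            }
        }
        return f;
    }

    void add(int[][] img, int label) {
        samples.add(features(img));
        labels.add(label);
    }

    // Majority label among the k training samples closest in squared Euclidean distance.
    int classify(int[][] img, int k) {
        double[] q = features(img);
        double[] dist = new double[samples.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < dist.length; i++) {
            double[] t = samples.get(i);
            double s = 0;
            for (int j = 0; j < q.length; j++) {
                double d = q[j] - t[j];
                s += d * d;
            }
            dist[i] = s;
            order.add(i);
        }
        order.sort(Comparator.comparingDouble(idx -> dist[idx]));
        Map<Integer, Integer> votes = new HashMap<>();
        int best = -1, bestVotes = 0;
        for (int i = 0; i < Math.min(k, order.size()); i++) {
            int label = labels.get(order.get(i));
            int v = votes.merge(label, 1, Integer::sum);
            if (v > bestVotes) { best = label; bestVotes = v; }
        }
        return best;
    }
}
```

Train it with `add(sample, digit)` for every labelled sample and call `classify(candidate, 3)`. With 400 pixel features the four quadrant features carry little weight in the distance, so weighting them more heavily is one knob worth experimenting with.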
I have tried calculating Hu invariant moments for each image, but the values seem too similar to distinguish the digits from each other. I may have done something wrong, but I have tried computing the moments both from the contour and from the image itself.
I was hoping someone could suggest a good feature to use by looking at my picture. I have also tried Tesseract, but the results were not as good as I had hoped.

Related

Edge detection on monochrome pictures - Java

I've got a ridiculously insane Linear Algebra professor at uni who asked us last Friday to develop a programme in Java that loads a monochrome picture and then applies an edge-detecting filter to it.
The problem is that nobody in my class has the slightest clue how to do it, and I have only a week to get it done.
As I'm still trying to get my head round it and start from scratch, does anybody have anything ready to send me so I can study it and save my semester?
Any efforts will be much appreciated.
Here's a very basic approach you might go with:
1) What is an edge in a monochrome image? One could say that it is a steep intensity gradient. If you go from black to white that is an edge, and vice versa.
2) A very simple filter operation that builds on this idea is the Sobel operator. Read up on it on Wikipedia.
3) You'll stumble across two terms that may be unfamiliar to you: kernel and convolution. A kernel is basically a window moved over each pixel, performing an operation on the pixel's neighbourhood. In the case of the 3x3 Sobel kernel, you assign a new value to each pixel of the filtered image based on the pixel's direct neighbours. The convolution operation can be thought of, among other things, as an operation that moves the kernel across every pixel in the image. (Note: this is a gross oversimplification to get you started and technically incorrect. It should, however, give you the right idea.)
4) Now, the simplest way of applying a Sobel kernel to a BufferedImage is the ConvolveOp class (java.awt.image.ConvolveOp). It is a prebuilt Java class that takes a kernel, applies it to a given image and returns the filtered image. However, if this is for class, you might want to implement it yourself.
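For those who want to implement it themselves, here is a minimal sketch of a hand-written Sobel filter. It assumes the image has already been converted to an `int[height][width]` array of gray values 0..255 (e.g. read from a `BufferedImage`); the class name `SobelSketch` is made up for illustration, and border pixels are simply left at 0.

```java
// Sketch of a hand-written Sobel edge filter on a grayscale image int[height][width] (0..255).
// Border pixels are left at 0 for simplicity.
final class SobelSketch {
    // Horizontal-gradient kernel (responds to vertical edges) and its transpose.
    private static final int[][] GX = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    private static final int[][] GY = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

    static int[][] edges(int[][] gray) {
        int h = gray.length, w = gray[0].length;
        int[][] out = new int[h][w];
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                int gx = 0, gy = 0;
                // Slide the 3x3 window over the pixel's direct neighbours. Strictly speaking this
                // applies the kernel without flipping it; for the magnitude the sign does not matter.
                for (int ky = -1; ky <= 1; ky++) {
                    for (int kx = -1; kx <= 1; kx++) {
                        int p = gray[y + ky][x + kx];
                        gx += GX[ky + 1][kx + 1] * p;
                        gy += GY[ky + 1][kx + 1] * p;
                    }
                }
                // Gradient magnitude, clamped to the 0..255 range.
                int mag = (int) Math.round(Math.hypot(gx, gy));
                out[y][x] = Math.min(255, mag);
            }
        }
        return out;
    }
}
```

Thresholding the returned magnitudes (keeping values above some cutoff) gives a black-and-white edge map.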

Image processing - OpenCV, Identifying digits

I am new to image processing and to OpenCV in particular.
I am working on an OCR project in which I need to identify numbers.
This is my image to process:
Let's say I have already optimized the image. My questions are:
In the image, each number appears several times. Let's say I found the contours; how can I know which one is the best one to process?
How can I know by what angle I need to rotate each contour to make it straight?
In the image, each number appears several times. Let's say I found the contours; how can I know which one is the best one to process?
You always want the biggest numbers, because they are the least warped by perspective. So you always want the numbers in the middle of the image, because they are also in the middle of the ball.
How can I know by what angle I need to rotate each contour to make it straight?
Have a look at RotatedRect. I explained how to find the angle in this thread.
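The linked answer relies on OpenCV's `RotatedRect` (as returned by `Imgproc.minAreaRect`). As a library-free alternative, swapped in here rather than taken from that thread, the orientation of a blob can also be estimated from its second-order central moments: theta = 0.5 * atan2(2*mu11, mu20 - mu02). The sketch assumes the contour has already been filled into a `boolean[][]` mask; the class name is hypothetical.

```java
// Sketch: orientation of a binary blob (true = foreground) from second-order central moments.
// Returns the angle of the major axis in degrees, measured from the x axis with y pointing down
// (image coordinates), in the range (-90, 90].
final class BlobOrientation {
    static double angleDegrees(boolean[][] mask) {
        double m00 = 0, m10 = 0, m01 = 0;
        for (int y = 0; y < mask.length; y++)
            for (int x = 0; x < mask[y].length; x++)
                if (mask[y][x]) { m00++; m10 += x; m01 += y; }
        if (m00 == 0) throw new IllegalArgumentException("empty mask");
        double cx = m10 / m00, cy = m01 / m00; // centroid
        double mu20 = 0, mu02 = 0, mu11 = 0;
        for (int y = 0; y < mask.length; y++)
            for (int x = 0; x < mask[y].length; x++)
                if (mask[y][x]) {
                    double dx = x - cx, dy = y - cy;
                    mu20 += dx * dx; mu02 += dy * dy; mu11 += dx * dy;
                }
        return Math.toDegrees(0.5 * Math.atan2(2 * mu11, mu20 - mu02));
    }
}
```

The angle gives the direction of the blob's major axis but not which way is up (there is a 180° ambiguity, just as with a rotated rectangle), so after rotating the major axis to vertical you may still need to try both orientations.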
Since you always have a perfectly centered ball, you should think about using a mapping to "unwarp" your ball (i.e. a projection like the one from the globe onto a map). It should be pretty straightforward afterwards to find the numbers on the flat image.
Edit: Since there are only 10 digits, you might also "brute force" the solution with a big enough training set: just throw all the numbers you detect into a classifier and keep the most likely solution.
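A minimal sketch of the "keep the most likely solution" step, assuming some classifier has already returned a digit and a confidence for every number candidate detected on the ball. The classifier, the class name and the confidence-summing rule are all assumptions.

```java
// Sketch: combine classifier outputs for all number candidates detected on the ball.
// Each candidate adds its confidence to the digit it was classified as; the largest total wins.
final class DigitVote {
    static int mostLikely(int[] predictedDigits, double[] confidences) {
        if (predictedDigits.length != confidences.length || predictedDigits.length == 0)
            throw new IllegalArgumentException("need one confidence per prediction");
        double[] score = new double[10]; // one bucket per digit 0-9
        for (int i = 0; i < predictedDigits.length; i++)
            score[predictedDigits[i]] += confidences[i];
        int best = 0;
        for (int d = 1; d < 10; d++)
            if (score[d] > score[best]) best = d;
        return best;
    }
}
```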
1) I agree with @Sebastian on the first part. Exploit the fact that in your scenario the numbers are placed on the surface of a ball, so first select the blobs inside a centered region of interest.
2) The contours shown in the image are not rotated (the numbers are). Instead of "rotating" these bounding boxes, which seems like quite a headache, I'd rather use them combined with rotation-invariant keypoints. Let me clarify:
a) You know where your numbers are, so you don't have to search the entire image. Keep these already-selected regions in mind.
b) You can take "straight" samples of the numbers 0-9 and use them as ground truth.
c) You can perform a matching search between each "ground truth" image and each candidate region. Now, forget about scale/rotation: use scale/rotation-invariant keypoints! Something like this:
Again, notice that you have already selected the region of interest, so in your case the search will consist of checking the number of matches (number of blue lines) between each of the registered numbers and your candidate. I think it is worth a try! :)
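The counting step described above can be sketched without OpenCV, assuming the keypoint descriptors were already computed (for example 256-bit binary descriptors of the ORB kind, packed into `long[4]`). This brute-force Hamming matcher is a stand-in for OpenCV's matchers, and the ratio test used to decide what counts as a good match is an extra assumption borrowed from common practice, not something the answer prescribes. Class and method names are hypothetical.

```java
import java.util.List;

// Sketch: brute-force matching of binary descriptors with Hamming distance and a ratio test.
// Descriptors are assumed to be computed elsewhere; this only shows the counting step.
final class DescriptorVote {
    static int hamming(long[] a, long[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) d += Long.bitCount(a[i] ^ b[i]);
        return d;
    }

    // Counts candidate descriptors whose best template match is clearly better than the second best.
    static int goodMatches(List<long[]> candidate, List<long[]> template, double ratio) {
        int count = 0;
        for (long[] c : candidate) {
            int best = Integer.MAX_VALUE, second = Integer.MAX_VALUE;
            for (long[] t : template) {
                int d = hamming(c, t);
                if (d < best) { second = best; best = d; }
                else if (d < second) second = d;
            }
            boolean unique = second == Integer.MAX_VALUE || best < ratio * second;
            if (best != Integer.MAX_VALUE && unique) count++;
        }
        return count;
    }

    // Index of the ground-truth template (e.g. digit 0-9) with the most good matches.
    static int bestTemplate(List<long[]> candidate, List<List<long[]>> templates, double ratio) {
        int bestIdx = -1, bestCount = -1;
        for (int i = 0; i < templates.size(); i++) {
            int n = goodMatches(candidate, templates.get(i), ratio);
            if (n > bestCount) { bestCount = n; bestIdx = i; }
        }
        return bestIdx;
    }
}
```

Here `templates.get(d)` would hold the descriptors of the straight ground-truth sample for digit `d`; a ratio in the 0.7 to 0.8 range is a common starting point.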
You can find more info on the different keypoints available in OpenCV here.
Hope that it helps!

How to Increase accuracy of Eigenface Algorithm?

I have asked this question in another thread too, but since the topic there was a different one, maybe it was not noticed. I got the eigenface algorithm for face recognition working using OpenCV in Java. I want to increase the accuracy of the code, as it's a well-known fact that eigenfaces rely heavily on light intensity.
What I have right now
I get perfect results if I test an image taken at the same place where the pictures in my database were taken, but the results become unreliable when I give it images taken in different places.
I figured out that the reason is that my images differ in light intensity.
Hence, my question is:
Is there any way to standardize the images saved in the database, or the ones coming fresh into the system for a recognition check, so that I can improve the accuracy of my current face-recognition system?
Any kind of positive solution to the problem would be really helpful.
Lighting intensity and pose are important factors in face recognition. Try histogram comparison between the training and testing images (http://docs.opencv.org/doc/tutorials/imgproc/histograms/histogram_comparison/histogram_comparison.html). This helps avoid the worst lighting situations. Preprocessing is also one of the key factors in successful face recognition: gamma correction and DoG (Difference of Gaussians) filtering may reduce the lighting problems.
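A minimal sketch of two of the suggestions above, gamma correction and histogram comparison, on 8-bit grayscale pixels stored as a flat `int[]`. The correlation formula mirrors the one OpenCV documents for its correlation comparison method; DoG filtering is not shown, and the class name is hypothetical.

```java
// Sketch: gamma correction and histogram correlation for 8-bit grayscale pixels (0..255).
final class LightingTools {
    // out = 255 * (in / 255)^gamma; gamma < 1 brightens dark regions, gamma > 1 darkens them.
    static int[] gammaCorrect(int[] pixels, double gamma) {
        int[] lut = new int[256]; // precomputed lookup table
        for (int v = 0; v < 256; v++)
            lut[v] = (int) Math.round(255.0 * Math.pow(v / 255.0, gamma));
        int[] out = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) out[i] = lut[pixels[i]];
        return out;
    }

    static double[] histogram(int[] pixels) {
        double[] h = new double[256];
        for (int p : pixels) h[p]++;
        return h;
    }

    // Correlation between two histograms: 1 = identical shape, around 0 = unrelated.
    static double correlation(double[] h1, double[] h2) {
        double mean1 = 0, mean2 = 0;
        for (int i = 0; i < h1.length; i++) { mean1 += h1[i]; mean2 += h2[i]; }
        mean1 /= h1.length;
        mean2 /= h2.length;
        double num = 0, den1 = 0, den2 = 0;
        for (int i = 0; i < h1.length; i++) {
            double a = h1[i] - mean1, b = h2[i] - mean2;
            num += a * b; den1 += a * a; den2 += b * b;
        }
        return num / Math.sqrt(den1 * den2);
    }
}
```

For example, you could re-process or reject a test image whose histogram correlation with the training images is low, and experiment with gamma values below 1 for underexposed photos.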
You can also apply an elliptical mask to keep only the face, removing the noise created by hair, neck, etc.
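A minimal sketch of the elliptical mask, assuming the face has already been detected and cropped to its bounding box so that the ellipse inscribed in that box covers the face (the class name is hypothetical):

```java
// Sketch: keep only the pixels inside the ellipse inscribed in a cropped face image
// (int[height][width], grayscale) and set everything outside it to 0.
final class EllipseMask {
    static int[][] apply(int[][] face) {
        int h = face.length, w = face[0].length;
        double cx = (w - 1) / 2.0, cy = (h - 1) / 2.0; // ellipse centre
        double rx = w / 2.0, ry = h / 2.0;             // semi-axes
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                double nx = (x - cx) / rx, ny = (y - cy) / ry;
                out[y][x] = (nx * nx + ny * ny <= 1.0) ? face[y][x] : 0;
            }
        return out;
    }
}
```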
The OpenCV cookbook provides an excellent and simple tutorial on this.
Below are some options that may help you boost your accuracy:
1] Image Normalization:
Scale your image pixel values to the range 0 to 1 to reduce the effect of lighting conditions
2] Image Alignment (This is a very important step to achieve good performance):
Align all the training and test images so that the eyes, nose and mouth of all the faces have almost the same coordinates
Check this post on face alignment (Highly recommended) : https://www.pyimagesearch.com/2017/05/22/face-alignment-with-opencv-and-python/
3] Data augmentation trick:
You can apply filters to your faces that simulate the same face in different lighting conditions
So from one face you can make several images in different lighting conditions
4] Removing Noise:
Before performing step 3, apply a Gaussian blur to all the images
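The list above can be sketched in plain Java as follows, working on grayscale images stored as `double[height][width]`. The 3x3 binomial kernel as the Gaussian, min-max scaling as the normalization and brightness scaling as the augmentation "filter" are assumptions chosen for illustration; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of three preprocessing steps on grayscale images double[height][width]:
// 3x3 Gaussian blur, min-max normalization to [0, 1], and brightness-based augmentation.
final class FacePreprocessing {
    // 3x3 Gaussian blur (kernel 1-2-1 / 2-4-2 / 1-2-1, divided by 16); borders are clamped.
    static double[][] blur(double[][] img) {
        int[][] k = {{1, 2, 1}, {2, 4, 2}, {1, 2, 1}};
        int h = img.length, w = img[0].length;
        double[][] out = new double[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                double s = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++) {
                        int yy = Math.min(h - 1, Math.max(0, y + dy));
                        int xx = Math.min(w - 1, Math.max(0, x + dx));
                        s += k[dy + 1][dx + 1] * img[yy][xx];
                    }
                out[y][x] = s / 16.0;
            }
        return out;
    }

    // Min-max normalization: the darkest pixel becomes 0, the brightest becomes 1.
    static double[][] normalize(double[][] img) {
        double min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
        for (double[] row : img)
            for (double v : row) { min = Math.min(min, v); max = Math.max(max, v); }
        double range = max - min;
        double[][] out = new double[img.length][img[0].length];
        for (int y = 0; y < img.length; y++)
            for (int x = 0; x < img[y].length; x++)
                out[y][x] = range == 0 ? 0 : (img[y][x] - min) / range;
        return out;
    }

    // Brighter/darker copies of a [0, 1] image: scale intensities and clamp to 1.
    static List<double[][]> augmentBrightness(double[][] img, double... factors) {
        List<double[][]> variants = new ArrayList<>();
        for (double f : factors) {
            double[][] v = new double[img.length][img[0].length];
            for (int y = 0; y < img.length; y++)
                for (int x = 0; x < img[y].length; x++)
                    v[y][x] = Math.min(1.0, img[y][x] * f);
            variants.add(v);
        }
        return variants;
    }
}
```

Following the order in the list, you would blur first, then normalize, then create the augmented copies for training (for example with factors `0.6, 0.8, 1.2`).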

FFT Image to Measure Similarities

OK, I'm writing a small Java app that accepts two images as input, compares them, and then gives a quantitative measure of similarity (e.g. 50% similar).
To my understanding, the FFT is a good way to measure the similarity of two images, but I can't for the love of god figure out how to code/implement it.
So far I've implemented another function, which basically gives me two histograms (one for each image). All I need now is a method that will FFT an image and give me a quantifiable outcome.
Can anyone help me out with this? I'd really like to see some sample code, or at least a pointer in the right direction. Many thanks in advance.
Similarity is not an exact term. For example, if you have a circle and an ellipse, are they similar? They are both round objects, so in this sense they are; but if we want to filter out only circles, they are not. You will have to define a measure (or measures, for example roundness, intensity distribution, size, orientation, number of objects, Euler number, etc.), then calculate it for each image. The similarity of the two images will be (some kind of) distance between the two calculated values. This could be the Euclidean distance (for two real measures) or some kind of error function (RMS for intensity distributions).
You will have to choose which transforms your measure should stay invariant to (is a rotated image similar to the original? If yes, a simple Fourier transform is not appropriate).
Measuring the similarity of images is hard; if you have to do that, I would read about image stitching. If you just need to distinguish blobs, first try to calculate some simple measures (I recommend calculating moments: area, orientation; read about k-means clustering), or the 1D Fourier transform of the distance of the contour from the center of mass (which is a little bit more difficult).
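A minimal sketch of the contour-based option: the distance of each contour point from the centroid forms a 1D signal, and the magnitudes of its discrete Fourier transform serve as a shape descriptor. Using the mean of the contour points as the center (rather than the center of mass of the filled region), dividing by the DC term for scale invariance, and the class name are all assumptions.

```java
// Sketch: 1D Fourier descriptor of a closed contour from its centroid-distance signature.
// DFT magnitudes ignore where the contour starts; dividing by the DC term removes scale.
final class ContourDescriptor {
    // xs/ys: ordered contour points; returns |R_1|/|R_0| .. |R_count|/|R_0|.
    static double[] describe(double[] xs, double[] ys, int count) {
        int n = xs.length;
        double cx = 0, cy = 0;
        for (int i = 0; i < n; i++) { cx += xs[i]; cy += ys[i]; }
        cx /= n;
        cy /= n;
        double[] r = new double[n];
        for (int i = 0; i < n; i++) r[i] = Math.hypot(xs[i] - cx, ys[i] - cy);
        double[] mag = new double[count + 1];
        for (int k = 0; k <= count; k++) { // plain O(n^2) DFT, fine for short contours
            double re = 0, im = 0;
            for (int i = 0; i < n; i++) {
                double a = -2 * Math.PI * k * i / n;
                re += r[i] * Math.cos(a);
                im += r[i] * Math.sin(a);
            }
            mag[k] = Math.hypot(re, im);
        }
        double[] d = new double[count];
        for (int k = 1; k <= count; k++) d[k - 1] = mag[k] / mag[0];
        return d;
    }

    // Euclidean distance between two descriptors; smaller means more similar shapes.
    static double distance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }
}
```

Because only magnitudes are kept, the descriptor does not depend on where contour tracing starts, and the centroid distances themselves do not change under rotation (up to pixel sampling differences); this assumes both contours are sampled with a similar number of points.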
Before you attempt to code up a 2D Fourier transform, you should fully understand the math behind it. flolo is correct that you can compute it by first doing a 1D FFT on the rows and columns and then combining the results, but I have no reason to believe the L_inf norm is the best way to convert them to a metric, since it completely skips the usual combining step to create the full 2D transform. Take a look at http://fourier.eng.hmc.edu/e101/lectures/Image_Processing/node6.html at the very bottom of the page.
That said, there may be better ways to compare images that don't require comparing 2D arrays of information. For instance, PCA (Principal Component Analysis, which is just a matter of running SVD {Singular Value Decomposition} on your images after mean-centering them, though I'd take a look at the Wikipedia article on it first) will give you a 1D vector to which you could then apply some L_p norm directly to compare, although in this case I would use something like sum(min(a_i/b_i, b_i/a_i))/length(a), where a and b are the 1D vectors you got from the transform.
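The ratio-based score from the answer above is easy to write down; here is a minimal sketch. It assumes strictly positive components (e.g. magnitudes), since for zero or mixed-sign values such as raw PCA projections the ratios are undefined or meaningless; the class name is hypothetical.

```java
// Sketch: similarity score sum_i min(a_i / b_i, b_i / a_i) / n for two 1D feature vectors.
// Equals 1 for identical vectors and drops toward 0 as components diverge.
// Assumes all components are strictly positive.
final class RatioSimilarity {
    static double score(double[] a, double[] b) {
        if (a.length != b.length || a.length == 0)
            throw new IllegalArgumentException("length mismatch");
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] <= 0 || b[i] <= 0)
                throw new IllegalArgumentException("components must be > 0");
            sum += Math.min(a[i] / b[i], b[i] / a[i]);
        }
        return sum / a.length;
    }
}
```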
There are many good sites with code for an FFT on a 1-D array of values. You just apply this FFT row by row to your image, and afterwards you apply the FFT column-wise to the results.
Now you need a metric to get from the resulting transformed images to a single number; my suggestion would be to try the max-norm (L_inf). That is max_{x,y} |fft2d(imag1)[x,y] - fft2d(imag2)[x,y]|, where |.| is the modulus of the complex difference.
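A minimal sketch of the row-then-column approach combined with the max-norm, using a textbook iterative radix-2 FFT so it needs no libraries. It assumes grayscale images as `double[rows][cols]` with both sides powers of two (pad or resize otherwise); the class name is hypothetical.

```java
// Sketch: 2D FFT as 1D FFTs over every row and then every column, plus the max-norm (L_inf)
// distance between two spectra. Image sides must be powers of two.
final class Fft2d {
    // In-place iterative radix-2 FFT on (re, im); length must be a power of two.
    static void fft(double[] re, double[] im) {
        int n = re.length;
        for (int i = 1, j = 0; i < n; i++) { // bit-reversal permutation
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        for (int len = 2; len <= n; len <<= 1) { // butterfly passes
            double ang = -2 * Math.PI / len;
            for (int start = 0; start < n; start += len)
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(ang * k), wi = Math.sin(ang * k);
                    int a = start + k, b = a + len / 2;
                    double xr = re[b] * wr - im[b] * wi, xi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - xr; im[b] = im[a] - xi;
                    re[a] += xr; im[a] += xi;
                }
        }
    }

    // Returns {re, im} of the 2D transform of a real image img[rows][cols].
    static double[][][] fft2d(double[][] img) {
        int rows = img.length, cols = img[0].length;
        double[][] re = new double[rows][], im = new double[rows][cols];
        for (int y = 0; y < rows; y++) { re[y] = img[y].clone(); fft(re[y], im[y]); } // rows
        double[] cr = new double[rows], ci = new double[rows];
        for (int x = 0; x < cols; x++) { // columns
            for (int y = 0; y < rows; y++) { cr[y] = re[y][x]; ci[y] = im[y][x]; }
            fft(cr, ci);
            for (int y = 0; y < rows; y++) { re[y][x] = cr[y]; im[y][x] = ci[y]; }
        }
        return new double[][][] {re, im};
    }

    // max over (x, y) of |F1[x, y] - F2[x, y]|, the modulus of the complex difference.
    static double maxNormDistance(double[][] img1, double[][] img2) {
        double[][][] f1 = fft2d(img1), f2 = fft2d(img2);
        double max = 0;
        for (int y = 0; y < img1.length; y++)
            for (int x = 0; x < img1[0].length; x++)
                max = Math.max(max, Math.hypot(f1[0][y][x] - f2[0][y][x], f1[1][y][x] - f2[1][y][x]));
        return max;
    }
}
```

This distance is not normalized, so it grows with image size and brightness; to report something like "50% similar" you would still have to map it to a percentage yourself.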
If you just want to check whether one image is likely a quick edit of another, for something like DRM of stock photography, then check the percentages of a normalized color palette within probable regions. If they match within a THRESHOLD for a NUMBER_OF_TEST_COLORS in any one of a number of TEST_REGIONS within the image, then you have a "suspect"... you still need a human to check the suspects. But this is a quick and dirty way to find many of the image resizers, horizontal/vertical flippers, background color changers, file format changers, and other subtle variations... Of course, "normalizing the colors" to a quantized palette is an art unto itself. I would recommend quantizing images to the nearest "web safe" colors for practicality.
I'm a blue-collar garbage man compared to a mathematician, but garbage men are quite practical! I have had good success with this kind of approach for grouping similar images and for search-by-color applications.
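A minimal sketch of the palette check for one region, assuming pixels packed as `0xRRGGBB` ints (such as the values from `BufferedImage.getRGB`; the alpha byte is ignored). Web-safe quantization maps each channel to the nearest of the six levels 0, 51, 102, 153, 204, 255. The method and parameter names (`numberOfTestColors`, `threshold`) are hypothetical stand-ins for NUMBER_OF_TEST_COLORS and THRESHOLD, and looping over several TEST_REGIONS is left to the caller.

```java
import java.util.Arrays;

// Sketch: quantize pixels to the 216 web-safe colors, compute each color's share of the region,
// and flag a "suspect" if the most common colors of region A cover similar shares in region B.
final class PaletteCheck {
    // Index 0..215 of the nearest web-safe color (channel levels 0, 51, ..., 255).
    static int webSafeIndex(int rgb) {
        int r = Math.round(((rgb >> 16) & 0xFF) / 51f);
        int g = Math.round(((rgb >> 8) & 0xFF) / 51f);
        int b = Math.round((rgb & 0xFF) / 51f);
        return (r * 6 + g) * 6 + b;
    }

    // Fraction of the region's pixels that fall into each web-safe color.
    static double[] palette(int[] region) {
        double[] p = new double[216];
        for (int rgb : region) p[webSafeIndex(rgb)] += 1.0 / region.length;
        return p;
    }

    // True if the numberOfTestColors most common colors of A differ from B by at most threshold each.
    static boolean suspect(int[] regionA, int[] regionB, int numberOfTestColors, double threshold) {
        double[] pa = palette(regionA), pb = palette(regionB);
        Integer[] order = new Integer[216];
        for (int i = 0; i < 216; i++) order[i] = i;
        Arrays.sort(order, (i, j) -> Double.compare(pa[j], pa[i])); // most common first
        for (int t = 0; t < numberOfTestColors; t++) {
            int c = order[t];
            if (Math.abs(pa[c] - pb[c]) > threshold) return false;
        }
        return true;
    }
}
```

A horizontally flipped copy has exactly the same palette percentages, which is why this kind of check catches flips and resizes cheaply.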

Object detection with a generic webcam

Here’s my task, which I want to solve with as little effort as possible (preferably with Qt & C++ or Java): I want to use webcam video input to detect whether there is a crate (or more than one) in front of the camera lens. The scene can change from "clear" to "there is a crate in front of the lens" and back while the cam feeds its video signal to my application. For prototype testing/learning I have 2-3 images of the “empty” scene and 2-3 images with one or more crates.
Do you know a straightforward way to tackle this task? I found OpenCV, but isn't this framework too bulky for such a simple task? I'm new to the field of computer vision. Is this generally a hard task, or is it simple and robust to detect whether there's an obstacle in front of the cam in live feeds? Your expert opinion is deeply appreciated!
Here's an approach I've heard of, which may yield some success:
Perform edge detection on your image to turn it into a black-and-white image in which edges are shown as black pixels.
Now create a histogram recording the frequency of black pixels in each vertical column of the image. The theory is that a high value in the histogram at or around one bucket indicates a vertical edge, which could be the edge of a crate.
You could also consider a second histogram to measure pixels on each row of the image.
Obviously this is a fairly simple approach and is highly dependent on "simple" input, i.e. plain boxes with "hard" edges against a blank background (preferably a background that contrasts strongly with the box).
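A minimal sketch of the column histogram, assuming a grayscale `int[height][width]` frame. A simple horizontal-gradient threshold stands in for a proper edge detector here, and the thresholds and class name are hypothetical.

```java
// Sketch: mark edge pixels with a horizontal-gradient threshold (a stand-in for a real edge
// detector), then count edge pixels per column. A column whose count is a large fraction of
// the image height suggests a long vertical edge, such as the side of a crate.
final class ColumnEdgeHistogram {
    static int[] columnCounts(int[][] gray, int edgeThreshold) {
        int h = gray.length, w = gray[0].length;
        int[] counts = new int[w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x + 1 < w; x++)
                if (Math.abs(gray[y][x + 1] - gray[y][x]) > edgeThreshold) counts[x]++;
        return counts;
    }

    // True if any column has edge pixels in at least minFraction of the rows.
    static boolean hasVerticalEdge(int[][] gray, int edgeThreshold, double minFraction) {
        for (int c : columnCounts(gray, edgeThreshold))
            if (c >= minFraction * gray.length) return true;
        return false;
    }
}
```

A row histogram works the same way with a vertical gradient (`gray[y + 1][x] - gray[y][x]`), counted per row.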
You don't need a full-blown computer-vision library to detect whether or not there is a crate in front of the camera. You can just take a snapshot and make a color histogram (simple). To capture the snapshot, take a look here:
http://msdn.microsoft.com/en-us/library/dd742882%28VS.85%29.aspx
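A minimal sketch of the color-histogram check against a snapshot of the empty scene. The coarse 64-bin RGB histogram, the use of histogram intersection as the comparison, and the `minSimilarity` cutoff are assumptions chosen for illustration; pixels are packed `0xRRGGBB` ints (the alpha byte, if present, is ignored).

```java
// Sketch: coarse RGB histogram (4 levels per channel, 64 bins, normalized to sum 1) of the
// empty scene and of the current snapshot, compared with histogram intersection.
final class SceneHistogram {
    static double[] histogram(int[] pixels) {
        double[] h = new double[64];
        for (int rgb : pixels) {
            int r = ((rgb >> 16) & 0xFF) >> 6, g = ((rgb >> 8) & 0xFF) >> 6, b = (rgb & 0xFF) >> 6;
            h[(r * 4 + g) * 4 + b] += 1.0 / pixels.length;
        }
        return h;
    }

    // Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones.
    static double intersection(double[] h1, double[] h2) {
        double s = 0;
        for (int i = 0; i < h1.length; i++) s += Math.min(h1[i], h2[i]);
        return s;
    }

    // Hypothetical decision rule: the scene changed if the snapshots are less alike than minSimilarity.
    static boolean crateLikely(int[] emptyScene, int[] snapshot, double minSimilarity) {
        return intersection(histogram(emptyScene), histogram(snapshot)) < minSimilarity;
    }
}
```

In practice you would pick `minSimilarity` by comparing your 2-3 empty images with each other and with the crate images.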
There are lots of variables here, including possible changes in ambient lighting and any other activity in the field of view. Look at implementing a Canny edge detector (which OpenCV has, as do the Intel Integrated Performance Primitives) to look for the outline of the shape of interest. If you then roughly know where the box will be, you can perhaps sum the pixels in the region of interest. If the box can appear anywhere in the field of view, this is more challenging.
This is not something you should start in Java. When I had this kind of problem, I would start with Matlab (OpenCV library) or something similar, see if the solution worked there, and then port it to Java.
To answer your question, I did something similar by XOR-ing the 'reference' image (no crate, in your case) with the current image, and then either working on the histogram (clustered pixels at the right mean a large difference) or just summing the visible pixels and comparing the sum with a threshold. XOR is not really precise, but it is fast.
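A minimal sketch of the XOR comparison on grayscale frames stored as `int[height][width]`; the binarization threshold, the fraction limit and the class name are assumptions.

```java
// Sketch: binarize the reference frame (no crate) and the current frame with the same threshold,
// XOR them pixel by pixel, and report a crate when the fraction of differing pixels is too high.
final class XorChange {
    static double differingFraction(int[][] reference, int[][] current, int binaryThreshold) {
        int h = reference.length, w = reference[0].length, diff = 0;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                boolean a = reference[y][x] > binaryThreshold;
                boolean b = current[y][x] > binaryThreshold;
                if (a ^ b) diff++; // XOR: the pixel flipped between the two frames
            }
        return (double) diff / (h * w);
    }

    static boolean crateDetected(int[][] reference, int[][] current, int binaryThreshold, double maxFraction) {
        return differingFraction(reference, current, binaryThreshold) > maxFraction;
    }
}
```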
My point is, it took me 2 hours to install Scilab and the toolkits and write a proof of concept. It would have taken me two days in Java, and if the first solution didn't work, each additional algorithm (already available in Matlab/Scilab) would have taken another few hours. IMHO you are approaching the problem from the wrong angle.
If Java/C++ are really just tools that don't matter here, then drop them and use Scilab or some other Matlab clone; prototyping and fine-tuning would be much faster.
There are two parts involved in object detection: one is feature extraction, the other is similarity calculation. Some obvious features of the crate are geometry, edges, texture, etc.
So you can find some algorithms to extract these features from your crate image, and then compare these features with those of your training sample images.
