I'm currently working on an image recognition software for the robotics club at my school, and one part really has me stumped: shape recognition. I need to be able to detect the squares in this image before I can try to detect the shapes in the arena.
I've looked up some libraries like JavaCV, but I couldn't really find something that suited my taste. As a reference, here is the image from which I'm trying to determine shapes:
Have you tried applying the Hough transform?
That seems to be what you need, as your squares have straight edges.
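To make the idea concrete, here is a rough sketch using OpenCV's Java bindings (not JavaCV); the file name and the Canny/Hough thresholds are placeholder values you would tune for your arena image:

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

public class HoughSquares {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        // "arena.png" is a placeholder for your input image.
        Mat gray = Imgcodecs.imread("arena.png", Imgcodecs.IMREAD_GRAYSCALE);

        // The Hough transform works on a binary edge map, so run Canny first.
        Mat edges = new Mat();
        Imgproc.Canny(gray, edges, 50, 150);

        // Probabilistic Hough transform: each row of 'lines' is a segment (x1, y1, x2, y2).
        Mat lines = new Mat();
        Imgproc.HoughLinesP(edges, lines, 1, Math.PI / 180, 50, 30, 10);

        for (int i = 0; i < lines.rows(); i++) {
            double[] l = lines.get(i, 0);
            System.out.printf("segment (%.0f, %.0f) -> (%.0f, %.0f)%n", l[0], l[1], l[2], l[3]);
        }
        // Squares would then be found by grouping segments into two roughly
        // perpendicular pairs of roughly equal length.
    }
}

This only gets you the straight edges; the grouping step is where the square-specific logic lives.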
I was doing something similar to your task, but I needed to recognize classes of objects (resistors, capacitors, etc.) and their boundaries in a real black & white photo:
Basically, the method was something like this:
Preprocessing - correct contrast and brightness, apply erosion, dilation, median filtering, etc. - this step can be adapted to the whole photo or to parts of it.
Segmentation - now find parts of the photo where there could be "something", with some threshold for area, pixel intensity, etc.
Characterize - for every potential segment found, calculate some characteristics - max length, area, W, M - determinants, etc.
Classify - there are several classifiers that check whether the given characteristics can belong to a given class and, if the answer is yes, what the "distance" is between the given characteristics and the ideal model characteristics. Classification was done using fuzzy logic inference.
And of course - for every successful classification take the best matches if they exist.
In your case, the simplest characterization of a square is to find its area and the maximum distance between two points that belong to the found segment. Before that, you should preprocess the image with a "closing" operation (dilation -> erosion).
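In your case you are free to use library methods (unlike in my project), so here is a minimal sketch of that closing-plus-characterization step with OpenCV's Java bindings; the 3x3 kernel size is just a common default, not something taken from my original code:

import org.opencv.core.*;
import org.opencv.imgproc.Imgproc;
import java.util.ArrayList;
import java.util.List;

public class SquareCharacterizer {
    // Closes small gaps (dilation followed by erosion), then measures each found
    // segment by its area and by the maximum distance between two of its points.
    static void characterize(Mat binary) {
        Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(3, 3));
        Mat closed = new Mat();
        Imgproc.morphologyEx(binary, closed, Imgproc.MORPH_CLOSE, kernel);

        List<MatOfPoint> contours = new ArrayList<>();
        Imgproc.findContours(closed, contours, new Mat(),
                Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);

        for (MatOfPoint contour : contours) {
            double area = Imgproc.contourArea(contour);
            double maxDist = 0;
            Point[] pts = contour.toArray();
            for (int i = 0; i < pts.length; i++)
                for (int j = i + 1; j < pts.length; j++)
                    maxDist = Math.max(maxDist,
                            Math.hypot(pts[i].x - pts[j].x, pts[i].y - pts[j].y));

            // For an ideal square, maxDist is the diagonal, so area is close to maxDist^2 / 2.
            System.out.printf("area=%.1f maxDist=%.1f%n", area, maxDist);
        }
    }
}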
You could also create a nice algorithm to recognize whether a square is cut by a line (and remove that line - then recognize again), check whether a square is overlapped by another square, etc.
Personally, I don't know of any library that does something as complex as library.recognizeSquaresOnImage(params). Libraries provide some useful methods to prepare an image for recognition - the core of your task you must do yourself.
Every recognition problem has its own peculiar features that can be used to narrow down uncertain results at every step of the "recognition pipeline". For example, in my task, I knew that the objects were black on a fairly white background, were more or less separated from each other, etc.
My project was written in C++ with the OpenCV library, but I used OpenCV only for reading/writing images and displaying them in a window - I wasn't allowed to use any other methods of the library.
As a reference for how you could do it, HERE is the whole project. Even now it doesn't work perfectly - it needs some calibration of the classifiers.
To get a better grasp of how it works at a higher level, take a look at the main.cpp file.
Related
I'm trying to find if a scanned pdf form contains a signature (like making sure a check is signed).
The problem domain:
I will be receiving document packages (multi-page PDFs with multiple forms). I have already put together document package classifiers that will check the package for all documents and scale the images to a common size. After that I know where the signatures should be and can scan that area of the document specifically. What I'm looking for is the best approach to making sure there is a signature present. I've considered just checking for a base threshold of dark pixels, but that seems so clumsy. The trouble with signatures is that they are not really writing, more of a personal mark.
The only thing I can come up with is a machine learning method to look for loopiness? But I'm not all that familiar with machine learning and don't even know where to start with something like that. Any suggestions for practical approaches would be very appreciated.
I'm coding this in Java, if that's helpful at all.
What you asked was very broad so there isn't a lot of information that we can give you. However, I can point you to some helpful links:
http://java-ml.sourceforge.net/ -- a library that you can download with lots of useful algorithms and other code to include in your program
https://www.youtube.com/playlist?list=PLiaHhY2iBX9hdHaRr6b7XevZtgZRa1PoU -- a series that explains neural networks (something you might want to look into for your machine learning)
A big tip for your algorithm: instead of looking at exactly how long all of the loops and strokes are, look at their relative distances.
"Relative distances from what?" you say. Well this is where the next tip comes in handy: instead of keeping track of the lines, keep track of the tips of the loops and the order of these points. If you then take the distance between all of them (relatively of course which means to set one of the lengths to zero). Along to keeping track of the distances, you should also keep track of the angles. You would calculate the angle ABC by taking the distance between (A,B), (B,C), and (A,C) (A,B, and C being coordinates on the xy plane) which creates a triangle between the points which allows you to use trigonometry to calculate the angle.
(I am assuming that for all of these you are also trying to detect whose signature it is, because it doesn't really complicate things much at all.) When trying to match the detected signature against the stored signatures to see if they are the "same," don't require the distances and angles to be exact. Give a margin of error (for example, a % range above and below). Here is a tip: make the margin of error rather large. That way, if the signature is written poorly, it will still be detected. This raises the chances of more than one stored signature being picked up. Luckily, there is a simple solution: just run the algorithm again on the signatures that were found, but with a smaller margin of error (the program does this, not you). Continue decreasing the margin of error until only one signature remains.
I am hoping you already have ideas for detecting where the actual signature is, but do check for the difference in darkness of the pixels, and make sure it is fairly continuous. Also note that signatures are commonly signed in black or blue, and sometimes in red or other fancy colors.
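If you do start with the simple dark-pixel check mentioned in the question, a rough sketch with OpenCV's Java bindings could look like this (the 128 intensity threshold and the ink-ratio cutoff are made-up values to illustrate the idea, not tuned numbers):

import org.opencv.core.*;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

public class SignaturePresence {
    // Returns true if the fraction of dark (ink) pixels inside the expected
    // signature box exceeds a cutoff (e.g. 0.15).
    static boolean looksSigned(String pagePath, Rect signatureBox, double cutoff) {
        Mat gray = Imgcodecs.imread(pagePath, Imgcodecs.IMREAD_GRAYSCALE);
        Mat roi = new Mat(gray, signatureBox);

        // Everything darker than 128 becomes non-zero in the ink mask.
        Mat ink = new Mat();
        Imgproc.threshold(roi, ink, 128, 255, Imgproc.THRESH_BINARY_INV);

        double inkRatio = (double) Core.countNonZero(ink)
                / (signatureBox.width * signatureBox.height);
        return inkRatio > cutoff;
    }
}

The loop-and-angle matching above is what you would layer on top once this crude presence check fires.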
I am creating my own ray-tracer for fun and learning. One of the features I want to add is the ability to use SVG files as textures directly.
The simple and straight forward way to do this would be to simply render the SVG to another more "lookup-friendly" raster format first and feed that as a regular texture to be used during ray tracing. However I don't want to do that.
Instead, I want to actually "trace" the SVG itself directly. So I would like to know: are there any SVG libraries for Java with an API that would lend itself to being used in this manner? It would need some call that takes a float point2D[] as input and returns a float colorRGBA[] as output.
If not what would be the best approach to do this?
I don't know much about Java libraries, but most likely they will not suit you well. The main reasons are:
Most libraries are meant to render pictures and are unsuitable for random look-up.
More importantly, SVG texture data does not filter naturally all that well. We know how to build good mipmaps of raster images, and filtering them is easy, which reduces the pressure on your ray tracer's supersampling.
Then there is the complexity of SVG itself: something like SVG filters (blur) will be prohibitively expensive to evaluate in a random-sampling context.
Now, if we sidestep point three (which is indeed quite a hard problem, as it really requires you to rasterize or do something else out of the ordinary), there are algorithmic options:
You can actually ray trace the SVG in 2D. This would probably work out well for you, as you're building a ray tracer anyway. All you need to do is shoot rays inside the 2D model and see whether your sample point is inside a shape or not: shoot a ray in an arbitrary direction and count intersections. Put simply, the ray will intersect the shape's boundary an odd number of times if the sample point is inside the shape.
Image 1: Intersection testing. (originally posted here) Glancing hits must be excluded (most tracers consider that a miss anyway for this reason even in 3D)
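A minimal sketch of that even-odd test for one polygonal subpath in plain Java (curved SVG segments would first have to be flattened into line segments or intersected analytically):

// Even-odd rule: cast a horizontal ray to the right of (px, py) and count how many
// polygon edges it crosses; an odd count means the point is inside the shape.
static boolean inside(double px, double py, double[] xs, double[] ys) {
    boolean in = false;
    for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
        boolean crosses = (ys[i] > py) != (ys[j] > py)
                && px < (xs[j] - xs[i]) * (py - ys[i]) / (ys[j] - ys[i]) + xs[i];
        if (crosses) in = !in;
    }
    return in;
}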
Pairing this tracing with a BSP tree or a quadtree should make it sufficiently performant. All you need is to implement shader support similar to your standard ray tracer's, and you can handle alpha and gradients plus some of the filters like noise. But still no luck with blurs without a lot of sampling.
You can also use a raster texture as a precomputed mipmap result, and only ask a standard library (with a limited window size) to render a small view box when you reach a mipmap level that does not exist yet. This would naturally filter better for you, and by caching the data you can reduce the number of rendering calls. Without the caching it might be too expensive to use, but you can try (if the library supports clipping your SVG). This may not be as easy as it sounds.
You can use your 3D ray tracer for this by shooting rays head-on instead. All you need to do is implement the tracing set logic; you can then triangulate the SVG and use your normal tracing logic. How to describe Bezier curves as triangles is described in this NVIDIA publication. So your changes might be minimal.
Hope this helps, even if it's not a "use this library" answer. There is a reason why you do not see this implemented very often.
I want to implement object detection on a license plate (the city name). I have an image:
and I want to detect if the image contains the word "بابل":
I have tried template matching with OpenCV and also with MATLAB, but the results are poor when tested on other images.
I have also read this page, but I was not able to get a good understanding of what to do from that.
Can anyone help me or give me a step by step way to solve that?
I have a project to recognize license plates; we can already detect and recognize the numbers, but I also need to detect and recognize the words (the same word appears across many cars).
Your question is very broad, but I will do my best to explain optical character recognition (OCR) in a programmatic context and give you a general project workflow followed by successful OCR algorithms.
The problem you face is easier than most, because instead of having to recognize/differentiate between different characters, you only have to recognize a single image (assuming this is the only city you want to recognize). You are, however, subject to many of the limitations of any image recognition algorithm (quality, lighting, image variation).
Things you need to do:
1) Image isolation
You'll have to isolate your image from a noisy background:
I think that the best isolation technique would be to first isolate the license plate, and then isolate the specific characters you're looking for. Important things to keep in mind during this step:
Does the license plate always appear in the same place on the car?
Are cars always in the same position when the image is taken?
Is the word you are looking for always in the same spot on the license plate?
The difficulty/implementation of the task depends greatly on the answers to these three questions.
2) Image capture/preprocessing
This is a very important step for your particular implementation. Although possible, it is highly unlikely that your image will look like this:
as your camera would have to be directly in front of the license plate. More likely, your image may look like one of these:
depending on the perspective from which the image is taken. Ideally, all of your images will be taken from the same vantage point and you'll simply be able to apply a single transform so that they all look similar (or not apply one at all). If you have photos taken from different vantage points, you need to manipulate them, or else you will be comparing two different images. Also, especially if you are taking images from only one vantage point and decide not to do a transform, make sure that the text your algorithm is looking for is transformed to be from the same point of view. If you don't, you'll have a not-so-great success rate that's difficult to debug.
3) Image optimization
You'll probably want to (a) convert your images to black-and-white and (b) reduce the noise in your images. These two processes are called binarization and despeckling, respectively. There are many implementations of these algorithms available in many different languages, most findable with a Google search. You can batch-process your images using any language/free tool if you want, or find an implementation that works with whatever language you decide to work in.
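With OpenCV's Java bindings, for instance, each step is a single call. This is only a sketch; the 3x3 median kernel and Otsu thresholding are common defaults rather than the only options:

import org.opencv.core.Mat;
import org.opencv.imgproc.Imgproc;

public class PlateCleanup {
    // gray: an 8-bit, single-channel image of the plate region.
    static Mat despeckleAndBinarize(Mat gray) {
        Mat despeckled = new Mat();
        Imgproc.medianBlur(gray, despeckled, 3);        // despeckling: 3x3 median filter

        Mat binary = new Mat();
        Imgproc.threshold(despeckled, binary, 0, 255,   // binarization: Otsu picks the threshold
                Imgproc.THRESH_BINARY | Imgproc.THRESH_OTSU);
        return binary;
    }
}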
4) Pattern recognition
If you only want to search for the name of this one city (only one word ever), you'll most likely want to implement a matrix matching strategy. Many people also refer to matrix matching as pattern recognition so you may have heard it in this context before. Here is an excellent paper detailing an algorithmic implementation that should help you immensely should you choose to use matrix matching. The other algorithm available is feature extraction, which attempts to identify words based on patterns within letters (i.e. loops, curves, lines). You might use this if the font style of the word on the license plate ever changes, but if the same font will always be used, I think matrix matching will have the best results.
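To illustrate the matrix-matching idea in code: OpenCV's template matching performs essentially this sliding comparison for you. A hedged Java sketch (the score cutoff is an arbitrary placeholder you would tune on your own plates):

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.imgproc.Imgproc;

public class CityNameMatcher {
    // Slides the city-name template over the plate image and reports whether the
    // best normalized-correlation score clears the cutoff (e.g. 0.7).
    static boolean containsCityName(Mat plateBinary, Mat cityTemplate, double cutoff) {
        Mat result = new Mat();
        Imgproc.matchTemplate(plateBinary, cityTemplate, result, Imgproc.TM_CCOEFF_NORMED);
        Core.MinMaxLocResult best = Core.minMaxLoc(result);
        return best.maxVal > cutoff;
    }
}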
5) Algorithm training
Depending on the approach you take (if you use a learning algorithm), you may need to train your algorithm with tagged data. This means you have a series of images that you've identified as True (contains the city name) or False (does not). Here's a pseudocode example of how this works:
train = [(img1, True), (img2, True), (img3, False), (img4, False)]
img_recognizer = algorithm(train)
Then, you apply your trained algorithm to identify untagged images.
test_untagged = [img5, img6, img7]
for image in test_untagged:
    img_recognizer(image)
Your training sets should be much larger than four data points; in general, the bigger the better. Just make sure, as I said before, that all the images have undergone the same transformation.
Here is a very, very high-level code flow that may be helpful in implementing your algorithm:
img_in = capture_image()
cropped_img = isolate(img_in)
scaled_img = normalize_scale(cropped_img)
img_desp = despeckle(scaled_img)
img_final = binarize(img_desp)
#train
match = train_match(training_set)
boolCity = match(img_final)
The processes above have been implemented many times and are thoroughly documented in many languages. Below are some implementations in the languages tagged in your question.
Pure Java
cvBlob in OpenCV (check out this tutorial and this blog post too)
tesseract-ocr in C++
Matlab OCR
Good luck!
If you ask "I want to detect if the image contains the word "بابل" - this is classic problem which is solved using http://code.opencv.org/projects/opencv/wiki/FaceDetection like classifier.
But I assume you still want more. Years ago I tried to solve similar problems, and I provide an example image to show how good/bad it was:
To detect the licence plate I used very basic rectangle detection, which is included in every OpenCV samples folder, and then used a perspective transform to fix the layout and size. It was important to implement multiple checks to see whether a rectangle looked good enough to be a licence plate. For example, if a rectangle was 500 px tall and 2 px wide, it was probably not what I wanted and was rejected.
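A rough sketch of that rectify-with-perspective-transform step using OpenCV's Java API (the 520x110 output size is an arbitrary assumption for the plate's aspect ratio, not what I used back then):

import org.opencv.core.*;
import org.opencv.imgproc.Imgproc;

public class PlateRectifier {
    // corners: the four corners of the detected plate rectangle, ordered
    // top-left, top-right, bottom-right, bottom-left.
    static Mat rectify(Mat image, Point[] corners) {
        MatOfPoint2f src = new MatOfPoint2f(corners);
        MatOfPoint2f dst = new MatOfPoint2f(
                new Point(0, 0), new Point(520, 0),
                new Point(520, 110), new Point(0, 110));

        Mat transform = Imgproc.getPerspectiveTransform(src, dst);
        Mat plate = new Mat();
        Imgproc.warpPerspective(image, plate, transform, new Size(520, 110));
        return plate;
    }
}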
Use https://code.google.com/p/cvblob/ to extract the Arabic text and other components on the detected plate. I had a similar need just yesterday on another project, where I had to extract Japanese kanji symbols from a page:
CvBlob does a lot of work for you.
As the next step, use the technique explained at http://blog.damiles.com/2008/11/basic-ocr-in-opencv/ to match the city name. Just train the algorithm with example images of different city names, and soon it will recognize 99% of them out of the box. I have used similar approaches on different projects and am quite sure they work.
Is it possible to analyse an image and determine the position of a car inside it?
If so, how would you approach this problem?
I'm working with a relatively small data-set (50-100) and most images will look similar to the following examples:
I'm mostly interested in only detecting vertical coordinates, not the actual shape of the car. For example, this is the area I want to highlight as my final output:
You could try OpenCV, which has an object detection API. But you would need to "train" it by supplying it with a large set of images that contain cars.
http://docs.opencv.org/modules/objdetect/doc/objdetect.html
http://robocv.blogspot.co.uk/2012/02/real-time-object-detection-in-opencv.html
http://blog.davidjbarnes.com/2010/04/opencv-haartraining-object-detection.html
The 2nd link above shows an example of detecting and creating a bounding box around the object; you could use that as a basis for what you want to do.
http://www.behance.net/gallery/Vehicle-Detection-Tracking-and-Counting/4057777
Various papers:
http://cbcl.mit.edu/publications/theses/thesis-masters-leung.pdf
http://cseweb.ucsd.edu/classes/wi08/cse190-a/reports/scheung.pdf
Various image databases:
http://cogcomp.cs.illinois.edu/Data/Car/
http://homepages.inf.ed.ac.uk/rbf/CVonline/Imagedbase.htm
http://cbcl.mit.edu/software-datasets/CarData.html
1) Your first and second images have two cars in them.
2) If you only have 50-100 images, I can almost guarantee that classifying them all by hand will be faster than writing or adapting an algorithm to recognize cars and deliver coordinates.
3) If you're determined to do this with computer vision, I'd recommend OpenCV. Tutorial here: http://docs.opencv.org/doc/tutorials/tutorials.html
You can use openCV latentSVM detector to detect the car and plot a bounding box around it:
http://docs.opencv.org/modules/objdetect/doc/latent_svm.html
No need to train a new model using HaarCascade, as there is already a trained model for cars:
https://github.com/Itseez/opencv_extra/tree/master/testdata/cv/latentsvmdetector/models_VOC2007
This is a supervised machine learning problem. You will need to use an API that features learning algorithms, as colinsmith suggested, or do some research and write one of your own. Python is pretty good for machine learning (it's what I use, personally) and has some nice tools like scikit-learn: http://scikit-learn.org/stable/
I'd suggest you look into HAAR classifiers. Since you mentioned you have a set of 50-100 images, you can use it to build up a training dataset for the classifier and then use the classifier on your images.
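If you go the HAAR route, the detection side with OpenCV's Java bindings is short. A sketch, where "cars.xml" stands for whatever cascade you end up training on your 50-100 images:

import org.opencv.core.*;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.objdetect.CascadeClassifier;

public class CarDetector {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        // "cars.xml" is a placeholder for a trained Haar cascade file.
        CascadeClassifier cascade = new CascadeClassifier("cars.xml");
        Mat image = Imgcodecs.imread("scene.jpg");

        MatOfRect detections = new MatOfRect();
        cascade.detectMultiScale(image, detections);

        // Only the vertical extent is needed for the output described in the question.
        for (Rect r : detections.toArray())
            System.out.println("car between y=" + r.y + " and y=" + (r.y + r.height));
    }
}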
You can also look into SURF and SIFT algorithms for the specified problem.
In particular, I want to generate a tolerance interval, for which I would need the values of Zx for some value x on the standard normal distribution.
Does the Java standard library have anything like this, or should I roll my own?
EDIT: Specifically, I'm looking to do something akin to linear regression on a set of images. I have two images, and I want to see what the degree of correlation is between their pixels. I suppose this might fall under computer vision as well.
Simply calculate Pearson correlation coefficient between those two images.
You will have 3 coefficients because the R, G, B channels need to be analyzed separately.
Or you can calculate 1 coefficient just for the intensity levels of the images, ... or you could calculate the correlation between the Hue values of the images after converting them to HSV or HSL color space.
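For example, a minimal sketch of the intensity-only variant in plain Java, assuming both images are already the same size and flattened to 8-bit grayscale arrays (call it once per channel if you want the R, G, B coefficients instead):

// Pearson correlation r = cov(X, Y) / (std(X) * std(Y)) over pixel intensities.
static double pearson(int[] a, int[] b) {
    int n = a.length;
    double meanA = 0, meanB = 0;
    for (int i = 0; i < n; i++) { meanA += a[i]; meanB += b[i]; }
    meanA /= n;
    meanB /= n;

    double cov = 0, varA = 0, varB = 0;
    for (int i = 0; i < n; i++) {
        cov  += (a[i] - meanA) * (b[i] - meanB);
        varA += (a[i] - meanA) * (a[i] - meanA);
        varB += (b[i] - meanB) * (b[i] - meanB);
    }
    return cov / Math.sqrt(varA * varB);
}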
Do whatever you see fit :-)
EDIT: Correlation coefficient may be maximized only after scaling and/or rotating some image. This may be a problem or not - depends on your needs.
You can use the complete statistical power of R using rJava/JRI. This includes correlations between pixels and so on.
Another option is to look at ImageJ, which contains libraries for many image manipulations, mathematics and statistics. It is an application, all right, but the library is usable in development as well. It comes with an extensive developer's manual. On a side note, ImageJ can be combined with R as well.
ImageJ lets you use the correct methods for finding image similarity measures, based on Fourier transformations or other methods. More info can be found in Digital Image Processing with Java and ImageJ. See also this paper.
Another one is the Commons-Math. This one also contains the basic statistical tools.
See also the answers on this question and this question.
It seems you want to compare two images to see how similar they are. In this case, the first two things to try are SSD (sum of squared differences) and normalized correlation (this is closely related to what 0x69 suggests, Pearson correlation) between the two images.
You can also try normalized correlation over small (corresponding) windows in the two images and add up the results over several (all) small windows in the image.
These two are very simple methods which you can write in a few minutes.
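For instance, SSD over two equal-sized grayscale pixel arrays is only a few lines of Java (normalized correlation is essentially the Pearson computation sketched in the other answer):

// Sum of squared differences: 0 for identical images, larger the more they differ.
static long ssd(int[] a, int[] b) {
    long sum = 0;
    for (int i = 0; i < a.length; i++) {
        long d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}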
I'm not sure, however, what this has to do with hypothesis testing or linear regression; you might want to edit your question to clarify that part.