I want to implement object detection on a license plate (specifically, the city name). I have an image:
and I want to detect if the image contains the word "بابل":
I have tried a template matching method using OpenCV and also MATLAB, but the results are poor when tested with other images.
I have also read this page, but I was not able to get a good understanding of what to do from that.
Can anyone help me or give me a step-by-step way to solve this?
I am working on a project to recognize license plates. We can already detect and recognize the numbers, but I also need to detect and recognize the words (the same words appear across many cars).
Your question is very broad, but I will do my best to explain optical character recognition (OCR) in a programmatic context and give you the general project workflow followed by successful OCR algorithms.
The problem you face is easier than most, because instead of having to recognize/differentiate between different characters, you only have to recognize a single image (assuming this is the only city you want to recognize). You are, however, subject to many of the limitations of any image recognition algorithm (quality, lighting, image variation).
Things you need to do:
1) Image isolation
You'll have to isolate your image from a noisy background:
I think that the best isolation technique would be to first isolate the license plate, and then isolate the specific characters you're looking for. Important things to keep in mind during this step:
Does the license plate always appear in the same place on the car?
Are cars always in the same position when the image is taken?
Is the word you are looking for always in the same spot on the license plate?
The difficulty/implementation of the task depends greatly on the answers to these three questions.
2) Image capture/preprocessing
This is a very important step for your particular implementation. Although possible, it is highly unlikely that your image will look like this:
as your camera would have to be directly in front of the license plate. More likely, your image may look like one of these:
depending on the perspective where the image is taken from. Ideally, all of your images will be taken from the same vantage point and you'll simply be able to apply a single transform so that they all look similar (or not apply one at all). If you have photos taken from different vantage points, you need to manipulate them or else you will be comparing two different images. Also, especially if you are taking images from only one vantage point and decide not to do a transform, make sure that the text your algorithm is looking for is transformed to be from the same point of view. If you don't, you'll have a not-so-great success rate that's difficult to debug/figure out.
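For illustration, here is a minimal Python/OpenCV sketch of such a perspective transform; the corner coordinates, the file name, and the 400x100 target size are placeholders you'd replace with values from your own images:

import cv2
import numpy as np

# Hypothetical corners of the plate in the source photo,
# ordered top-left, top-right, bottom-right, bottom-left.
src_corners = np.float32([[112, 248], [430, 233], [438, 315], [119, 330]])

# Target rectangle: a flattened, front-on 400x100 plate.
dst_corners = np.float32([[0, 0], [400, 0], [400, 100], [0, 100]])

img = cv2.imread("car.jpg")
M = cv2.getPerspectiveTransform(src_corners, dst_corners)
plate = cv2.warpPerspective(img, M, (400, 100))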
3) Image optimization
You'll probably want to (a) convert your images to black-and-white and (b) reduce the noise in your images. These two processes are called binarization and despeckling, respectively. There are many implementations of these algorithms available in many different languages, most accessible by a Google search. You can batch process your images using any language or free tool you want, or find an implementation that works with whatever language you decide to work in.
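Both steps are essentially one-liners in OpenCV. A minimal Python sketch (file names are placeholders):

import cv2

img = cv2.imread("plate.jpg", cv2.IMREAD_GRAYSCALE)

# Despeckle: a small median filter removes salt-and-pepper noise
# while keeping character edges reasonably sharp.
despeckled = cv2.medianBlur(img, 3)

# Binarize: Otsu's method picks the black/white threshold automatically.
_, binary = cv2.threshold(despeckled, 0, 255,
                          cv2.THRESH_BINARY | cv2.THRESH_OTSU)
cv2.imwrite("plate_bw.png", binary)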
4) Pattern recognition
If you only want to search for the name of this one city (only one word ever), you'll most likely want to implement a matrix matching strategy. Many people also refer to matrix matching as pattern recognition so you may have heard it in this context before. Here is an excellent paper detailing an algorithmic implementation that should help you immensely should you choose to use matrix matching. The other algorithm available is feature extraction, which attempts to identify words based on patterns within letters (i.e. loops, curves, lines). You might use this if the font style of the word on the license plate ever changes, but if the same font will always be used, I think matrix matching will have the best results.
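If you go the matrix matching route, OpenCV's template matching is one straightforward way to implement it. A rough Python sketch, assuming you've already prepared a binarized plate image and a template of the city name (the file names and the 0.7 threshold are assumptions to tune):

import cv2

# Both images should already be binarized and scaled consistently
# (steps 2 and 3 above). File names are placeholders.
plate = cv2.imread("plate_bw.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("babil_template.png", cv2.IMREAD_GRAYSCALE)

# Slide the template over the plate and record the normalized
# correlation at every position; values near 1.0 are strong matches.
result = cv2.matchTemplate(plate, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

contains_city = max_val > 0.7  # threshold chosen by experiment
print(contains_city, max_val, max_loc)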
5) Algorithm training
Depending on the approach you take (if you use a learning algorithm), you may need to train your algorithm with tagged data. What this means is that you have a series of images that you've identified as True (contains city name) or False (does not). Here's a pseudocode example of how this works:
train = [(img1, True), (img2, True), (img3, False), (img4, False)]
img_recognizer = algorithm(train)
Then, you apply your trained algorithm to identify untagged images.
test_untagged = [img5, img6, img7]
for image in test_untagged:
    img_recognizer(image)
Your training sets should be much larger than four data points; in general, the bigger the better. Just make sure, as I said before, that all the images are of an identical transformation.
Here is a very, very high-level code flow that may be helpful in implementing your algorithm:
img_in = capture_image()
cropped_img = isolate(img_in)
scaled_img = normalize_scale(cropped_img)
img_desp = despeckle(scaled_img)
img_final = binarize(img_desp)
# train the matching function on your tagged set
match = train_match(training_set)
boolCity = match(img_final)
The processes above have been implemented many times and are thoroughly documented in many languages. Below are some implementations in the languages tagged in your question.
Pure Java
cvBlob in OpenCV (check out this tutorial and this blog post too)
tesseract-ocr in C++
Matlab OCR
Good luck!
If you are asking "I want to detect if the image contains the word 'بابل'", this is a classic problem that can be solved with a cascade classifier, like the face detection described at http://code.opencv.org/projects/opencv/wiki/FaceDetection.
But I assume you still want more. Years ago I tried to solve similar problems, and I will provide an example image to show how good/bad it was:
To detect the licence plate, I used very basic rectangle detection, which is included in every OpenCV samples folder, and then used a perspective transform to fix the layout and size. It was important to implement multiple checks to see whether a rectangle looked good enough to be a licence plate. For example, if a rectangle is 500px tall and 2px wide, it is probably not what I want, so it was rejected.
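A rough Python/OpenCV sketch of that rectangle detection with plausibility checks (all thresholds here are guesses to tune; the findContours return signature is for OpenCV 4):

import cv2

img = cv2.imread("car.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
candidates = []
for c in contours:
    # Approximate the contour; keep only quadrilaterals.
    approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
    if len(approx) != 4:
        continue
    x, y, w, h = cv2.boundingRect(approx)
    # Plausibility checks: reject shapes whose size or aspect ratio
    # could not be a licence plate (e.g. 2px wide and 500px tall).
    if w < 60 or h < 15:
        continue
    if not (2.0 < w / float(h) < 6.0):
        continue
    candidates.append((x, y, w, h))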
Use https://code.google.com/p/cvblob/ to extract the Arabic text and other components on the detected plate. I had a similar need just yesterday on another project, where I had to extract Japanese kanji symbols from a page:
CvBlob does a lot of work for you.
As the next step, use the technique explained at http://blog.damiles.com/2008/11/basic-ocr-in-opencv/ to match the city name. Just train the algorithm with example images of different city names, and soon it will recognize 99% of them out of the box. I have used similar approaches on different projects and am quite sure they work.
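The technique in that post is essentially k-nearest-neighbour matching on resized pixel features. A minimal sketch with OpenCV's ml module (the function names and the 40x20 grid are my own choices, not from the post):

import cv2
import numpy as np

def to_feature(img):
    # Resize every sample to a fixed grid and flatten to a float row vector.
    return cv2.resize(img, (40, 20)).astype(np.float32).reshape(1, -1)

def train_city_matcher(images, labels):
    # images: list of grayscale city-name crops; labels: integer city ids.
    samples = np.vstack([to_feature(img) for img in images])
    responses = np.array(labels, dtype=np.int32)
    knn = cv2.ml.KNearest_create()
    knn.train(samples, cv2.ml.ROW_SAMPLE, responses)
    return knn

def match_city(knn, img):
    # Classify a new, unknown plate region by its 3 nearest neighbours.
    _, result, _, _ = knn.findNearest(to_feature(img), k=3)
    return int(result[0][0])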
Related
I have an image file, and I need to determine if a specified area of this image contains a signature. Or to put it in end-user terms, "Has this document been signed?"
What I have done so far is to examine all the pixels contained in the area, to calculate an average "darkness", and compare that to a reference value. If the difference in darkness exceeds some threshold, then I consider it signed.
The problem with this (admittedly simplistic) approach is that, because the pixels of the signature itself are such a small fraction of the area, I have to use a very low darkness threshold, which results in a large number of false positives. I can't distinguish a real signature from stray markings, smudges, fax artifacts, etc.
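A minimal sketch of this approach in Python/OpenCV terms (the box format and the 0.92 threshold are illustrative placeholders, not my actual values):

import cv2
import numpy as np

def is_signed(image_path, box, threshold=0.92):
    # box is (x, y, w, h) of the signature area; the region counts as
    # "signed" if its mean brightness (0-1) falls below the threshold.
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    x, y, w, h = box
    region = img[y:y + h, x:x + w].astype(np.float32) / 255.0
    return region.mean() < threshold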
To be clear...I'm not trying to match any specific signature or set of signatures. That is, I don't care who signed it, only whether it is signed.
Is anyone aware of a Java library that can do this, or of a better approach to this problem than what I am currently doing?
EDIT:
This is an example of the kinds of images I am working with. This document would be faxed to the recipient, signed and faxed back. It won't be this clean-looking by the time I need to look for a signature.
I do not know of any simple solutions. You could wrap queXF or write something similar in Java. This paper describes a color-code algorithm for recognizing signatures.
This is what I believe can be done (although it is not a very good solution), but it may still work. It would involve a bit of machine learning. I am assuming that your image does not contain handwritten text and is just an image.
The first thing to do would be to create a dataset of images that contain a signature and images that do not. The positive samples should contain only signatures (you can learn a classifier for multiple aspect ratios), and the negative samples should contain random images of the same aspect ratio/dimension. Now you can compute some feature over these samples (HoG can be used as a feature, although I do not claim it is the best one for this application) and learn an SVM for each aspect ratio.
The next step would be to slide a detection window (of different aspect ratios) throughout the image and use the multiple SVMs you have learnt and check if any of them gives a positive response.
Although this approach may not always work, it should give a decent amount of accuracy. The more data you use for training, the better the results will get (and if you can come up with a good feature vector to represent a signature, it will help your case even further).
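A rough Python sketch of that pipeline for a single aspect ratio, using OpenCV's HOG descriptor and scikit-learn's linear SVM (the window size, step, and all parameters are assumptions to tune):

import cv2
import numpy as np
from sklearn.svm import LinearSVC

WIN = (128, 64)  # (width, height) of one detection window; an assumption
hog = cv2.HOGDescriptor((128, 64), (16, 16), (8, 8), (8, 8), 9)

def features(patch):
    # patch must be grayscale; it is resized to exactly WIN.
    return hog.compute(cv2.resize(patch, WIN)).ravel()

def train(pos_patches, neg_patches):
    # Positive samples are signature crops, negatives are random crops.
    X = np.array([features(p) for p in pos_patches + neg_patches])
    y = np.array([1] * len(pos_patches) + [0] * len(neg_patches))
    return LinearSVC().fit(X, y)

def detect(clf, page, step=16):
    # Slide the window over the page; collect boxes the SVM accepts.
    h, w = page.shape[:2]
    hits = []
    for y in range(0, h - WIN[1], step):
        for x in range(0, w - WIN[0], step):
            patch = page[y:y + WIN[1], x:x + WIN[0]]
            if clf.decision_function([features(patch)])[0] > 0:
                hits.append((x, y, WIN[0], WIN[1]))
    return hits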
Is it possible to analyse an image and determine the position of a car inside it?
If so, how would you approach this problem?
I'm working with a relatively small dataset (50-100 images), and most images will look similar to the following examples:
I'm mostly interested in only detecting vertical coordinates, not the actual shape of the car. For example, this is the area I want to highlight as my final output:
You could try OpenCV, which has an object detection API, but you would need to "train" it by supplying it with a large set of images that contain cars.
http://docs.opencv.org/modules/objdetect/doc/objdetect.html
http://robocv.blogspot.co.uk/2012/02/real-time-object-detection-in-opencv.html
http://blog.davidjbarnes.com/2010/04/opencv-haartraining-object-detection.html
The second link above shows an example of detecting and creating a bounding box around the object; you could use that as a basis for what you want to do.
http://www.behance.net/gallery/Vehicle-Detection-Tracking-and-Counting/4057777
Various papers:
http://cbcl.mit.edu/publications/theses/thesis-masters-leung.pdf
http://cseweb.ucsd.edu/classes/wi08/cse190-a/reports/scheung.pdf
Various image databases:
http://cogcomp.cs.illinois.edu/Data/Car/
http://homepages.inf.ed.ac.uk/rbf/CVonline/Imagedbase.htm
http://cbcl.mit.edu/software-datasets/CarData.html
1) Your first and second images have two cars in them.
2) If you only have 50-100 images, I can almost guarantee that classifying them all by hand will be faster than writing or adapting an algorithm to recognize cars and deliver coordinates.
3) If you're determined to do this with computer vision, I'd recommend OpenCV. Tutorial here: http://docs.opencv.org/doc/tutorials/tutorials.html
You can use the OpenCV latentSVM detector to detect the car and plot a bounding box around it:
http://docs.opencv.org/modules/objdetect/doc/latent_svm.html
No need to train a new model using HaarCascade, as there is already a trained model for cars:
https://github.com/Itseez/opencv_extra/tree/master/testdata/cv/latentsvmdetector/models_VOC2007
This is a supervised machine learning problem. You will need to use an API that features learning algorithms, as colinsmith suggested, or do some research and write one of your own. Python is pretty good for machine learning (it's what I use, personally) and has some nice tools like scikit-learn: http://scikit-learn.org/stable/
I'd suggest you look into Haar classifiers. Since you mentioned you have a set of 50-100 images, you can use them to build up a training dataset for the classifier and then use it to classify your images.
You can also look into SURF and SIFT algorithms for the specified problem.
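For the Haar route, OpenCV's CascadeClassifier does the detection once you have a trained cascade. A minimal Python sketch; "cars.xml" is a placeholder for a cascade you would train or download:

import cv2

# 'cars.xml' stands in for a trained car cascade (see the
# haartraining link in the other answer).
cascade = cv2.CascadeClassifier("cars.xml")

img = cv2.imread("road.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cars = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)

for (x, y, w, h) in cars:
    # The y coordinates give the vertical extent the question asks for.
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("road_detected.jpg", img)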
I'm currently working on an image recognition software for the robotics club at my school, and one part really has me stumped: shape recognition. I need to be able to detect the squares in this image before I can try to detect the shapes in the arena.
I've looked up some libraries like JavaCV, but I couldn't really find something that suited my taste. As a reference, here is the image from which I'm trying to determine shapes
Have you tried applying the Hough transform?
That seems to be what you need, as your squares have straight edges.
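A minimal Python/OpenCV sketch using the probabilistic Hough transform (the Canny and Hough parameters are starting guesses):

import cv2
import numpy as np

img = cv2.imread("arena.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

# Probabilistic Hough transform: returns line segments, which suits
# square edges better than the infinite lines of the standard version.
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=50,
                        minLineLength=30, maxLineGap=5)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(img, (x1, y1), (x2, y2), (0, 0, 255), 2)
cv2.imwrite("arena_lines.png", img)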
I was doing something similar to your task, but I needed to recognize classes of objects (resistors, capacitors, etc.) and their boundaries in a real black-and-white photo:
Basically, the method was something like this:
Preprocessing - correct contrast, brightness, erosion, dilation, median, etc. - this step can be adaptive to whole/part of the photo.
Segmentation - now find parts of the photo where there could be "something" with some threshold for area, pixel intensity, etc.
Characterize - for every potential segment found, calculate some characteristics - max length, area, W, M determinants, etc.
Classify - there were several classifiers that checked whether the given characteristics could belong to a given class, and if the answer was yes, what the "distance" of the given characteristics was from the ideal model characteristics. Classification was done using fuzzy logic inference.
And of course - for every successful classification take the best matches if they exist.
In your case, the simplest characterization of a square is its area and the max distance between two points that belong to the found segment. Before that, you should preprocess the image with a "closing" operation (dilation followed by erosion).
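A rough Python/OpenCV sketch of that closing-plus-characterization step (the kernel size, area threshold, and 25% tolerance are arbitrary choices; findContours here uses the OpenCV 4 return signature):

import cv2
import numpy as np

img = cv2.imread("shapes.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# "Closing" (dilation followed by erosion) fills small gaps, e.g.
# where a line cuts through a square.
kernel = np.ones((5, 5), np.uint8)
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)

contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    area = cv2.contourArea(c)
    hull = cv2.convexHull(c).reshape(-1, 2).astype(np.float64)
    # Max distance between two points of the segment (its "diameter").
    diffs = hull[:, None, :] - hull[None, :, :]
    diameter = np.sqrt((diffs ** 2).sum(-1)).max()
    # For a square, the diameter is the diagonal, so the area should be
    # close to (diameter / sqrt(2)) ** 2.
    expected = (diameter / np.sqrt(2)) ** 2
    if area > 100 and abs(area - expected) / expected < 0.25:
        print("square-like segment, area", area)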
You could also create a nice algorithm to recognize whether a square is cut by a line (and remove that line, then recognize again), check whether a square is overlapped by another square, etc.
Personally, I don't know of any library that does something as complex as library.recognizeSquaresOnImage(params). You are provided with some useful methods to prepare an image for recognition; the core of your task you must do yourself.
Every recognition problem has its own peculiar features that can be used to narrow down uncertain results at every step of the "recognition pipeline". For example, in my task, I knew the objects were black on a fairly white background and were more or less separated from each other, etc.
My project was written in C++ using the OpenCV library, but I used OpenCV only for reading/writing images and displaying them in a window; I wasn't allowed to use any other methods from the library.
As a reference for how you could do it, HERE is the whole project. Even now it doesn't work perfectly; it needs some calibration of the classifiers.
To get a better grasp of how it works at a higher level, take a look at the main.cpp file.
For the last week I've been researching and experimenting with facial recognition. The intended application is for a user to be able to look up a person's information in a database (SQL) simply by taking a picture of their face. The initial expectation was to be able to compress a face down to a key or hash and use this as the database lookup. This need not be extremely accurate, as the person looking up the information can, and most likely will, end up doing a final comparison between the original image on file and the person standing in front of them.
OpenCV/JavaCV seems to be the obvious starting point, and the facial detection it provides works well. However, its implementation of Eigenfaces for facial recognition isn't ideal, because online training that recompiles hundreds of thousands of user faces every time a new face needs to be added to the training set wouldn't work.
I am experimenting with using SURF descriptors on a face extracted using OpenCV's Haar cascade features, and this appears to get me closer to the intended result. However, I am unable to think of a way to efficiently look up and compare roughly 30 descriptors (which are either 64- or 128-dimensional vectors) in a database. I've done some reading about LSH and spectral hashing algorithms, but there are no implementations to be found for Java, and my math isn't strong enough to implement them myself.
Does anyone have any thoughts or ideas on how this might be accomplished, or if it is even possible?
Hashing isn't complicated, nor do you need a degree in maths.
Assuming that any two images result in a fairly similar number of 'descriptors', you only need a reasonable match on enough of them to reach a high enough confidence factor.
How specific these descriptors are determines what level of collision you can accept in your hashing algorithm.
As you have several of them, I would suggest that you don't need anything too sophisticated - after all, you probably want a level of 'fuzziness' in your search?
Start with something simple - experiment and refine. You might even find that you'll need different hashing for different descriptors - i.e. some might be more specific than others?
Hopefully some food for thought.
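To make the idea concrete, here is a minimal random-projection hashing sketch in Python (not from any particular library; the class name, n_bits, and the seed are my own choices). Nearby descriptors tend to land in the same bucket, which gives you the fuzziness:

import numpy as np

class DescriptorHasher:
    # Random-projection LSH: each random hyperplane contributes one
    # bit, so similar descriptors tend to share hash keys. Tune n_bits
    # to trade collision rate against fuzziness.
    def __init__(self, dim=64, n_bits=16, seed=42):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))

    def key(self, descriptor):
        # One bit per hyperplane: which side the descriptor falls on.
        bits = (self.planes @ descriptor) > 0
        return int(np.packbits(bits).tobytes().hex(), 16)

hasher = DescriptorHasher(dim=64)
# Index each face's descriptors in an ordinary hash table (or an SQL
# column); look up a probe descriptor's key and count matching faces.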
I'm looking for several methods to compare two images to see how similar they are. Currently I plan to have percentages as the 'similarity index' end-result. My program outline is something like this:
User selects 2 images to compare.
With a button, the images are compared using several different methods.
At the end, each method will have a percentage next to it indicating how similar the images are based on that method.
I've done a lot of reading lately and some of the stuff I've read seems to be incredibly complex and advanced and not for someone like me with only about a year's worth of Java experience. So far I've read about:
The Fourier Transform - I'm finding this rather confusing to implement in Java, but apparently the Java Advanced Imaging API has a class for it, though I'm not sure how to convert the output into an actual result
SIFT algorithm - seems incredibly complex
Histograms - probably the easiest out of all mentioned so far
Pixel grabbing - seems viable, but if there's a considerable amount of variation between the two images, it doesn't look like it will produce any sort of accurate result. I might be wrong?
I also have the idea of pre-processing an image using a Sobel filter first, then comparing it. Problem is the actual comparing part.
So I'm looking to see if anyone has ideas for comparing images in Java, hoping that there are people here who have done similar projects before. I just want some input on viable comparison techniques that aren't too hard to implement in Java.
Thanks in advance
Fourier Transform - This can be used to efficiently compute the cross-correlation, which will tell you how to align the two images and how similar they are once they are optimally aligned.
SIFT descriptors - These can be used to compare local features. They are often used for correspondence analysis and object recognition. (See also SURF.)
Histograms - The normalized cross-correlation often yields good results for comparing images on a global level. But since you are just comparing color distributions, you could end up declaring an outdoor scene with lots of snow similar to an indoor scene with lots of white wallpaper... (a minimal histogram-comparison sketch follows this list).
Pixel grabbing - No idea what this is...
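As promised above, a minimal Python/OpenCV sketch of the histogram comparison (the bin counts and the HSV choice are conventional defaults, not requirements):

import cv2

def histogram_similarity(path_a, path_b):
    # Compare normalized hue-saturation histograms with correlation;
    # the result is roughly -1..1, which maps easily to a percentage.
    hists = []
    for path in (path_a, path_b):
        hsv = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2HSV)
        h = cv2.calcHist([hsv], [0, 1], None, [50, 60],
                         [0, 180, 0, 256])
        cv2.normalize(h, h, 0, 1, cv2.NORM_MINMAX)
        hists.append(h)
    return cv2.compareHist(hists[0], hists[1], cv2.HISTCMP_CORREL)

score = histogram_similarity("a.jpg", "b.jpg")
print("similarity: {:.0f}%".format(max(score, 0) * 100))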
You can get a good overview from this paper. Another field you might to look into is content based image retrieval (CBIR).
Sorry for not being Java specific. HTH.
As a better alternative to simple pixel grabbing, try SSIM. It does require that your images are essentially of the same object from the same angle, however. It's useful if you're comparing images that have been compressed with different algorithms, for example (e.g. JPEG vs JPEG2000). Also, it's a fairly simple approach that you should be able to implement reasonably quickly to see some results.
I don't know of a Java implementation, but there's a C++ implementation using OpenCV. You could try to reuse that (through something like javacv) or just write it from scratch; the algorithm itself isn't that complicated anyway, so you should be able to implement it directly.
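For orientation, the core of single-scale SSIM fits in a few lines. A Python/NumPy sketch using the standard 11x11 Gaussian window and the usual C1/C2 constants (a simplified version for equally sized grayscale images, not a drop-in replacement for the OpenCV implementation):

import cv2
import numpy as np

def ssim(img_a, img_b):
    # Gaussian-weighted local means, variances, and covariance.
    C1, C2 = (0.01 * 255) ** 2, (0.03 * 255) ** 2
    a = img_a.astype(np.float64)
    b = img_b.astype(np.float64)
    blur = lambda x: cv2.GaussianBlur(x, (11, 11), 1.5)
    mu_a, mu_b = blur(a), blur(b)
    var_a = blur(a * a) - mu_a ** 2
    var_b = blur(b * b) - mu_b ** 2
    cov = blur(a * b) - mu_a * mu_b
    # Per-pixel SSIM map, averaged into a single score in [-1, 1].
    num = (2 * mu_a * mu_b + C1) * (2 * cov + C2)
    den = (mu_a ** 2 + mu_b ** 2 + C1) * (var_a + var_b + C2)
    return float((num / den).mean())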