I'm trying to detect players in a soccer game with JavaCV using the HOG descriptor. I already implemented the method with the default people detector, but the results are not satisfying. So, I extracted positive and negative images and I want to extract features using these images.
Does anyone have any ideas on how to do this, please? Thanks!
You are actually implementing the idea published in this paper.
An (extended) code sample can be found at UCI.
To summarize:
You have to generate a positive and a negative training set. This means in the positive training images you have to know where the players are located.
Then you have to extract the HoG features at the players' positions. Note that the original HoG method takes input patches of size 128x64, so ensure that your players are all scaled to the same size. And important: the HoG feature size depends on the extraction window size, so keep it fixed!
Store the information in a data structure with corresponding label 1.
Then extract negative features from negative images and store them with corresponding label 0 or -1.
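If you are using the OpenCV Java bindings (JavaCV exposes the same HOGDescriptor class with analogous calls), the feature-extraction step could look roughly like the sketch below; positivePatchPaths and negativePatchPaths are assumed lists of already-cropped player/background patch files, not part of any API:
import java.util.List;
import org.opencv.core.CvType;
import org.opencv.core.Mat;
import org.opencv.core.MatOfFloat;
import org.opencv.core.Scalar;
import org.opencv.core.Size;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;
import org.opencv.objdetect.HOGDescriptor;

static void addSamples(List<String> paths, int label, HOGDescriptor hog, Mat samples, Mat labels) {
    for (String path : paths) {
        Mat patch = Imgcodecs.imread(path, Imgcodecs.IMREAD_GRAYSCALE);
        Imgproc.resize(patch, patch, new Size(64, 128));       // fixed 128x64 (rows x cols) window
        MatOfFloat descriptor = new MatOfFloat();
        hog.compute(patch, descriptor);                        // 3780 floats for the default window
        samples.push_back(descriptor.reshape(1, 1));           // one feature row per patch
        labels.push_back(new Mat(1, 1, CvType.CV_32SC1, new Scalar(label)));
    }
}

// usage (positivePatchPaths / negativePatchPaths are assumed lists of cropped patch files)
HOGDescriptor hog = new HOGDescriptor();                       // default 64x128 detection window
Mat samples = new Mat(), labels = new Mat();
addSamples(positivePatchPaths, 1, hog, samples, labels);       // label 1 = player
addSamples(negativePatchPaths, 0, hog, samples, labels);       // label 0 = background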
Use some training method. I currently work with a linear support vector machine similar to liblinear: SVM
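A hedged sketch of that training step with OpenCV's built-in SVM (the answer uses a liblinear-style SVM; any linear SVM will do), reusing the samples/labels matrices from the previous sketch; the output file name is just a placeholder:
import org.opencv.ml.Ml;
import org.opencv.ml.SVM;

SVM svm = SVM.create();
svm.setType(SVM.C_SVC);
svm.setKernel(SVM.LINEAR);
svm.setC(0.01);                                   // regularisation strength, tune on a validation set
svm.train(samples, Ml.ROW_SAMPLE, labels);        // samples: CV_32F rows, labels: CV_32S column
svm.save("player_hog_svm.xml");                   // hypothetical output file name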
Then use the test set to ensure you are getting correct results. For testing, use a sliding window, slide it all over the image, and score the extracted features. Take the best score, as it is most likely that the player is located there.
If you want to detect several players in one image, use non-maximum suppression.
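Putting the test step together, a rough single-scale sliding-window sketch might look like this (in practice you would also rescale the image to catch players of different sizes, and run non-maximum suppression on the returned rectangles):
import java.util.ArrayList;
import java.util.List;
import org.opencv.core.Mat;
import org.opencv.core.MatOfFloat;
import org.opencv.core.Rect;
import org.opencv.ml.SVM;
import org.opencv.objdetect.HOGDescriptor;

static List<Rect> slideAndDetect(Mat gray, HOGDescriptor hog, SVM svm) {
    List<Rect> hits = new ArrayList<>();
    int step = 8;                                             // window stride in pixels
    for (int y = 0; y + 128 <= gray.rows(); y += step) {
        for (int x = 0; x + 64 <= gray.cols(); x += step) {
            Mat window = gray.submat(y, y + 128, x, x + 64).clone();
            MatOfFloat desc = new MatOfFloat();
            hog.compute(window, desc);
            if (svm.predict(desc.reshape(1, 1)) == 1.0f) {    // window classified as "player"
                hits.add(new Rect(x, y, 64, 128));
            }
        }
    }
    return hits;                                              // overlapping hits still need non-maximum suppression
}
To rank the windows as described above, ask the SVM for the raw decision value (the StatModel.RAW_OUTPUT flag) instead of the hard label and keep the highest-scoring window.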
Note: HoG features are quite difficult to handle, as small changes in extraction might have a great impact on performance. For example, OpenCV ships with an (undocumented) HoG detector. HoG visualization helped me to understand how it works.
I've been playing a bit with some image processing techniques to do HDR pictures and similar. I find it very hard to align pictures taken in bursts. I tried some naïve motion search algorithms, simply based on comparing small samples of pixels (like 16x16) between different pictures, that pretty much work like this (a rough sketch of the core comparison follows the list):
- select one 16x16 block in the first picture, one with high contrast, then blur it to reduce noise
- compare it within a neighbouring radius in the other picture (also blurred for noise), usually using the averaged squared difference
- select the most similar one.
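Roughly, the core of that comparison looks like this (grayscale pixels as plain int[row][column] arrays; bounds checks omitted):
// ref and other are grayscale frames as int[row][column]; (bx, by) is the block's top-left corner
static int[] findBlock(int[][] ref, int[][] other, int bx, int by, int radius) {
    long bestCost = Long.MAX_VALUE;
    int bestDx = 0, bestDy = 0;
    for (int dy = -radius; dy <= radius; dy++) {
        for (int dx = -radius; dx <= radius; dx++) {
            long cost = 0;
            for (int y = 0; y < 16; y++) {
                for (int x = 0; x < 16; x++) {
                    int d = ref[by + y][bx + x] - other[by + dy + y][bx + dx + x];
                    cost += (long) d * d;                      // sum of squared differences
                }
            }
            if (cost < bestCost) { bestCost = cost; bestDx = dx; bestDy = dy; }
        }
    }
    return new int[] { bestDx, bestDy };                       // estimated motion of that block
}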
I tried a few things to improve this, for example using these search algorithms (https://en.wikipedia.org/wiki/Block-matching_algorithm) to speed it up. The results, however, are not good, and when they are, they are not robust. They also remain computationally very intensive (which precludes usage on a mobile device, for example).
I looked into popular research-based algorithms like https://en.wikipedia.org/wiki/Lucas%E2%80%93Kanade_method, but it does not seem very suitable for big movements. With burst images taken by today's phones, which have sensors of more than 12 Mpix, small movements easily result in a difference of 50-100 pixels. The Lucas-Kanade method seems more suitable for small amounts of motion.
It's a bit frustrating, as there seem to be hundreds of apps that do HDR, and they seem to be able to match pictures easily and reliably in a snap. I've tried to look into OpenCV, but all it offers seems to be the above Lucas-Kanade method. I've also seen projects like https://github.com/almalence/OpenCamera, which do this in pure Java, although the code is not easy (one class has 5k lines doing it all). Does anyone have any pointers to reliable resources?
Take a look at the HDR+ paper by Google. It uses a hierarchical algorithm for alignment that is very fast but not robust enough on its own. Afterwards, it uses a merging algorithm that is robust to alignment failures.
But it may be a little tricky to use it for normal HDR, since it says:
We capture frames of constant exposure, which makes alignment more robust.
Here is another work that needs sub-pixel accurate alignment. It uses a refined version of the alignment introduced in the HDR+ paper.
HDR+ code
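To give a flavour of the hierarchical idea (this is only a sketch, not the actual HDR+ implementation): build image pyramids, estimate the shift at the coarsest level, then refine it level by level. findShift() stands for any local search, such as the SSD comparison sketched in the question, and is a hypothetical helper here.
import java.util.ArrayList;
import java.util.List;
import org.opencv.core.Mat;
import org.opencv.imgproc.Imgproc;

static int[] alignCoarseToFine(Mat ref, Mat alt, int levels) {
    List<Mat> refPyr = new ArrayList<>(), altPyr = new ArrayList<>();
    refPyr.add(ref); altPyr.add(alt);
    for (int i = 1; i < levels; i++) {                         // halve the resolution at each level
        Mat r = new Mat(), a = new Mat();
        Imgproc.pyrDown(refPyr.get(i - 1), r);
        Imgproc.pyrDown(altPyr.get(i - 1), a);
        refPyr.add(r); altPyr.add(a);
    }
    int dx = 0, dy = 0;
    for (int i = levels - 1; i >= 0; i--) {                    // coarsest to finest
        dx *= 2; dy *= 2;                                      // carry the estimate up one level
        int[] d = findShift(refPyr.get(i), altPyr.get(i), dx, dy, 4);  // +/-4 px local refinement (hypothetical helper)
        dx = d[0]; dy = d[1];
    }
    return new int[] { dx, dy };                               // global shift at full resolution
}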
I want to implement object detection on a license plate (the city name). I have an image:
and I want to detect if the image contains the word "بابل":
I have tried a template matching method using OpenCV and also using MATLAB, but the results are poor when tested with other images.
I have also read this page, but I was not able to get a good understanding of what to do from that.
Can anyone help me or give me a step by step way to solve that?
I have a project to recognize license plates; we can already detect and recognize the numbers, but I also need to detect and recognize the words (the same words appear across many cars).
Your question is very broad, but I will do my best to explain optical character recognition (OCR) in a programmatic context and give you a general project workflow followed by successful OCR algorithms.
The problem you face is easier than most, because instead of having to recognize/differentiate between different characters, you only have to recognize a single image (assuming this is the only city you want to recognize). You are, however, subject to many of the limitations of any image recognition algorithm (quality, lighting, image variation).
Things you need to do:
1) Image isolation
You'll have to isolate your image from a noisy background:
I think that the best isolation technique would be to first isolate the license plate, and then isolate the specific characters you're looking for. Important things to keep in mind during this step:
Does the license plate always appear in the same place on the car?
Are cars always in the same position when the image is taken?
Is the word you are looking for always in the same spot on the license plate?
The difficulty/implementation of the task depends greatly on the answers to these three questions.
2) Image capture/preprocessing
This is a very important step for your particular implementation. Although possible, it is highly unlikely that your image will look like this:
as your camera would have to be directly in front of the license plate. More likely, your image may look like one of these:
depending on the perspective from which the image is taken. Ideally, all of your images will be taken from the same vantage point and you'll simply be able to apply a single transform so that they all look similar (or not apply one at all). If you have photos taken from different vantage points, you need to manipulate them or else you will be comparing two different images. Also, especially if you are taking images from only one vantage point and decide not to do a transform, make sure that the text your algorithm is looking for is transformed to be from the same point of view. If you don't, you'll have a not-so-great success rate that's difficult to debug/figure out.
3) Image optimization
You'll probably want to (a) convert your images to black-and-white and (b) reduce the noise of your images. These two processes are called binarization and despeckling, respectively. There are many implementations of these algorithms available in many different languages, most accessible by a Google search. You can batch process your images using any language/free tool if you want, or find an implementation that works with whatever language you decide to work in.
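As one concrete option, with the OpenCV Java bindings both steps are one call each; "plate.png" is just a placeholder file name:
import org.opencv.core.Mat;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

Mat gray = Imgcodecs.imread("plate.png", Imgcodecs.IMREAD_GRAYSCALE);   // placeholder file name
Mat despeckled = new Mat();
Imgproc.medianBlur(gray, despeckled, 3);                                 // 3x3 median filter removes speckle noise
Mat binary = new Mat();
Imgproc.threshold(despeckled, binary, 0, 255, Imgproc.THRESH_BINARY | Imgproc.THRESH_OTSU);  // Otsu picks the threshold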
4) Pattern recognition
If you only want to search for the name of this one city (only one word ever), you'll most likely want to implement a matrix matching strategy. Many people also refer to matrix matching as pattern recognition so you may have heard it in this context before. Here is an excellent paper detailing an algorithmic implementation that should help you immensely should you choose to use matrix matching. The other algorithm available is feature extraction, which attempts to identify words based on patterns within letters (i.e. loops, curves, lines). You might use this if the font style of the word on the license plate ever changes, but if the same font will always be used, I think matrix matching will have the best results.
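If you go the matrix matching route, a hedged starting point is OpenCV's template matching, assuming the plate crop and the reference image of the word are binarised and at the same scale; the 0.7 threshold is an arbitrary guess to tune:
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.imgproc.Imgproc;

static boolean containsWord(Mat plate, Mat wordTemplate) {
    Mat result = new Mat();
    Imgproc.matchTemplate(plate, wordTemplate, result, Imgproc.TM_CCOEFF_NORMED);
    Core.MinMaxLocResult mm = Core.minMaxLoc(result);          // mm.maxLoc is where the best match sits
    return mm.maxVal > 0.7;                                    // arbitrary threshold, tune on real images
}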
5) Algorithm training
Depending on the approach you take (if you use a learning algorithm), you may need to train your algorithm with tagged data. What this means is that you have a series of images that you've identified as True (contains city name) or False (does not). Here's a pseudocode example of how this works:
train = [(img1, True), (img2, True), (img3, False), (img4, False)]
img_recognizer = algorithm(train)
Then, you apply your trained algorithm to identify untagged images.
test_untagged = [img5, img6, img7]
for image in test_untagged:
    img_recognizer(image)
Your training sets should be much larger than four data points; in general, the bigger the better. Just make sure, as I said before, that all the images are of an identical transformation.
Here is a very, very high-level code flow that may be helpful in implementing your algorithm:
img_in = capture_image()
cropped_img = isolate(img_in)
scaled_img = normalize_scale(cropped_img)
img_desp = despeckle(scaled_img)
img_final = binarize(img_desp)
#train
match = train_match(training_set)
boolCity = match(img_final)
The processes above have been implemented many times and are thoroughly documented in many languages. Below are some implementations in the languages tagged in your question.
Pure Java
cvBlob in OpenCV (check out this tutorial and this blog post too)
tesseract-ocr in C++
Matlab OCR
Good luck!
If you are asking "I want to detect if the image contains the word "بابل"", this is a classic problem which is solved using a classifier like the one at http://code.opencv.org/projects/opencv/wiki/FaceDetection.
But I assume you still want more. Years ago I tried to solve similar problems, and I provide an example image to show how good/bad it was:
To detect the licence plate I used very basic rectangle detection, which is included in every OpenCV samples folder, and then used a perspective transform to fix the layout and size. It was important to implement multiple checks to see if a rectangle looks good enough to be a licence plate. For example, if a rectangle is 500 px tall and 2 px wide, it is probably not what I want, and it was rejected.
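For reference, the perspective-transform step can be sketched like this with the OpenCV Java bindings, assuming the rectangle detector already gave you the four plate corners in order (top-left, top-right, bottom-right, bottom-left):
import org.opencv.core.Mat;
import org.opencv.core.MatOfPoint2f;
import org.opencv.core.Point;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;

static Mat rectifyPlate(Mat image, Point[] corners, int width, int height) {
    MatOfPoint2f src = new MatOfPoint2f(corners);              // detected plate corners in the photo
    MatOfPoint2f dst = new MatOfPoint2f(
            new Point(0, 0), new Point(width - 1, 0),
            new Point(width - 1, height - 1), new Point(0, height - 1));
    Mat homography = Imgproc.getPerspectiveTransform(src, dst);
    Mat plate = new Mat();
    Imgproc.warpPerspective(image, plate, homography, new Size(width, height));
    return plate;                                              // fronto-parallel plate of fixed size
}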
Use https://code.google.com/p/cvblob/ to extract the Arabic text and other components on the detected plate. I just had a similar need yesterday on another project, where I had to extract Japanese kanji symbols from a page:
CvBlob does a lot of work for you.
As the next step, use the technique explained at http://blog.damiles.com/2008/11/basic-ocr-in-opencv/ to match the city name. Just train the algorithm with example images of the different city names, and soon it will recognize 99% of them out of the box. I have used similar approaches on different projects and am quite sure they work.
Is it possible to analyse an image and determine the position of a car inside it?
If so, how would you approach this problem?
I'm working with a relatively small data-set (50-100) and most images will look similar to the following examples:
I'm mostly interested in only detecting vertical coordinates, not the actual shape of the car. For example, this is the area I want to highlight as my final output:
You could try OpenCV, which has an object detection API. But you would need to "train" it by supplying it with a large set of images that contain "cars".
http://docs.opencv.org/modules/objdetect/doc/objdetect.html
http://robocv.blogspot.co.uk/2012/02/real-time-object-detection-in-opencv.html
http://blog.davidjbarnes.com/2010/04/opencv-haartraining-object-detection.html
Look at the 2nd link above; it shows an example of detecting an object and creating a bounding box around it. You could use that as a basis for what you want to do.
http://www.behance.net/gallery/Vehicle-Detection-Tracking-and-Counting/4057777
Various papers:
http://cbcl.mit.edu/publications/theses/thesis-masters-leung.pdf
http://cseweb.ucsd.edu/classes/wi08/cse190-a/reports/scheung.pdf
Various image databases:
http://cogcomp.cs.illinois.edu/Data/Car/
http://homepages.inf.ed.ac.uk/rbf/CVonline/Imagedbase.htm
http://cbcl.mit.edu/software-datasets/CarData.html
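To make the cascade route above concrete, here is a rough sketch with the OpenCV Java bindings; "cars.xml" stands for whatever trained cascade you end up with (OpenCV does not ship one for cars), and "street.jpg" is a placeholder input:
import org.opencv.core.Mat;
import org.opencv.core.MatOfRect;
import org.opencv.core.Rect;
import org.opencv.core.Scalar;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;
import org.opencv.objdetect.CascadeClassifier;

CascadeClassifier carCascade = new CascadeClassifier("cars.xml");   // placeholder: your trained cascade
Mat image = Imgcodecs.imread("street.jpg");                         // placeholder input image
MatOfRect cars = new MatOfRect();
carCascade.detectMultiScale(image, cars);
for (Rect r : cars.toArray()) {
    Imgproc.rectangle(image, r.tl(), r.br(), new Scalar(0, 255, 0), 2);  // draw a bounding box per detection
}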
1) Your first and second images have two cars in them.
2) If you only have 50-100 images, I can almost guarantee that classifying them all by hand will be faster than writing or adapting an algorithm to recognize cars and deliver coordinates.
3) If you're determined to do this with computer vision, I'd recommend OpenCV. Tutorial here: http://docs.opencv.org/doc/tutorials/tutorials.html
You can use the OpenCV latent SVM detector to detect the car and plot a bounding box around it:
http://docs.opencv.org/modules/objdetect/doc/latent_svm.html
No need to train a new model using HaarCascade, as there is already a trained model for cars:
https://github.com/Itseez/opencv_extra/tree/master/testdata/cv/latentsvmdetector/models_VOC2007
This is a supervised machine learning problem. You will need to use an API that features learning algorithms, as colinsmith suggested, or do some research and write one of your own. Python is pretty good for machine learning (it's what I use, personally) and has some nice tools like scikit: http://scikit-learn.org/stable/
I'd suggest you look into Haar classifiers. Since you mentioned you have a set of 50-100 images, you can use this set to build up a training dataset for the classifier and then use it to classify your images.
You can also look into SURF and SIFT algorithms for the specified problem.
I'm currently working on an image recognition software for the robotics club at my school, and one part really has me stumped: shape recognition. I need to be able to detect the squares in this image before I can try to detect the shapes in the arena.
I've looked up some libraries like JavaCV, but I couldn't really find something that suited my taste. As a reference, here is the image from which I'm trying to determine shapes
Have you tried applying the Hough transform?
That seems to be what you need, as your squares have straight edges.
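A small sketch of that approach with the OpenCV Java bindings (JavaCV is analogous): edge-detect first, then look for straight line segments; the Canny and Hough parameters below are rough starting values to tune, and "arena.png" is a placeholder file name:
import org.opencv.core.Mat;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

Mat gray = Imgcodecs.imread("arena.png", Imgcodecs.IMREAD_GRAYSCALE);   // placeholder file name
Mat edges = new Mat();
Imgproc.Canny(gray, edges, 50, 150);                                    // edge map
Mat lines = new Mat();                                                  // each row: x1, y1, x2, y2 of a segment
Imgproc.HoughLinesP(edges, lines, 1, Math.PI / 180, 50, 30, 10);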
I was doing something similar to your task, but I needed to recognize classes of objects (resistors, capacitors, etc.) and their boundaries in a real black & white photo:
Basically, the method was something like this:
Preprocessing - correct contrast, brightness, erosion, dilation, median, etc. - this step can be adaptive to whole/part of the photo.
Segmentation - now find parts of the photo where there could be "something" with some threshold for area, pixel intensity, etc.
Characterize - for every potential segment found, calculate some characteristics - max length, area, W, M - determinants, etc.
Classify - there are several classifiers that check whether the given characteristics can belong to a given class and, if the answer is yes, what the "distance" of the given characteristics to the ideal model characteristics is. Classification was done using fuzzy logic inference.
And of course - for every successful classification take the best matches if they exist.
In your case, the simplest characterization of a square is its area and the maximum distance between two points that belong to the found segment. Before that, you should preprocess the image with a "closing" operation (dilation followed by erosion).
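Since you are already looking at JavaCV/OpenCV, that closing step is essentially one call; binaryImage below stands for your thresholded input and is not defined here:
import org.opencv.core.Mat;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;

Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(3, 3));
Mat closed = new Mat();
Imgproc.morphologyEx(binaryImage, closed, Imgproc.MORPH_CLOSE, kernel);  // closing = dilation followed by erosion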
You could also create a nice algorithm to recognize whether a square is cut by a line (remove that line, then recognize again) and check whether a square is overlapped by another square, etc.
Personally, I don't know of any library that does something as complex as library.recognizeSquaresOnImage(params). You are provided with some useful methods to prepare an image for recognition; the core of your task you must do yourself.
Every recognition problem has its own peculiar features that can be used to narrow down uncertain results at every step in the "recognition pipeline". For example, in my task, I knew that the objects are black on a fairly white background and are more or less separated from each other, etc.
My project was written in C++ using the OpenCV library, but I was using OpenCV only for reading/writing images and displaying them in a window; I wasn't allowed to use any other methods of the library.
As a reference for how you could do it, HERE is the whole project. Even now it doesn't work perfectly; it needs some calibration of the classifiers.
To have a better grasp how it works on a higher level - take a look at main.cpp file.
I've been doing a lot of searching and I could not find any solution.
I have an image (assume it's 400x400) and a small piece of it (133x133). I want to locate the starting point (x, y) of the small image piece in the large image.
In other words, I want to be able to know where the small image is located inside the big image.
Any suggestions on how to implement this in Java without using external libraries?
The simplest way is to iterate through all possible starting points and calculate the (sum of squared) differences between the template and the image.
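A minimal pure-Java sketch of that brute-force search, using BufferedImage and comparing one channel as a cheap grayscale proxy:
import java.awt.image.BufferedImage;

static int[] findTemplate(BufferedImage big, BufferedImage small) {
    int bestX = 0, bestY = 0;
    long bestCost = Long.MAX_VALUE;
    for (int y = 0; y <= big.getHeight() - small.getHeight(); y++) {
        for (int x = 0; x <= big.getWidth() - small.getWidth(); x++) {
            long cost = 0;
            for (int ty = 0; ty < small.getHeight() && cost < bestCost; ty++) {
                for (int tx = 0; tx < small.getWidth(); tx++) {
                    int a = big.getRGB(x + tx, y + ty) & 0xFF;   // blue channel as a cheap grayscale proxy
                    int b = small.getRGB(tx, ty) & 0xFF;
                    int d = a - b;
                    cost += (long) d * d;                        // sum of squared differences
                }
            }
            if (cost < bestCost) { bestCost = cost; bestX = x; bestY = y; }
        }
    }
    return new int[] { bestX, bestY };                           // top-left corner of the best match
}
If the small piece is an exact, unmodified crop of the large image, the best cost will be exactly zero, so you can stop as soon as you hit it.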
This is horribly inefficient, though. You should look into filtering in the frequency domain and implement your own Fast Fourier Transform (even with a home made FFT algorithm processing should be far faster than calculating differences at each point).