First question here so sorry if anything I ask is completely stupid.
I'm working in a shape recognition project where it is supposed for me to develop an application that receives two images: an original one and a sketch made by a user. I am supposed to detect contours of the two images and find the best match in the original image corresponding to the sketch made by the user.
I am already learning some basics about the Canny edge detection and was able to get the contours of several images. After having the contours, I need to analyze all contours in the image and find the best match, disregarding translation, rotation, scaling and occlusion.
Then, I found this code that does exactly what I want:
http://www.morethantechnical.com/2012/12/27/2d-curve-matching-in-opencv-w-code/ but is in C++.
Do you know any alternative for similar code in Java or any algorithm that could be useful to me? I also discovered BoofCV but it seems that such task is not implemented.
Thank you for your patience.
EDIT:
I've been searching for other ways of doing this, and I found the Hausdorff distance:
http://cgm.cs.mcgill.ca/~godfried/teaching/cg-projects/98/normand/main.html
Is it possible to modify this algorithm to be rotation invariant? They only talk about translation and scaling.
As you mentioned you already have the source code in C++ and can't find a Java version - possibly your best bet might be to convert the C++ code into Java code. If you don't need all parts of the C++ program, you might want to covert only the parts (classes) that you need.
Conversion from C++ to Java might not be always trivial but I am guessing it might be easier if you know exactly what you want and how you want your program to behave. Below is a link to some conversion tools - although they might not be free.
http://www.researchgate.net/post/How_to_convert_the_C_C_code_to_java2
Related
What is the best method to track/recognize an object using a Kinect and Java or C programming after having a constant track on the object in 3D space I wanted to have the coordinates.
I know the exact object I wanna tack and wanted to the most convenient way to track the object.
I've currently programming with processing using Java, I'm a newbie to this any help would be appreciated.
Stack Overflow isn't really designed for general "how do I do this" type questions. It's designed for more specific "I tried X, expected Y, but got Z instead" type questions. That being said, I'll try to help in a general sense:
Break your problem down into smaller pieces.
Step 1: Can you get Kinect data feeding into your code? Don't worry about doing anything with the data, just display it on the screen for now. Googling something like "Processing Kinect" returns a ton of results, or you could check out the Processing libraries page.
Step 2: After you get that working, then can you identify your target point? Then can you track that point? Again, google is your friend. You might also consider treating this as a separate problem and using something like OpenCV to do image processing on the Kinect feed.
Open Kinect by Daniel Shiffman is a pretty good starting point, and it contains a bunch of examples that get you closer to your goal.
That should be a reasonable starting point: break your problem down into smaller steps, then use google searches to approach those steps one at a time. If you get stuck on a specific step, come back and ask a specific question (don't forget the MCVE) and we'll go from there. Good luck.
I want to implement object detection in license plate (the city name) . I have an image:
and I want to detect if the image contains the word "بابل":
I have tried using a template matching method using OpenCV and also using MATLAB but the result is poor when tested with other images.
I have also read this page, but I was not able to get a good understanding of what to do from that.
Can anyone help me or give me a step by step way to solve that?
I have a project to recognize the license plate and we can recognize and detect the numbers but I need to detect and recognize the words (it is the same words with more cars )
Your question is very broad, but I will do my best to explain optical character recognition (OCR) in a programmatic context and give you a general project workflow followed by successful OCR algorithms.
The problem you face is easier than most, because instead of having to recognize/differentiate between different characters, you only have to recognize a single image (assuming this is the only city you want to recognize). You are, however, subject to many of the limitations of any image recognition algorithm (quality, lighting, image variation).
Things you need to do:
1) Image isolation
You'll have to isolate your image from a noisy background:
I think that the best isolation technique would be to first isolate the license plate, and then isolate the specific characters you're looking for. Important things to keep in mind during this step:
Does the license plate always appear in the same place on the car?
Are cars always in the same position when the image is taken?
Is the word you are looking for always in the same spot on the license plate?
The difficulty/implementation of the task depends greatly on the answers to these three questions.
2) Image capture/preprocessing
This is a very important step for your particular implementation. Although possible, it is highly unlikely that your image will look like this:
as your camera would have to be directly in front of the license plate. More likely, your image may look like one of these:
depending on the perspective where the image is taken from. Ideally, all of your images will be taken from the same vantage point and you'll simply be able to apply a single transform so that they all look similar (or not apply one at all). If you have photos taken from different vantage points, you need to manipulate them or else you will be comparing two different images. Also, especially if you are taking images from only one vantage point and decide not to do a transform, make sure that the text your algorithm is looking for is transformed to be from the same point-of-view. If you don't, you'll have an not-so-great success rate that's difficult to debug/figure out.
3) Image optimization
You'll probably want to (a) convert your images to black-and-white and (b) reduce the noise of your images. These two processes are called binarization and despeckling, respectively. There are many implementations of these algorithms available in many different languages, most accessible by a Google search. You can batch process your images using any language /free tool if you want, or find an implementation that works with whatever language you decide to work in.
4) Pattern recognition
If you only want to search for the name of this one city (only one word ever), you'll most likely want to implement a matrix matching strategy. Many people also refer to matrix matching as pattern recognition so you may have heard it in this context before. Here is an excellent paper detailing an algorithmic implementation that should help you immensely should you choose to use matrix matching. The other algorithm available is feature extraction, which attempts to identify words based on patterns within letters (i.e. loops, curves, lines). You might use this if the font style of the word on the license plate ever changes, but if the same font will always be used, I think matrix matching will have the best results.
5) Algorithm training
Depending on the approach you take (if you use a learning algorithm), you may need to train your algorithm with data that is tagged. What this means is that you have a series of images that you've identified as True (contains city name) or False (does not). Here's a psuedocode example of how this works:
train = [(img1, True), (img2, True), (img3, False), (img4, False)]
img_recognizer = algorithm(train)
Then, you apply your trained algorithm to identify untagged images.
test_untagged = [img5, img6, img7]
for image in test_untagged:
img_recognizer(image)
Your training sets should be much larger than four data points; in general, the bigger the better. Just make sure, as I said before, that all the images are of an identical transformation.
Here is a very, very high-level code flow that may be helpful in implementing your algorithm:
img_in = capture_image()
cropped_img = isolate(img_in)
scaled_img = normalize_scale(cropped_img)
img_desp = despeckle(scaled_img)
img_final = binarize(img_desp)
#train
match() = train_match(training_set)
boolCity = match(img_final)
The processes above have been implemented many times and are thoroughly documented in many languages. Below are some implementations in the languages tagged in your question.
Pure Java
cvBlob in OpenCV (check out this tutorial and this blog post too)
tesseract-ocr in C++
Matlab OCR
Good luck!
If you ask "I want to detect if the image contains the word "بابل" - this is classic problem which is solved using http://code.opencv.org/projects/opencv/wiki/FaceDetection like classifier.
But I assume you still want more. Years ago I tried to solve simiar problems and I provide example image to show how good/bad it was:
To detected licence plate I used very basic rectangle detection which is included in every OpenCV samples folder. And then used perspective transform to fix layout and size. It was important to implement multiple checks to see if rectangle looks good enough to be licence plate. For example if rectangle is 500px tall and 2px wide, then probably this is not what I want and was rejected.
Use https://code.google.com/p/cvblob/ to extract arabic text and other components on detected plate. I just had similar need yesterday on other project. I had to extract Japanese kanji symbols from page:
CvBlob does a lot of work for you.
Next step use technique explained http://blog.damiles.com/2008/11/basic-ocr-in-opencv/ to match city name. Just teach algorithm with example images of different city names and soon it will tell 99% of them just out of box. I have used similar approaches on different projects and quite sure they work
I thought of writing a piece of software which does Alpha Compositing. I didn't wanted ready made code off from internet so I tried to find research papers and other sources to understand the mathematical algorithms, and initiated to implement.
But, I got lost very quickly. So my question is,
How should I approach these papers to extract the necessary details from it in order to write algorithm based on it. Any specific set of steps which works well?
Desired answer :
Read ...
Extract ...
Understand ...
Implement ...
Note: This question is not limited to only Alpha Compositing, so more generalised approach will be helpful. I have tagged Java and C++, because thats my desired language to implement the image processing.
What I have done so far?
This is not a homework question but it is of course better to say what I know. I have read wiki of Alpha compositing, and few closely related Image compositing research papers. But, I stuck at the next step to take in order to go from understanding to implementation.
Wikipedia
Technical Memo, Image compositing
I'd recommend reading articles with complex formulas with a pencil and paper. Work through the math involved until you have a good grasp on it. Then, you'll be ready to code.
Start with identifying the steps needed to perform your algorithm on some image data. Include all of the steps from loading the image itself into memory all the way through the complex calculations that you may need to perform. Then structure that list into pseudocode. Once you have that, it should be rather easy to code up.
Write pseudocode. Ideally, the authors of the research papers would have done this, but often they don't. Write pseudocode for some simple language like Matlab or possibly Python, and hack away at writing a working implementation based on the psuedocode.
If you understand some parts of the algorithm but not others, then implement your pseudocode into real code for the parts you understand, and leaving comments for the places you don't.
The section from The Pragmatic Programmer on "Tracer Bullets" basically describes this idea. You want to quickly hack together something that takes your data into some form of an output, and then iterate on the body of the code to get it to slowly resemble the algorithm you're trying to produce.
My answer is necessarily somewhat vague. There's no magic bullet for something like this.
Have you implemented any image processing algorithms? Maybe start with something a little simpler, like desaturation/color intensification, reversal (side to side and upside down), rotating, scaling, and compositing images through a mask.
Once you have those figured out, you will be in a very good position to do an alpha composite.
I agree that academic papers seem to go out of their way to make implementation details muddy and uncertain. I find that large amounts of simplification to what is written is needed to begin to perform a practical implementation. In their haste to be general, writers excessively parameterize every aspect. To build useful, reliable software, it is necessary to start with something simple which actually works so that it can be a framework to add features. To do that, it is necessary to throw away 80–90 percent of the academic generality. Often much can be done with a raft of symbolic constants, but abandoning generality (say for four and five dimensional images) doesn't really lose anything in practice.
My suggestion is to first write the algorithm using Matlab to make sure that you understood all the steps and then try to implement using C++ or java.
To add to the good suggestions above, try to write your pseudocode in simple module (Object oriented style ) so has to have a deep understanding of each part of your code while not loosing the big picture. Writing everything in a procedural way is good a the beginning but as the code grow, it might get become hard to keep up will all you are trying to do.
This example cites one of the seminal works on the topic: Compositing Digital Images by Porter & Duff. The class java.awt.AlphaComposite implements the same rules.
I have as an input a 2D polygon with holes, and I need to find it's straight skeleton, like in the picture:
(source: cgal.org)
Maybe there is a good Java library for it?
And if not, can you point me to the good explanation of the algorithm, so I could implement it myself? (I haven't found good resources on Google)
I wrote this a little while back. Not sure if it's robust enough.
https://github.com/twak/campskeleton
(edited for 2018...)
See http://www.sable.mcgill.ca/~dbelan2/roofs/roofs.html which contains an applet.
You may be able to use the JTS Topology Suite. It is a very capable library that I've used on a number of projects - never for straight skeleton, but it may be possible.
Edit:
Ah. I see that "Straight Skeleton" is a technical term. The wikipedia article references several algorithms. Have you looked at those?
As I understand it, you have a (convex?) polygon. From it, you subtract 1 or more (potentially non-convex) polygons. You want to turn the result into a set of polygons without holes. Are there extra rules that you're trying to apply?
I have a hard time coming up with a set of rules from the example that you provided. The outer polygons are non-convex; so it doesn't seem like you're trying to find a convex set to represent the result (which is a relatively common task).
If you could use the breakdown shown below, the algorithm is pretty simple. Can you clarify?
Can I ask u what is your purpose for finding Straight skeleton? Is it personal or commercial? I would be interested in knowing how you r using it to solve real time problems? I do have a java library that does that. My algorithm is listed here http://web.stcloudstate.edu/rsarnath/skeleton/definition.htm
I'm looking for several methods to compare two images to see how similar they are. Currently I plan to have percentages as the 'similarity index' end-result. My program outline is something like this:
User selects 2 images to compare.
With a button, the images are compared using several different methods.
At the end, each method will have a percentage next to it indicating how similar the images are based on that method.
I've done a lot of reading lately and some of the stuff I've read seems to be incredibly complex and advanced and not for someone like me with only about a year's worth of Java experience. So far I've read about:
The Fourier Transform - im finding this rather confusing to implement in Java, but apparently the Java Advanced Imaging API has a class for it. Though I'm not sure how to convert the output to an actual result
SIFT algorithm - seems incredibly complex
Histograms - probably the easiest out of all mentioned so far
Pixel grabbing - seems viable but if theres a considerable amount of variation between the two images it doesn't look like it's going to produce any sort of accurate result. I might be wrong?
I also have the idea of pre-processing an image using a Sobel filter first, then comparing it. Problem is the actual comparing part.
So yeah I'm looking to see if anyone has ideas for comparing images in Java. Hoping that there are people here that have done similar projects before. I just want to get some input on viable comparison techniques that arent too hard to implement in Java.
Thanks in advance
Fourier Transform - This can be used to efficiently can compute the cross-correlation, which will tell you how to align the two images and how similar they are, when they are optimally aligned.
Sift descriptors - These can be used to compare local features. They are often used for correspondence analysis and object recognition. (See also SURF)
Histograms - The normalized cross-correlation often yields good results for comparing images on a global level. But since you are just comparing color distributions you could end up declaring an outdoor scene with lots of snow as similar to an indoor scene with lots of white wallpaper...
Pixel grabbing - No idea what this is...
You can get a good overview from this paper. Another field you might to look into is content based image retrieval (CBIR).
Sorry for not being Java specific. HTH.
As a better alternative to simple pixel grabbing, try SSIM. It does require that your images are essentially of the same object from the same angle, however. It's useful if you're comparing images that have been compressed with different algorithms, for example (e.g. JPEG vs JPEG2000). Also, it's a fairly simple approach that you should be able to implement reasonably quickly to see some results.
I don't know of a Java implementation, but there's a C++ implementation using OpenCV. You could try to re-use that (through something like javacv) or just write it from scratch. The algorithm itself isn't that complicated anyway, so you should be able to implement it directly.