This is my first question here, so sorry if anything I ask is completely stupid.
I'm working on a shape recognition project in which I am supposed to develop an application that receives two images: an original one and a sketch made by a user. I am supposed to detect the contours of the two images and find the best match in the original image for the sketch made by the user.
I am already learning some basics about the Canny edge detection and was able to get the contours of several images. After having the contours, I need to analyze all contours in the image and find the best match, disregarding translation, rotation, scaling and occlusion.
Then I found this code, which does exactly what I want:
http://www.morethantechnical.com/2012/12/27/2d-curve-matching-in-opencv-w-code/ but it is in C++.
Do you know any alternative for similar code in Java or any algorithm that could be useful to me? I also discovered BoofCV but it seems that such task is not implemented.
Thank you for your patience.
EDIT:
I've been searching for other ways of doing this, and I found the Hausdorff distance:
http://cgm.cs.mcgill.ca/~godfried/teaching/cg-projects/98/normand/main.html
Is it possible to modify this algorithm to be rotation invariant? They only talk about translation and scaling.
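One possible way to approach the rotation question is a brute-force search over angles. The sketch below is only an illustration under stated assumptions: both contours are given as lists of (x, y) points, translation is removed by centring each contour on its centroid, and the function names and the `steps` parameter are hypothetical. It does not handle scaling or occlusion.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from any point of a to its nearest point of b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def centre(points):
    """Shift the points so that their centroid is at the origin."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x - cx, y - cy) for x, y in points]

def rotate(points, angle):
    """Rotate the points about the origin by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def rotation_invariant_hausdorff(a, b, steps=360):
    """Smallest Hausdorff distance over `steps` evenly spaced rotations."""
    a0, b0 = centre(a), centre(b)
    return min(
        hausdorff(rotate(a0, 2 * math.pi * k / steps), b0)
        for k in range(steps)
    )

# An L-shaped contour and a copy rotated by 90 degrees and shifted by (5, 5).
shape = [(0, 0), (1, 0), (2, 0), (0, 1)]
moved = [(5, 5), (5, 6), (5, 7), (4, 5)]
print(hausdorff(shape, moved))
print(rotation_invariant_hausdorff(shape, moved))
```

The search costs `steps` full Hausdorff evaluations, so it is only practical for small contours or a coarse angle grid.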
As you mentioned, you already have the source code in C++ and can't find a Java version, so your best bet might be to convert the C++ code into Java. If you don't need all parts of the C++ program, you might want to convert only the parts (classes) that you need.
Conversion from C++ to Java is not always trivial, but I am guessing it will be easier if you know exactly what you want and how you want your program to behave. Below is a link to some conversion tools, although they might not be free.
http://www.researchgate.net/post/How_to_convert_the_C_C_code_to_java2
I want to implement object detection on a license plate (to detect the city name). I have an image:
and I want to detect if the image contains the word "بابل":
I have tried a template matching method with OpenCV and also with MATLAB, but the result is poor when tested with other images.
I have also read this page, but I was not able to get a good understanding of what to do from that.
Can anyone help me or give me a step-by-step way to solve this?
I have a project to recognize license plates. We can already detect and recognize the numbers, but I need to detect and recognize the words (the same words appear on many cars).
Your question is very broad, but I will do my best to explain optical character recognition (OCR) in a programmatic context and give you a general project workflow followed by successful OCR algorithms.
The problem you face is easier than most, because instead of having to recognize/differentiate between different characters, you only have to recognize a single image (assuming this is the only city you want to recognize). You are, however, subject to many of the limitations of any image recognition algorithm (quality, lighting, image variation).
Things you need to do:
1) Image isolation
You'll have to isolate your image from a noisy background:
I think that the best isolation technique would be to first isolate the license plate, and then isolate the specific characters you're looking for. Important things to keep in mind during this step:
Does the license plate always appear in the same place on the car?
Are cars always in the same position when the image is taken?
Is the word you are looking for always in the same spot on the license plate?
The difficulty/implementation of the task depends greatly on the answers to these three questions.
2) Image capture/preprocessing
This is a very important step for your particular implementation. Although possible, it is highly unlikely that your image will look like this:
as your camera would have to be directly in front of the license plate. More likely, your image may look like one of these:
depending on the perspective from which the image is taken. Ideally, all of your images will be taken from the same vantage point and you'll simply be able to apply a single transform so that they all look similar (or not apply one at all). If you have photos taken from different vantage points, you need to manipulate them or else you will be comparing two different images. Also, especially if you are taking images from only one vantage point and decide not to do a transform, make sure that the text your algorithm is looking for is transformed to be from the same point of view. If you don't, you'll have a not-so-great success rate that's difficult to debug.
3) Image optimization
You'll probably want to (a) convert your images to black-and-white and (b) reduce the noise in your images. These two processes are called binarization and despeckling, respectively. There are many implementations of these algorithms available in many different languages, most of them easy to find with a Google search. You can batch process your images using any language or free tool if you want, or find an implementation that works with whatever language you decide to work in.
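To make the two terms concrete, here is a minimal sketch that assumes a greyscale image stored as a list of rows of 0-255 integers. The fixed threshold of 128 and the 3x3 median filter are illustrative choices, not the only way to binarize or despeckle, and the function names are hypothetical.

```python
from statistics import median

def despeckle(gray):
    """Replace each interior pixel with the median of its 3x3 neighbourhood."""
    height, width = len(gray), len(gray[0])
    out = [row[:] for row in gray]  # border pixels are left unchanged
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [gray[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out

def binarize(gray, threshold=128):
    """Map every pixel to 1 (light) or 0 (dark) using a fixed threshold."""
    return [[1 if px >= threshold else 0 for px in row] for row in gray]

# A light 5x5 patch with a single dark speck in the middle.
patch = [[200] * 5 for _ in range(5)]
patch[2][2] = 0
clean = binarize(despeckle(patch))
```

In practice the threshold is usually chosen per image (for example from its histogram) rather than hard-coded.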
4) Pattern recognition
If you only want to search for the name of this one city (only one word ever), you'll most likely want to implement a matrix matching strategy. Many people also refer to matrix matching as pattern recognition so you may have heard it in this context before. Here is an excellent paper detailing an algorithmic implementation that should help you immensely should you choose to use matrix matching. The other algorithm available is feature extraction, which attempts to identify words based on patterns within letters (i.e. loops, curves, lines). You might use this if the font style of the word on the license plate ever changes, but if the same font will always be used, I think matrix matching will have the best results.
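As a rough illustration of matrix matching, the sketch below slides a binary template over a binary image and scores each position by the fraction of pixels that agree. It is a simplified stand-in and not the algorithm from the paper mentioned above; the names `match_score` and `best_match` are hypothetical.

```python
def match_score(window, template):
    """Fraction of pixels on which the window and the template agree."""
    total = len(template) * len(template[0])
    same = sum(w == t
               for w_row, t_row in zip(window, template)
               for w, t in zip(w_row, t_row))
    return same / total

def best_match(image, template):
    """Return (score, (row, col)) of the best-scoring template position."""
    t_height, t_width = len(template), len(template[0])
    best = (-1.0, (0, 0))
    for y in range(len(image) - t_height + 1):
        for x in range(len(image[0]) - t_width + 1):
            window = [row[x:x + t_width] for row in image[y:y + t_height]]
            score = match_score(window, template)
            if score > best[0]:
                best = (score, (y, x))
    return best

template = [[1, 0],
            [0, 1]]
image = [[0, 0, 0, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 1, 0],
         [0, 0, 0, 0, 0]]
score, position = best_match(image, template)
```

A decision rule such as "the word is present if the best score exceeds some cut-off" would sit on top of this, with the cut-off tuned on your own images.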
5) Algorithm training
Depending on the approach you take (if you use a learning algorithm), you may need to train your algorithm with data that is tagged. What this means is that you have a series of images that you've identified as True (contains city name) or False (does not). Here's a pseudocode example of how this works:
train = [(img1, True), (img2, True), (img3, False), (img4, False)]
img_recognizer = algorithm(train)
Then, you apply your trained algorithm to identify untagged images.
test_untagged = [img5, img6, img7]
for image in test_untagged:
    img_recognizer(image)
Your training sets should be much larger than four data points; in general, the bigger the better. Just make sure, as I said before, that all the images are of an identical transformation.
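To show what "training on tagged data" can look like in running code, here is a minimal sketch that swaps in a nearest-neighbour rule for the unspecified `algorithm` above: an untagged image gets the tag of the most similar training image. The tiny 2x2 binary "images" and the names `hamming` and `train` are made up for illustration.

```python
def hamming(a, b):
    """Number of pixels on which two equally sized binary images differ."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))

def train(tagged):
    """Return a recognizer that copies the tag of the closest training image."""
    examples = list(tagged)

    def recognizer(img):
        _, tag = min(examples, key=lambda example: hamming(example[0], img))
        return tag

    return recognizer

# Tagged training data: (image, contains_city_name)
training_set = [
    ([[1, 1], [1, 0]], True),
    ([[1, 1], [0, 0]], True),
    ([[0, 0], [0, 1]], False),
    ([[0, 0], [0, 0]], False),
]
img_recognizer = train(training_set)
results = [img_recognizer(img) for img in ([[1, 1], [1, 1]], [[0, 0], [1, 0]])]
```

Real training sets would of course hold full-size, consistently transformed plate crops rather than four toy arrays.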
Here is a very, very high-level code flow that may be helpful in implementing your algorithm:
img_in = capture_image()
cropped_img = isolate(img_in)
scaled_img = normalize_scale(cropped_img)
img_desp = despeckle(scaled_img)
img_final = binarize(img_desp)
# train
match = train_match(training_set)
boolCity = match(img_final)
The processes above have been implemented many times and are thoroughly documented in many languages. Below are some implementations in the languages tagged in your question.
Pure Java
cvBlob in OpenCV (check out this tutorial and this blog post too)
tesseract-ocr in C++
Matlab OCR
Good luck!
If your question is "I want to detect if the image contains the word "بابل"", this is a classic problem that can be solved with a classifier like the one described at http://code.opencv.org/projects/opencv/wiki/FaceDetection.
But I assume you still want more. Years ago I tried to solve similar problems, and I provide an example image to show how good or bad the result was:
To detect the licence plate I used the very basic rectangle detection that is included in the OpenCV samples folder, and then used a perspective transform to fix the layout and size. It was important to implement multiple checks to see whether a rectangle looks good enough to be a licence plate. For example, if a rectangle is 500px tall and 2px wide, then it is probably not what I want, so it was rejected.
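A plausibility check of the kind described above could look roughly like this. The aspect-ratio limits and the minimum area are assumptions chosen for illustration and would need tuning to real plate dimensions; `looks_like_plate` is a hypothetical name.

```python
def looks_like_plate(width, height,
                     min_aspect=2.0, max_aspect=6.0, min_area=1000):
    """Reject rectangles whose shape or size is implausible for a plate."""
    if width <= 0 or height <= 0:
        return False
    aspect = width / height  # plates are wider than they are tall
    return min_aspect <= aspect <= max_aspect and width * height >= min_area

print(looks_like_plate(2, 500))   # the 500px-tall, 2px-wide case from above
print(looks_like_plate(200, 50))
```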
Use https://code.google.com/p/cvblob/ to extract the Arabic text and other components on the detected plate. I had a similar need just yesterday on another project, where I had to extract Japanese kanji symbols from a page:
CvBlob does a lot of work for you.
The next step is to use the technique explained at http://blog.damiles.com/2008/11/basic-ocr-in-opencv/ to match the city name. Just teach the algorithm with example images of different city names and soon it will recognize 99% of them out of the box. I have used similar approaches on different projects and am quite sure they work.
I'm looking for several methods to compare two images to see how similar they are. Currently I plan to have percentages as the 'similarity index' end-result. My program outline is something like this:
User selects 2 images to compare.
With a button, the images are compared using several different methods.
At the end, each method will have a percentage next to it indicating how similar the images are based on that method.
I've done a lot of reading lately and some of the stuff I've read seems to be incredibly complex and advanced and not for someone like me with only about a year's worth of Java experience. So far I've read about:
The Fourier Transform - I'm finding this rather confusing to implement in Java, but apparently the Java Advanced Imaging API has a class for it. I'm not sure how to convert the output to an actual result, though.
SIFT algorithm - seems incredibly complex
Histograms - probably the easiest out of all mentioned so far
Pixel grabbing - seems viable, but if there's a considerable amount of variation between the two images it doesn't look like it's going to produce any sort of accurate result. I might be wrong?
I also have the idea of pre-processing an image using a Sobel filter first, then comparing it. Problem is the actual comparing part.
So yeah, I'm looking to see if anyone has ideas for comparing images in Java. I'm hoping that there are people here who have done similar projects before. I just want to get some input on viable comparison techniques that aren't too hard to implement in Java.
Thanks in advance
Fourier Transform - This can be used to efficiently compute the cross-correlation, which will tell you how to align the two images and how similar they are when they are optimally aligned.
Sift descriptors - These can be used to compare local features. They are often used for correspondence analysis and object recognition. (See also SURF)
Histograms - The normalized cross-correlation often yields good results for comparing images on a global level. But since you are just comparing color distributions you could end up declaring an outdoor scene with lots of snow as similar to an indoor scene with lots of white wallpaper...
Pixel grabbing - No idea what this is...
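For the histogram item above, a minimal sketch of the comparison might look like this. It assumes greyscale pixels given as a flat list of 0-255 integers and uses 16 bins; both choices, and the function names, are illustrative. The score is the normalized cross-correlation of the two histograms, which lies between -1 and 1.

```python
import math

def histogram(pixels, bins=16, max_value=256):
    """Normalized grey-level histogram: each entry is a fraction of pixels."""
    counts = [0] * bins
    for px in pixels:
        counts[px * bins // max_value] += 1
    return [c / len(pixels) for c in counts]

def histogram_correlation(h1, h2):
    """Normalized cross-correlation of two histograms of equal length."""
    m1 = sum(h1) / len(h1)
    m2 = sum(h2) / len(h2)
    num = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    den = math.sqrt(sum((a - m1) ** 2 for a in h1) *
                    sum((b - m2) ** 2 for b in h2))
    # A completely flat histogram has no variation; treat that case as 0.
    return num / den if den else 0.0

dark = [10, 20, 30, 40] * 25
bright = [220, 230, 240, 250] * 25
same = histogram_correlation(histogram(dark), histogram(dark))
different = histogram_correlation(histogram(dark), histogram(bright))
```

Because only the distribution of values is compared, two images with the same pixels in a different arrangement get a perfect score, which is exactly the snow-versus-wallpaper weakness described above.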
You can get a good overview from this paper. Another field you might want to look into is content-based image retrieval (CBIR).
Sorry for not being Java specific. HTH.
As a better alternative to simple pixel grabbing, try SSIM. It does require that your images are essentially of the same object from the same angle, however. It's useful if you're comparing images that have been compressed with different algorithms, for example (e.g. JPEG vs JPEG2000). Also, it's a fairly simple approach that you should be able to implement reasonably quickly to see some results.
I don't know of a Java implementation, but there's a C++ implementation using OpenCV. You could try to re-use that (through something like javacv) or just write it from scratch. The algorithm itself isn't that complicated anyway, so you should be able to implement it directly.
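To give a feel for how small the core of SSIM is, here is a simplified sketch that computes a single global SSIM value over two whole greyscale images given as flat lists of 0-255 integers. The usual form of SSIM averages this formula over many small local windows; that windowing is left out here, so treat the function as an illustration only. The constants 0.01 and 0.03 are the commonly used defaults.

```python
def ssim(x, y, dynamic_range=255):
    """Global SSIM of two equally sized greyscale images (flat pixel lists)."""
    n = len(x)
    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast term
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return (((2 * mean_x * mean_y + c1) * (2 * cov + c2)) /
            ((mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)))

original = [10, 50, 90, 130, 170, 210, 250]
inverted = [255 - px for px in original]
print(ssim(original, original))
print(ssim(original, inverted))
```

The result is at most 1 and can be negative for strongly dissimilar images, so it needs clamping or rescaling before it is shown as a percentage.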
I am an undergraduate student. I was exposed to basic programming a couple of years back in school. So far I have an understanding of core Java, core Python, and basic C and C++.
Every time I start on some GUI programming so that I can begin a project of my own, I get boggled by the sheer amount there is to do: APIs to learn, MVC architecture, event handling, and everything else programmers talk about.
I studied AWT and Swing for a while. I tried my hand at Qt and GTK, but could not find much documentation. I tried to make sense of pygame. I end up at the same place, knowing only the core language.
Tkinter on my Zenwalk Linux is broken, so I could never start with it, although I own a book on Python with Tkinter explained.
But I end up at the same place, with just the basic understanding of the language.
Want to start over, seriously now. I would like to choose python. How should I go about studying GUI programming?
I need some Internet resources and direction so that I don't end up at the same place!
Since it sounds like you want Python GUI programming, may I suggest PyGTK?
That's probably a pretty good place to start for someone who knows Python and would like to start small on some basic GUI apps. GTK can be complex at times, but with PyGTK there's plenty of open-source example apps you can study, from simple to complex.
Edit: This tutorial from LinuxJournal seems pretty helpful.
Edit 2: Here's the tutorial from PyGTK's site, and another tutorial I randomly found from Google (seems like that whole blog is pretty useful for what you want to do, actually). Finally, the snippet at the bottom of this page might be helpful, courtesy of Ubuntu's forums.
If you are leaning more to games...
I suggest you install Pygame and Python, and go through their tutorials. Then pick a simple game or graphics project and program it!
For Python GUIs I like wxPython (www.wxpython.org). It is pretty easy to get started with simple controls and layouts. It is also cross platform. Plenty of tutorials out there. Just search for wxPython tutorial.
I know how you feel--I learned a whole lot of computer programming during my CS degree but very little about GUIs. I ended up teaching myself Cocoa/Objective-C for a project. Cocoa is wonderful for GUI stuff but often a royal pain with a steep learning curve. If you don't have any experience with C programming, don't bother.
First step: familiarize yourself with the MVC (Model/View/Controller) design convention, because nearly every GUI framework will reference it. Google it--there are lots of resources about it. My quick, simple definition is:
The model level defines the data or the logical model for the application. For a web app, that would be the database. For a game, it could be stored data and game logic/rules.
The view level is what the user sees and interacts with (the GUI).
The controller level is the logic that connects the two. For example, the controller knows that when you click the "start game" button in the view level, it does some stuff with the model (say, setting up the board and the players.)
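The three levels above can be sketched without any GUI toolkit at all. In this minimal illustration the class and method names are hypothetical, and the "view" just builds a string where a real one would draw widgets.

```python
class GameModel:
    """Model: the data and rules, with no knowledge of the screen."""

    def __init__(self):
        self.players = []
        self.started = False

    def start(self, player_names):
        self.players = list(player_names)
        self.started = True


class GameView:
    """View: turns the model's state into something the user sees."""

    def render(self, model):
        if not model.started:
            return "Press 'start game'"
        return "Playing: " + ", ".join(model.players)


class GameController:
    """Controller: reacts to user actions and updates the model."""

    def __init__(self, model, view):
        self.model = model
        self.view = view

    def on_start_clicked(self, player_names):
        # The view reported a click; set up the game, then redraw.
        self.model.start(player_names)
        return self.view.render(self.model)


model = GameModel()
view = GameView()
controller = GameController(model, view)
before = view.render(model)
after = controller.on_start_clicked(["Ann", "Bob"])
```

The point of the split is that the model can be tested, saved, or reused with a different front end without touching any GUI code.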
Step two: Figure out what you want. Are you interested in desktop applications specifically? Games? Web apps?
If mostly what you want to do is to be able to develop something that people would actually use, another option is to learn a web development framework. The frameworks make stuff easy for you. I love Django, personally, and if you know a little Python and a little HTML and a little about MVC, you can pick it up quickly. (Just don't be confused, because what Django calls a view is actually a controller.)
If what you want to do is games or graphics/animation stuff, check out pygame. I used it for a class project--basically taught it to myself in a couple of weeks--and it worked great.
I'd say stay as far away as you can from Java Swing/awt/etc.
I've heard good things about wxPython--I almost ended up using it instead of Cocoa, because the wx stuff is available in several programming languages and it's all cross platform.
Good luck! Stay strong! I know it's really intimidating, because I've been in your shoes. You can do it with some work, practice, and motivation.
Many have recommended wxPython, and I second their enthusiasm - it is a great framework; it also includes a serious demo (with code and live applications) which will be extremely valuable for learning.
Now, BEWARE!
It is very simple to confuse the end with the means. Programming GUIs can be extremely attractive but not very productive. In my early days I spent days and days trying to get a simple plotting application (reinventing the wheel); a simple GUI for solving quadratic equations; a simple GUI for calling database queries by clicking on certain locations on a map, etc. During all this time I never actually dug into algorithms or more general and productive computer science and computer engineering topics. In retrospect, I should have. Granted, I did learn a lot and I don't totally regret it, but my advice stands: worry about your algorithm first and about your interface second. This may not apply to every field (I am an engineer for NASA). Nowadays I work with number crunching applications with no GUIs whatsoever; I don't think they need them!
Anyway, I just wanted to share my two cents with GUI programming - have fun but don't overdo it.
What do you mean by "Graphics"? Do you mean game graphics, or do you simply mean user interface code (forms, webpages, that sort of thing)? In the case of game graphics, there's a limit to how simple things can be made, but http://www.gamedev.net, for example, has tons of introductory articles on 2d and 3d engines. For something more along the application line, you might simply download Visual Studio or Eclipse and spend some time looking at the code that is autogenerated by their WYSIWYG editors.
For GUI work in general:
Less is more
GUI work (even in productive frameworks) is about as fun and productive as painting the Eiffel Tower with a toothbrush. Go for a minimal design.
Avoid State Like The Plague
Do you put state in your GUI, or in the model? If you put it in the GUI, you are going to mess yourself up with redundant and inconsistent code paths. If you put it in the model, you risk an overly complex system that gets out of sync when your GUI fails to update from the model. Both suck.
wxPython
If you want to learn wxPython, here are a few traps I noticed:
The tutorial
Use this tutorial - http://wiki.wxpython.org/AnotherTutorial
It's the best one I found.
But remember to toggle line numbers, for easy pasting.
Events
Events are a bit like exceptions, and they are used to make things interactive.
In a vanilla python program, you write something like:
def doit(i):
    print('Doing i =', i)
    return i * i

results = []
for i in range(10):
    results.append(doit(i))
print('Results =', results)
In a GUI, you do something like:
import wx

def doit(event):
    print('An event', event, 'just happened!')
    event.Skip()

app = wx.App()
frame = wx.Frame(None, -1, 'The title goes here')
frame.Bind(wx.EVT_KEY_DOWN, doit)
frame.Show()
app.MainLoop()
Every time the user presses a key down, an event will be raised. Since frame is bound to the event (frame.Bind(wx.EVT_KEY_DOWN, doit)), the function doit will be called with the event as an argument.
Printing to the console isn't too hot in a GUI, but doit could also call up a dialog, or do anything you want it to.
Also, you can generate your own events using timers.
Apps, Frames, Windows, Panels, and Sizers
Everything has a parent. If an event is raised and the child's handler calls event.Skip(), the event counts as unhandled and wxPython keeps looking for another handler; command events (such as button clicks) are then passed up to the parent. This is analogous to exceptions propagating up to higher-level functions.
A wx.App is like the Main function.
wx.Window isn't really used. Stuff inherits from it, and it has all the methods for sizing and layout, but you don't need to know that.
wx.Frame is a floating frame, like the main window in Firefox. You will have one main frame in a basic application. If you want to edit multiple files then you might have more. A wx.Frame won't usually have parents.
wx.Panel is part of a parent window. You can have several panels inside a frame. A panel can have a wx.Frame as a parent, or it might be the child of another panel.
wx.Sizers are used to automatically lay out panels inside frames (or other panels).
Code:
import wx

def doit1(event):
    print('event 1 happened')

def doit2(event):
    print('event 2 happened')

app = wx.App()
frame = wx.Frame(None, -1, 'The title goes here')

# Two panels side by side, each with its own key handler.
panel_1 = wx.Panel(frame, -1, style=wx.SIMPLE_BORDER)
panel_2 = wx.Panel(frame, -1, style=wx.SIMPLE_BORDER)
panel_1.Bind(wx.EVT_KEY_DOWN, doit1)
panel_2.Bind(wx.EVT_KEY_DOWN, doit2)
panel_1.SetBackgroundColour(wx.BLACK)
panel_2.SetBackgroundColour(wx.RED)

# The sizer gives each panel an equal share of the frame's width.
box = wx.BoxSizer(wx.HORIZONTAL)
box.Add(panel_1, 1, wx.EXPAND)
box.Add(panel_2, 1, wx.EXPAND)
frame.SetSizer(box)
frame.Show()
app.MainLoop()
I've been really bad, and not used OOP practices. Just remember that even if you hate OO in most contexts, GUI programming is the place where OOP really shines.
The MVC
I don't get MVC. I don't think you need MVC. I think an MW (model-widget) framework is fine.
For example - 2 frames that edit the same piece of text:
import wx

class Model(object):
    def __init__(self):
        self.value = 'Enter a value'
        self.listeners = []

    def Add_listener(self, listener):
        self.listeners.append(listener)

    def Set(self, new_value):
        self.value = new_value
        # Push the new value to every registered widget.
        for listener in self.listeners:
            listener.Update(self.value)

app = wx.App()

class CVFrame(wx.Frame):
    def __init__(self, parent, id, title, model):
        wx.Frame.__init__(self, parent, id, title, size=(100, 100))
        self.button = wx.Button(self, -1, 'Set model value')
        self.textctrl = wx.TextCtrl(self, -1, model.value)
        self.button.Bind(wx.EVT_BUTTON, self.OnSet)
        self.model = model
        model.Add_listener(self)
        sizer = wx.BoxSizer(wx.VERTICAL)
        sizer.Add(self.button, 0, wx.EXPAND)
        sizer.Add(self.textctrl, 1, wx.EXPAND)
        self.SetSize((300, 100))
        self.SetSizer(sizer)
        self.Center()
        self.Show()

    def OnSet(self, event):
        # Button clicked: write this frame's text into the shared model.
        self.model.Set(self.textctrl.GetValue())

    def Update(self, value):
        # Called by the model whenever its value changes.
        self.textctrl.SetValue(value)

model = Model()
frame1 = CVFrame(None, -1, 'Frame 1', model)
frame2 = CVFrame(None, -1, 'Frame 2', model)
app.MainLoop()
wxPython has a listener-subscriber framework, which is a better version of the model I just sketched out (it uses weak refs, so deleted listeners don't hang around, and so on), but that should help you get the idea.
If you have already gone through pygame, tk, Qt, and GTK, then really the only thing left that I can think of is pyglet, which I admit I have not tried, but I have read uniformly good things about it.
Still, more than anything it sounds as though you have trouble sticking with a framework long enough to really grok it. May I recommend starting with a small project, such as Pong or Breakout, and only learning as much as you need to make it? Once you have finished one thing, you will have a feel for the library, and continuing past there is a lot easier.
Whatever language you choose, you will have to deal with the many details involved in GUI programming. This is due to the nature of the window-based environment usually used for GUIs.
What can help you move forward quickly in developing a GUI-based application is less the language and more the IDE you use. A good IDE can do some of the less interesting work for you, letting you focus on the big picture.
With C# in VS 2008 it's all about choosing elements and methods from list boxes. It's very easy to get started and have a working project.
You can then try to customize your application to gain a better understanding of what's going on behind the scenes.
One of the best Python GUIs you can study is the source of IDLE. It always comes with Python.
For Java, you could also look into SWT.
While I have never used AWT or Swing, I have read that SWT is the easiest of the three to learn.
Here is a decent comparison between the three.
We're a team of a programmer and a designer, and we want to make a medium-sized Java game which will be played as an applet in the web browser. I (the programmer) have 3 years of general development experience, but I haven't done any game programming before.
We're assuming that:
We'll decide on a plot, storyline of the game, etc.
We'll create a list of assets (images) that we need, i.e. player images, monster images, towns, buildings, trees, objects, etc. (We're not adding any music or sound effects for now.)
The designer will get started on creating those images while I finish reading some of the game programming books I've bought. The designer will create the first town/level of the game and pass those images on to me; I will begin coding that first level while he starts on the next level, and after 4-5 levels we'll release v.1 of the game.
Question 1: Is this the correct methodology to use for this project?
Question 2: What format should the designer create those images in? Should they be .bmp, .jpeg, or .gif files? And would he put all those images in one file, or put each monster/object/building in its own file? Note: we are sticking to 2D for now and not doing 3D.
Question 3: I've seen some game artwork where there would be a file for a monster, and in that file there'd be about 3-4 images of the monster from different directions, all put in one file, I think because they're part of an animation. Here's an illustration:
[Monster looking to right] ... [Monster looking in the front] ... [Monster looking to left]
And all of them are in one file. Is this how he'll have to supply me with those animations?
What I'm trying to find out is what format he'll have to supply the designed images in for me to be able to access and manipulate them easily in the Java code.
All answers appreciated :)
I have some comments for each question.
Question 1: You say that you will code levels 1, 2, ... one by one. I recommend that you create a reusable framework instead, and look at the big picture. From the information you provide, I think you are going to make some kind of RPG. There are lots of things that can be shared between levels, such as the shop or the dialog system. So focus on extensibility.
Why wait for designers to pass on the image? You can begin your coding by starting with pseudo graphics file you created yourself. You can then work with designer in parallel this way. And you can replace your pseudo graphics file with ones provided by designer later.
Question 2: JPG is not suitable for pixel-art style images, which appear a lot in most 2D games, and GIF supports only 256 colors. The best choice to me seems to be PNG.
The designer should always keep the original artworks in editable format as well. It's likely that you want to change the graphics in the future.
Question 3: It depends. The format you mention, where a character's animation frames are kept in a single file, is called a sprite sheet. If you keep your resources in this format, you will have some work to do reading each sub-image by specifying its coordinates. However, sprite sheets help you keep things organized: all the 2D graphics related to the "Zombie" character are kept in one place, so they are easy to maintain.
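The coordinate work is mostly arithmetic. As a minimal sketch, assuming the frames sit side by side in one horizontal strip and all have the same width, the rectangle of each frame can be computed like this (`frame_rects` is a hypothetical helper name):

```python
def frame_rects(sheet_width, sheet_height, frame_count):
    """Return (x, y, width, height) for each frame in a horizontal strip."""
    frame_width = sheet_width // frame_count
    return [(i * frame_width, 0, frame_width, sheet_height)
            for i in range(frame_count)]

# A 96x32 sheet holding three 32x32 frames: right, front, left.
rects = frame_rects(96, 32, 3)
print(rects)
```

In Java, each of these rectangles could then be passed to BufferedImage.getSubimage(x, y, w, h) to cut the frame out of the loaded sheet.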
About the image format: don't let the designer deliver anything as jpg, because you'll lose quality.
Let him send it as png instead, and convert it to your preferred format as needed.
Also, remember to have him send the source files (photoshop/illustrator/3dsmax/whatever) in case you'll ever need tiny changes that you can make yourself without hiring the graphics dude. Who knows if he'll still be available in the future anyway.
I want to suggest to you that, before you make any decisions about your workflows, you and your colleague go have a look at JavaFX and see if maybe that's the toolkit that best meets your needs.
http://java.sun.com/javafx/
The [Monster looking to right] ... [Monster looking in the front] ... [Monster looking to left] style of animation demarcation has been around for as long as I've been peeking into game data, so I would suggest going with that path.
I was about to make the same remark as Wouter: use PNG, a modern format which is highly compressed (as opposed to BMP), lossless (as opposed to JPEG), and full color with several levels of transparency (as opposed to GIF).
Why do people put several sprites in the same image? Actually, for Java, I am not sure, if the images are part of a jar... I know it is useful in CSS, for example, because it reduces the number of images to download, and so the number of hits on the server, which is a well-known Web optimization. For games on a hard disk, reducing the number of small files can be useful too.
The designer can appreciate this too, at least in the days when sprites used a color palette: you had only one image, using the same palette, which was easier to edit and slightly reduced the overall size (in times when memory was costly!).
I can't answer on the methodology, I never did a game in team... If it fits your needs, it is probably the right methodology...
duncan points to JavaFX, I will point to pulpcore which seems to be a promising library. Of course, there are plenty others, like JGame and such.
Bunch of pros here: http://www.javagaming.org/
This does not answer any of the questions, but if you need a reference for learning game development and simulation engines:
http://www.cs.chalmers.se/idc/ituniv/kurser/08/simul/
It's a link to the class lectures for Simulation Engines at Chalmers University in Gothenburg. The teacher has a game company and gave quite good lectures. Check the slides we had in the classes; maybe they'll help you a bit.