This is my first question on Stack Overflow, so first of all: hi everyone :)
I'm a newbie in image processing, but I have to write an app (in Java) to detect changes between images from a camera (or rather, to detect new objects in the images).
A camera is taking a picture every minute, all day, so as an input I have a sequence of color images in JPG.
The important things are:
the camera isn't moving, so the background doesn't change. I'm interested only in detecting objects (e.g. people, animals, cars, ...) between the lens and the background
it should be robust to image noise from the camera and to weather (e.g. rain, snow, shadows moving as the sun moves)
the only output I need is the information that something has changed between two images
I'm interested in as simple a solution as possible, but it has to be a working solution.
It doesn't need to be infallible, but it should work correctly in most normal cases.
Of course, I don't expect anyone to give me a ready-to-use code snippet (although that would be great! ;) ), but if someone who knows the topic gives me some guidelines (steps to take, algorithms or articles to read), I'll be really grateful. I haven't found anything appropriate on Google, and unfortunately I don't have a year to read a few books and do a PhD to find a solution :)
1) You can compute MD5 hashes of parts of the image and compare them to see which parts have changed. Note that a hash only tells you whether a part is exactly identical, not whether it is similar, so camera noise alone will change it. You can refer to this.
2) You can use keypoint matching, which is almost the same method as 1); you can read about this.
3) Read about the histogram method.
4) As a simple solution, just subtract one image from the other and look at the differences. Ignore small changes, try to build areas of movement, and accept only the bigger areas.
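The subtract-and-keep-big-areas suggestion can be sketched in plain Java roughly as follows. This is a minimal sketch, not a tested detector: `pixelThreshold` (how much a pixel's brightness must change to count) and `minArea` (how many connected changed pixels count as an "object") are hypothetical tuning parameters, and "building an area of movement" is done here with a simple 4-connected flood fill.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayDeque;

final class FrameDiff {

    /** Approximate luminance of a packed RGB pixel (BT.601 weights). */
    static int gray(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
    }

    /** Marks pixels whose brightness changed by more than pixelThreshold. */
    static boolean[][] changedMask(BufferedImage a, BufferedImage b, int pixelThreshold) {
        int w = a.getWidth(), h = a.getHeight();
        if (w != b.getWidth() || h != b.getHeight()) {
            throw new IllegalArgumentException("frames must have the same size");
        }
        boolean[][] mask = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int diff = Math.abs(gray(a.getRGB(x, y)) - gray(b.getRGB(x, y)));
                mask[y][x] = diff > pixelThreshold;   // small changes are ignored as noise
            }
        }
        return mask;
    }

    /** Size in pixels of the largest 4-connected region of changed pixels. */
    static int largestRegion(boolean[][] mask) {
        int h = mask.length, w = h == 0 ? 0 : mask[0].length;
        boolean[][] seen = new boolean[h][w];
        ArrayDeque<int[]> queue = new ArrayDeque<>();
        int best = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!mask[y][x] || seen[y][x]) continue;
                int size = 0;                          // flood-fill one region (BFS)
                seen[y][x] = true;
                queue.add(new int[] {x, y});
                while (!queue.isEmpty()) {
                    int[] p = queue.poll();
                    size++;
                    int[][] neighbours = {{p[0] + 1, p[1]}, {p[0] - 1, p[1]}, {p[0], p[1] + 1}, {p[0], p[1] - 1}};
                    for (int[] n : neighbours) {
                        if (n[0] >= 0 && n[1] >= 0 && n[0] < w && n[1] < h
                                && mask[n[1]][n[0]] && !seen[n[1]][n[0]]) {
                            seen[n[1]][n[0]] = true;
                            queue.add(n);
                        }
                    }
                }
                best = Math.max(best, size);
            }
        }
        return best;
    }

    /** True if some connected changed area covers at least minArea pixels. */
    static boolean somethingChanged(BufferedImage a, BufferedImage b, int pixelThreshold, int minArea) {
        return largestRegion(changedMask(a, b, pixelThreshold)) >= minArea;
    }
}
```

The JPGs can be loaded with `javax.imageio.ImageIO.read(file)`. Both thresholds need tuning on real footage, and gradual lighting changes (moving shadows) can still produce large changed regions, so expect some false alarms.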
What is the best method to recognize and track an object using a Kinect with Java or C? Once I have a constant track on the object in 3D space, I want to get its coordinates.
I know the exact object I want to track, and I'm looking for the most convenient way to track it.
I'm currently programming in Processing (Java). I'm a newbie at this, so any help would be appreciated.
Stack Overflow isn't really designed for general "how do I do this" type questions. It's designed for more specific "I tried X, expected Y, but got Z instead" type questions. That being said, I'll try to help in a general sense:
Break your problem down into smaller pieces.
Step 1: Can you get Kinect data feeding into your code? Don't worry about doing anything with the data, just display it on the screen for now. Googling something like "Processing Kinect" returns a ton of results, or you could check out the Processing libraries page.
Step 2: After you get that working, then can you identify your target point? Then can you track that point? Again, google is your friend. You might also consider treating this as a separate problem and using something like OpenCV to do image processing on the Kinect feed.
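For Step 2, one common beginner approach is to keep only the depth pixels that fall inside a depth band where your object is expected, and average their positions. The sketch below assumes you already have one frame of raw depth values as a row-major `int[]` (how you obtain it depends on the Kinect library and is not shown); `DepthTracker` and the depth band are hypothetical, not part of any library.

```java
final class DepthTracker {

    /**
     * Returns {x, y, meanDepth} for all pixels whose depth lies in
     * [minDepth, maxDepth], or null if no pixel is in range.
     */
    static double[] centroid(int[] depth, int width, int height, int minDepth, int maxDepth) {
        if (depth.length != width * height) {
            throw new IllegalArgumentException("depth array does not match width * height");
        }
        long sumX = 0, sumY = 0, sumDepth = 0;
        int count = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int d = depth[y * width + x];
                if (d >= minDepth && d <= maxDepth) {   // pixel belongs to the target's depth band
                    sumX += x;
                    sumY += y;
                    sumDepth += d;
                    count++;
                }
            }
        }
        if (count == 0) {
            return null;                                 // nothing in range this frame
        }
        return new double[] {(double) sumX / count, (double) sumY / count, (double) sumDepth / count};
    }
}
```

Calling this once per frame gives a pixel position plus depth that you can follow over time. Turning it into real-world 3D coordinates needs the camera's calibration, which the library you use may or may not provide.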
Open Kinect by Daniel Shiffman is a pretty good starting point, and it contains a bunch of examples that get you closer to your goal.
That should be a reasonable starting point: break your problem down into smaller steps, then use google searches to approach those steps one at a time. If you get stuck on a specific step, come back and ask a specific question (don't forget the MCVE) and we'll go from there. Good luck.
I would like to do some audio and video analysis in Java.
In a bit more detail, I would like to identify the points in audio/video that have either been monotonous for quite some time or have drastically changed compared to some previous state.
If you want to look at it in a mathematical way, I can try to explain it like this:
Example:
You have an audio file. You should extract the waveform of that audio file. You could try to approximate that waveform with some simpler function that can be expressed as a closed formula. Let's call that function f(t).
Now, to find out how your function behaves (whether it is increasing or decreasing) at some point or interval, I guess I could use the first derivative, f'(t). If I'd like even more information, I assume the second derivative, f''(t), would also come in handy.
So, if we assume we can do that, then I guess I'd have one piece of information about the audio.
However, if I'm not mistaken, audio can also be represented as a spectrogram, so I'm unsure how that fits into all of this.
So, here's the real question: is there a way to do this (efficiently) in Java? I've been doing some digging and found MusicG; however, its last update was in July 2012, which leads me to believe it may be abandoned.
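Rather than fitting a closed-form f(t) to the raw waveform, a common plain-Java alternative is to compute a short-time energy envelope and take its discrete first difference, which plays the role of f'(t). A minimal sketch, assuming the audio is already decoded into normalized `double` samples (for example via `javax.sound.sampled`, not shown); `windowSize` and `jumpThreshold` are hypothetical parameters to tune.

```java
import java.util.ArrayList;
import java.util.List;

final class EnvelopeChanges {

    /** RMS energy of each non-overlapping window of windowSize samples. */
    static double[] rmsEnvelope(double[] samples, int windowSize) {
        int windows = samples.length / windowSize;       // trailing partial window is dropped
        double[] env = new double[windows];
        for (int w = 0; w < windows; w++) {
            double sumSq = 0;
            for (int i = w * windowSize; i < (w + 1) * windowSize; i++) {
                sumSq += samples[i] * samples[i];
            }
            env[w] = Math.sqrt(sumSq / windowSize);
        }
        return env;
    }

    /** Windows whose energy differs from the previous window by more than jumpThreshold. */
    static List<Integer> abruptChanges(double[] envelope, double jumpThreshold) {
        List<Integer> changes = new ArrayList<>();
        for (int w = 1; w < envelope.length; w++) {
            double derivative = envelope[w] - envelope[w - 1];   // finite difference, one step per window
            if (Math.abs(derivative) > jumpThreshold) {
                changes.add(w);
            }
        }
        return changes;
    }
}
```

Long runs of windows where the difference stays below the threshold correspond to the "monotonous" stretches. A spectrogram is the same idea applied per frequency band instead of to total energy.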
The second part refers to video files, but without their audio component.
This is where I'll have more questions, so I'm just gonna go and shoot them.
How do you identify points of change in "pace" in videos?
Here's an example:
Imagine the video shows a car driver's point of view while he's driving on a long, straight road. Since the surroundings are mostly the same, the pace could be described as "not changing much". At one point, the road begins to curve, but the driver, because he is falling asleep, does not follow the road that precisely, so the surroundings start to change somewhat, and so does the pace. At the apex of that curve there is a tree, which grows bigger and bigger as the car approaches it. Here, the POV (and the pace) is changing quite a lot, since the tree is getting bigger and bigger. In the end, the car crashes into the tree, all hell breaks loose, and the car starts to roll uncontrollably, which indicates a really intense pace.
I'm assuming one way could be to do image segmentation and somehow determine which portions of the frames are changing, and how big those portions are, to estimate the pace, but I'd like additional input.
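A simple stand-in for "how big are the changing portions" is the fraction of pixels that change noticeably between consecutive frames, which gives one pace value per frame pair. This sketch assumes frames are already decoded into equal-length grayscale `int` arrays with values 0..255 (decoding, e.g. with JavaCV, is not shown); `pixelThreshold` is a hypothetical parameter.

```java
final class PaceMeter {

    /**
     * scores[i] = fraction of pixels whose brightness changed by more than
     * pixelThreshold between frame i and frame i + 1.
     */
    static double[] paceScores(int[][] frames, int pixelThreshold) {
        if (frames.length < 2) {
            return new double[0];
        }
        double[] scores = new double[frames.length - 1];
        for (int f = 1; f < frames.length; f++) {
            int[] prev = frames[f - 1];
            int[] cur = frames[f];
            if (prev.length != cur.length || cur.length == 0) {
                throw new IllegalArgumentException("frames must be non-empty and the same size");
            }
            int changed = 0;
            for (int i = 0; i < cur.length; i++) {
                if (Math.abs(cur[i] - prev[i]) > pixelThreshold) {
                    changed++;
                }
            }
            scores[f - 1] = (double) changed / cur.length;   // 0 = static scene, 1 = everything changed
        }
        return scores;
    }
}
```

In the driving example, the straight road should give low scores, the approaching tree rising scores, and the crash scores near 1. Points where the score series jumps are candidate "changes in pace".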
If anyone has had prior experience doing any sort of related work in Java, what approaches did you explore and/or use? One thing that immediately comes to my mind is JavaCV, but as I said, with my limited experience, I'm unsure what to actually try.
I am trying to learn and also adapt the ImageNeuralNetwork example in Java. So far my problem is that when I give the NN a larger number of 32x32 images and let it train, the error never goes below 14%, and at the beginning it jumps all over the place.
My images are black and white and they are classified into 27 classes, so I know there are 27 output neurons.
My question is: why is the NN not learning? I tried different numbers of hidden layers (1 or 2) with different neuron counts, but nothing helps.
Can anyone give me an idea of what I'm doing wrong? Like I said, I'm just beginning with NNs and I'm a bit lost here.
Edit: It seems that if I give it fewer images to learn from, the error goes down, but that doesn't really solve the problem: if I wanted to classify a lot of images, I would be stuck with the error never going down.
You only need one hidden layer. Additional hidden layers in a neural network really do not give you much; see the universal approximation theorem. I would try starting with (input count + output count) * 1.5 as the number of hidden neurons.
As to why the ANN fails to converge with more images, that is more difficult. Most likely it is because the additional images are too varied for the ANN to classify them all. A simple feedforward ANN is really not ideal for grid-based image recognition. The neural network does not know which pixels are next to each other; it just sees a flat vector of pixels. The ANN is basically learning which pixels must be present for each of the letters. If you shift one of the letters even slightly, the ANN might not recognize it, because you've now moved nearly every pixel that it was trained with.
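To make the "shifted letter" point concrete: the small illustration below (hypothetical helper names) flattens a 32x32 image row by row, as a plain feedforward net typically receives it, and counts how many inputs differ when a one-pixel-wide vertical stroke moves one column to the right.

```java
final class ShiftDemo {

    /** Row-major flattening, as an image is typically fed to a plain feedforward net. */
    static int[] flatten(int[][] img) {
        int h = img.length, w = img[0].length;
        int[] v = new int[h * w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                v[y * w + x] = img[y][x];
            }
        }
        return v;
    }

    /** n x n black image with a 1-pixel-wide white vertical line in column col. */
    static int[][] verticalLine(int n, int col) {
        int[][] img = new int[n][n];
        for (int y = 0; y < n; y++) {
            img[y][col] = 1;
        }
        return img;
    }

    /** Number of input positions whose value differs between two vectors. */
    static int differingInputs(int[] a, int[] b) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) count++;
        }
        return count;
    }
}
```

For n = 32 the two vectors differ in 64 of 1024 inputs: all 32 "on" inputs switch off and 32 different ones switch on, so the shifted stroke shares no active input with the original even though it is visually the same shape.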
I really do not do much with OCR. However, this does seem to be the area where deep learning excels. Convolutional neural networks are better at exploiting the relationships between neighboring pixels. You might get better results with a deep learning approach. More info here: http://dpkingma.com/sgvb_mnist_demo/demo.html
I want to implement object detection on a license plate (the city name). I have an image:
and I want to detect whether the image contains the word "بابل" (Babil):
I have tried using a template matching method using OpenCV and also using MATLAB but the result is poor when tested with other images.
I have also read this page, but I was not able to get a good understanding of what to do from that.
Can anyone help me or give me a step by step way to solve that?
I have a project to recognize license plates. We can detect and recognize the numbers, but I also need to detect and recognize the words (the same words appear on many cars).
Your question is very broad, but I will do my best to explain optical character recognition (OCR) in a programmatic context and give you a general project workflow followed by successful OCR algorithms.
The problem you face is easier than most, because instead of having to recognize/differentiate between different characters, you only have to recognize a single image (assuming this is the only city you want to recognize). You are, however, subject to many of the limitations of any image recognition algorithm (quality, lighting, image variation).
Things you need to do:
1) Image isolation
You'll have to isolate your image from a noisy background:
I think that the best isolation technique would be to first isolate the license plate, and then isolate the specific characters you're looking for. Important things to keep in mind during this step:
Does the license plate always appear in the same place on the car?
Are cars always in the same position when the image is taken?
Is the word you are looking for always in the same spot on the license plate?
The difficulty/implementation of the task depends greatly on the answers to these three questions.
2) Image capture/preprocessing
This is a very important step for your particular implementation. Although possible, it is highly unlikely that your image will look like this:
as your camera would have to be directly in front of the license plate. More likely, your image may look like one of these:
depending on the perspective the image is taken from. Ideally, all of your images will be taken from the same vantage point and you'll simply be able to apply a single transform so that they all look similar (or not apply one at all). If you have photos taken from different vantage points, you need to manipulate them, or else you will be comparing two different images. Also, especially if you are taking images from only one vantage point and decide not to do a transform, make sure that the text your algorithm is looking for is transformed to the same point of view. If you don't, you'll get a poor success rate that's difficult to debug.
3) Image optimization
You'll probably want to (a) convert your images to black-and-white and (b) reduce the noise in your images. These two processes are called binarization and despeckling, respectively. There are many implementations of these algorithms available in many different languages, most of them easy to find with a Google search. You can batch-process your images using any language or free tool if you want, or find an implementation that works with whatever language you decide to work in.
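As one possible concrete version of these two steps, the sketch below picks a global threshold with Otsu's method and despeckles with a 3x3 median filter. Both choices are mine for illustration (many other binarization and despeckling methods exist), and the image is assumed to be grayscale, stored as `int[height][width]` with values 0..255.

```java
import java.util.Arrays;

final class Preprocess {

    /** Otsu's threshold: the gray level that maximizes between-class variance. */
    static int otsuThreshold(int[][] gray) {
        int[] hist = new int[256];
        int total = 0;
        for (int[] row : gray) {
            for (int v : row) { hist[v]++; total++; }
        }
        double sumAll = 0;
        for (int t = 0; t < 256; t++) sumAll += t * (double) hist[t];
        double sumBg = 0, bestVar = -1;
        int wBg = 0, best = 0;
        for (int t = 0; t < 256; t++) {
            wBg += hist[t];                        // pixels at or below t form the background class
            if (wBg == 0) continue;
            int wFg = total - wBg;
            if (wFg == 0) break;
            sumBg += t * (double) hist[t];
            double meanBg = sumBg / wBg, meanFg = (sumAll - sumBg) / wFg;
            double betweenVar = (double) wBg * wFg * (meanBg - meanFg) * (meanBg - meanFg);
            if (betweenVar > bestVar) { bestVar = betweenVar; best = t; }
        }
        return best;
    }

    /** Binarization: 1 for pixels brighter than the threshold, 0 otherwise. */
    static int[][] binarize(int[][] gray, int threshold) {
        int[][] out = new int[gray.length][];
        for (int y = 0; y < gray.length; y++) {
            out[y] = new int[gray[y].length];
            for (int x = 0; x < gray[y].length; x++) {
                out[y][x] = gray[y][x] > threshold ? 1 : 0;
            }
        }
        return out;
    }

    /** Despeckling with a 3x3 median filter; border pixels are copied unchanged. */
    static int[][] median3x3(int[][] img) {
        int h = img.length, w = img[0].length;
        int[][] out = new int[h][];
        for (int y = 0; y < h; y++) out[y] = img[y].clone();
        int[] window = new int[9];
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                int k = 0;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        window[k++] = img[y + dy][x + dx];
                    }
                }
                Arrays.sort(window);
                out[y][x] = window[4];             // median of the 9 neighbours
            }
        }
        return out;
    }
}
```

Typical use is `binarize(median3x3(gray), otsuThreshold(median3x3(gray)))`, i.e. despeckle first, then binarize.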
4) Pattern recognition
If you only want to search for the name of this one city (only one word ever), you'll most likely want to implement a matrix matching strategy. Many people also refer to matrix matching as pattern recognition so you may have heard it in this context before. Here is an excellent paper detailing an algorithmic implementation that should help you immensely should you choose to use matrix matching. The other algorithm available is feature extraction, which attempts to identify words based on patterns within letters (i.e. loops, curves, lines). You might use this if the font style of the word on the license plate ever changes, but if the same font will always be used, I think matrix matching will have the best results.
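A minimal way to score a matrix match is normalized cross-correlation (NCC) between the isolated, scaled, binarized candidate region and a stored template of the same size. NCC is one common choice, not necessarily the measure used in the paper mentioned above, and any acceptance threshold (e.g. 0.8) would be a hypothetical value to tune.

```java
final class MatrixMatch {

    /** NCC score in [-1, 1]; 1 means a perfect match up to brightness/contrast. */
    static double ncc(int[][] candidate, int[][] template) {
        int h = template.length, w = template[0].length;
        if (candidate.length != h || candidate[0].length != w) {
            throw new IllegalArgumentException("candidate and template must have the same size");
        }
        int n = h * w;
        double meanC = 0, meanT = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                meanC += candidate[y][x];
                meanT += template[y][x];
            }
        }
        meanC /= n;
        meanT /= n;
        double cross = 0, varC = 0, varT = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                double dc = candidate[y][x] - meanC;
                double dt = template[y][x] - meanT;
                cross += dc * dt;
                varC += dc * dc;
                varT += dt * dt;
            }
        }
        if (varC == 0 || varT == 0) {
            return 0;   // a flat image has no pattern to correlate; treat as no match
        }
        return cross / Math.sqrt(varC * varT);
    }
}
```

Because the score is normalized, it tolerates uniform brightness and contrast differences, but not rotation or perspective changes, which is why the earlier isolation and transform steps matter.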
5) Algorithm training
Depending on the approach you take (if you use a learning algorithm), you may need to train your algorithm with data that is tagged. What this means is that you have a series of images that you've identified as True (contains the city name) or False (does not). Here's a pseudocode example of how this works:
train = [(img1, True), (img2, True), (img3, False), (img4, False)]
img_recognizer = algorithm(train)
Then, you apply your trained algorithm to identify untagged images.
test_untagged = [img5, img6, img7]
for image in test_untagged:
    img_recognizer(image)
Your training sets should be much larger than four data points; in general, the bigger the better. Just make sure, as I said before, that all the images are of an identical transformation.
Here is a very, very high-level code flow that may be helpful in implementing your algorithm:
img_in = capture_image()
cropped_img = isolate(img_in)
scaled_img = normalize_scale(cropped_img)
img_desp = despeckle(scaled_img)
img_final = binarize(img_desp)
# train a matcher on the tagged training set
match = train_match(training_set)
boolCity = match(img_final)
The processes above have been implemented many times and are thoroughly documented in many languages. Below are some implementations in the languages tagged in your question.
Pure Java
cvBlob in OpenCV (check out this tutorial and this blog post too)
tesseract-ocr in C++
Matlab OCR
Good luck!
If all you ask is "I want to detect whether the image contains the word "بابل"", this is a classic problem that can be solved with a classifier like the one described at http://code.opencv.org/projects/opencv/wiki/FaceDetection.
But I assume you want more. Years ago I tried to solve similar problems, and here is an example image to show how good or bad the result was:
To detect the licence plate, I used the very basic rectangle detection that is included in every OpenCV samples folder, and then used a perspective transform to fix the layout and size. It was important to implement multiple checks to see whether a rectangle looks enough like a licence plate. For example, if a rectangle is 500px tall and 2px wide, it is probably not what I want, so it was rejected.
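A plausibility check like the one described could look like the sketch below, applied to each candidate rectangle's width and height in pixels. All bounds are hypothetical placeholders; real values depend on the plate format, camera distance, and image resolution.

```java
final class PlateFilter {

    static final double MIN_ASPECT = 2.0;   // hypothetical: plates are much wider than tall
    static final double MAX_ASPECT = 6.0;   // hypothetical upper bound
    static final int MIN_WIDTH_PX = 60;     // hypothetical: smaller candidates are unreadable

    /** Rejects rectangles whose size or width/height ratio cannot be a plate. */
    static boolean looksLikePlate(int width, int height) {
        if (width < MIN_WIDTH_PX || height <= 0) {
            return false;
        }
        double aspect = (double) width / height;
        return aspect >= MIN_ASPECT && aspect <= MAX_ASPECT;
    }
}
```

The 500px tall, 2px wide rectangle from the example fails immediately, as does a square one. Additional checks (e.g. expected area range or color) can be chained in the same way.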
Use https://code.google.com/p/cvblob/ to extract the Arabic text and other components on the detected plate. I had a similar need just yesterday on another project, where I had to extract Japanese kanji symbols from a page:
CvBlob does a lot of work for you.
Next, use the technique explained at http://blog.damiles.com/2008/11/basic-ocr-in-opencv/ to match the city name. Just train the algorithm with example images of different city names, and soon it will recognize 99% of them out of the box. I have used similar approaches on different projects and I'm quite sure they work.
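The "teach it with example images" idea can be illustrated with a plain 1-nearest-neighbour classifier over flattened, same-size, binarized images. This is my choice for illustration and may differ in detail from the linked post; the labels (city names) are whatever you supply.

```java
import java.util.ArrayList;
import java.util.List;

final class NearestNeighbour {

    private final List<int[]> samples = new ArrayList<>();
    private final List<String> labels = new ArrayList<>();

    /** Adds one labelled example image (flattened, normalized to a fixed size). */
    void train(int[] image, String label) {
        samples.add(image.clone());
        labels.add(label);
    }

    /** Returns the label of the closest training example (squared Euclidean distance). */
    String classify(int[] image) {
        if (samples.isEmpty()) {
            throw new IllegalStateException("no training data");
        }
        String best = null;
        long bestDist = Long.MAX_VALUE;
        for (int s = 0; s < samples.size(); s++) {
            int[] ref = samples.get(s);
            if (ref.length != image.length) {
                throw new IllegalArgumentException("image size mismatch");
            }
            long dist = 0;
            for (int i = 0; i < ref.length; i++) {
                long d = ref[i] - image[i];
                dist += d * d;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = labels.get(s);
            }
        }
        return best;
    }
}
```

With several examples per city name, a k-nearest-neighbour vote (k > 1) is usually more robust than taking the single closest example.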
We're a team of a programmer and a designer, and we want to make a medium-sized Java game which will be played as an applet in the web browser. I (the programmer) have 3 years of general development experience, but I haven't done any game programming before.
We're assuming that:
We'll decide on a plot, storyline of the game, etc.
We'll create a list of assets (images) that we need, i.e. player images, monster images, towns, buildings, trees, objects, etc. (We're not adding any music/sound effects for now.)
The designer will get started on creating those images while I finish reading some of the game programming books I've bought. The designer will create the first town/level of the game, then pass those images on to me; I will begin coding that first level while he starts on the next level, and after 4-5 levels we'll release v1 of the game.
Question 1: Is this the correct methodology to use for this project?
Question 2: What format should the designer create those images in? Should they be .bmp, .jpeg, or .gif files? And should he put all those images in one file, or put each monster/object/building in its own file? Note: we are sticking to 2D for now and not doing 3D.
Question 3: I've seen some game artwork where there is a file for a monster, and in that file there are about 3-4 images of the monster from different directions, all in one file, I think because they're part of an animation. Here's an illustration:
[Monster looking to right] ... [Monster looking in the front] ... [Monster looking to left]
And all of them are in one file. Is this how he'll have to supply me with those animations?
What I'm trying to find out is in what format he'll have to supply the designed images, so that I can access and manipulate them easily in the Java code.
All answers appreciated :)
I have some comments for each question.
Question 1: You say that you will begin coding levels 1, 2, ... one by one. I recommend creating a reusable framework instead, looking at the big picture. From the information you provide, I think you are going to make some kind of RPG. There are lots of things that can be shared between levels, such as the shop and the dialog system. So focus on extensibility.
Why wait for the designer to pass on the images? You can begin coding with placeholder graphics files you create yourself. That way you can work in parallel with the designer, and replace your placeholder graphics with the ones the designer provides later.
Question 2: JPG is not suitable for pixel-art style images, which appear a lot in most 2D games. And GIF supports only 256 colors. The best choice to me seems to be PNG.
The designer should always keep the original artworks in editable format as well. It's likely that you want to change the graphics in the future.
Question 3: It depends. The format you mention, where a character's animations are kept in a single file, is called a sprite sheet. If you keep your resources in this format, then you will have some work to do reading each sub-image by specifying its coordinates. However, sprite sheets help you keep things organized: all the 2D graphics related to the "Zombie" character are kept in one place, so they are easy to maintain.
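Reading the sub-images by coordinates is not much work in Java. A minimal sketch, assuming the designer and programmer agree on a fixed frame size and lay frames out left to right, row by row; the `SpriteSheet` class is hypothetical, while `BufferedImage.getSubimage` is the standard Java method.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

final class SpriteSheet {

    /** Cuts a sheet of equally sized frames into separate images, left to right, top to bottom. */
    static List<BufferedImage> frames(BufferedImage sheet, int frameWidth, int frameHeight) {
        if (frameWidth <= 0 || frameHeight <= 0) {
            throw new IllegalArgumentException("frame size must be positive");
        }
        List<BufferedImage> result = new ArrayList<>();
        for (int y = 0; y + frameHeight <= sheet.getHeight(); y += frameHeight) {
            for (int x = 0; x + frameWidth <= sheet.getWidth(); x += frameWidth) {
                // getSubimage shares pixel data with the sheet instead of copying it
                result.add(sheet.getSubimage(x, y, frameWidth, frameHeight));
            }
        }
        return result;
    }
}
```

The sheet itself can be loaded with `ImageIO.read(new File("monster.png"))` (file name hypothetical); PNG keeps the transparency the frames usually need.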
About the image format: don't let the designer deliver anything as jpg, because you'll lose quality.
Let him send it as png instead, and convert it to your preferred format as needed.
Also, remember to have him send the source files (photoshop/illustrator/3dsmax/whatever) in case you'll ever need tiny changes that you can make yourself without hiring the graphics dude. Who knows if he'll still be available in the future anyway.
I want to suggest to you that, before you make any decisions about your workflows, you and your colleague go have a look at JavaFX and see if maybe that's the toolkit that best meets your needs.
http://java.sun.com/javafx/
The [Monster looking to right] ... [Monster looking in the front] ... [Monster looking to left] style of animation demarcation has been around for as long as I've been peeking into game data, so I would suggest going with that path.
I was about to make the same remark as Wouter: use PNG, a modern format which is highly compressed (as opposed to BMP), lossless (as opposed to JPEG), full color, and supports several levels of transparency (as opposed to GIF).
Why do people put several sprites in the same image? For Java, I'm not sure it matters, especially if the images are packed in a jar... I know it is useful in CSS, for example, because it reduces the number of images to download, and thus the number of hits on the server, which is a well-known web optimization. For games on a hard disk, reducing the number of small files can be useful too.
The designer can appreciate this too, at least back when sprites used a color palette: you had only one image using the same palette, which was easier to edit and slightly reduced the overall size (back when memory was costly!).
I can't answer on the methodology, I never did a game in team... If it fits your needs, it is probably the right methodology...
duncan points to JavaFX, I will point to pulpcore which seems to be a promising library. Of course, there are plenty others, like JGame and such.
Bunch of pros here: http://www.javagaming.org/
This doesn't answer any of the questions, but if you need a reference for learning game development and simulation engines:
http://www.cs.chalmers.se/idc/ituniv/kurser/08/simul/
It's a link to the class lectures for Simulation Engines at Chalmers University in Gothenburg. The teacher has a game company and gave quite good lectures. Check the slides we had in class; maybe they'll help you a bit.