I am trying to learn from, and also adapt, the ImageNeuralNetwork example in Java. My problem so far is that when I give the NN a larger number of 32x32 images and let it train, the error never goes below 14%, and at the beginning it jumps all over the place.
My images are black and white and are classified into 27 classes, so I know that there are 27 output neurons.
My question is: why is the NN not learning? I tried different hidden-layer configurations (1 or 2 layers) with different neuron counts, but nothing helps.
Can anyone give me an idea of what I'm doing wrong? Like I said, I'm just beginning with NNs and I'm a bit lost here.
Edit: It seems that if I give it fewer images to learn from, the error goes down, but that doesn't really solve the problem: if I wanted to classify a lot of images, I would be stuck with the error never going down.
You need to use only one hidden layer. Additional hidden layers in a classic feedforward neural network really do not give you much; see the universal approximation theorem. I would try starting with (input count + output count) * 1.5 as the number of hidden neurons.
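If you are using Encog (the library whose examples include ImageNeuralNetwork), the construction might look like the following minimal sketch; the sigmoid activations and the exact counts here are assumptions, not taken from your code:

```java
import org.encog.engine.network.activation.ActivationSigmoid;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;

// Sketch: a single hidden layer sized (input + output) * 1.5.
int inputCount = 32 * 32;                          // one input neuron per pixel
int outputCount = 27;                              // one output neuron per class
int hiddenCount = (int) ((inputCount + outputCount) * 1.5);

BasicNetwork network = new BasicNetwork();
network.addLayer(new BasicLayer(null, true, inputCount));
network.addLayer(new BasicLayer(new ActivationSigmoid(), true, hiddenCount));
network.addLayer(new BasicLayer(new ActivationSigmoid(), false, outputCount));
network.getStructure().finalizeStructure();
network.reset();                                   // randomize the weights
```

Note that with 1024 inputs this formula gives roughly 1,500 hidden neurons, which means well over a million weights; that alone makes training slow and data-hungry.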
As to why the ANN fails to converge with more images, that is more difficult. Most likely it is because the additional images are too varied for the ANN to classify them all. A simple feedforward ANN is really not ideal for grid-based image recognition. The neural network does not know which pixels are next to each other; it just sees a straight linear vector of pixels. The ANN is basically learning which pixels must be present for each of the letters. If you shift one of the letters even slightly, the ANN might not recognize it, because you've now moved nearly every pixel that it was trained with.
I really do not do much with OCR. However, this does seem to be an area where deep learning excels. Convolutional neural networks are better able to handle pixels that are near each other, so you might get better results with a deep learning approach. More info here: http://dpkingma.com/sgvb_mnist_demo/demo.html
I've been playing a bit with some image processing techniques to do HDR pictures and similar. I find it very hard to align pictures taken in bursts... I tried some naive motion search algorithms, simply based on comparing small samples of pixels (like 16x16) between different pictures, which pretty much work like this:
- select one 16x16 block in the first picture, one with high contrast, then blur it to reduce noise
- compare it against blocks within a neighbouring radius (also blurred to reduce noise), usually using the averaged squared difference
- select the most similar one (a rough sketch of this is below).
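For reference, here is the rough sketch mentioned above; the block size, image representation, and bounds handling are assumptions, and both inputs are presumed already blurred:

```java
// Hypothetical sketch of the naive block matching described above: slide a
// 16x16 reference block over a search radius in the second image and pick
// the displacement with the lowest sum of squared differences.
static int[] bestMatch(int[][] ref16x16, int[][] image,
                       int startX, int startY, int radius) {
    long bestCost = Long.MAX_VALUE;
    int bestDx = 0, bestDy = 0;
    for (int dy = -radius; dy <= radius; dy++) {
        for (int dx = -radius; dx <= radius; dx++) {
            long cost = 0;
            for (int y = 0; y < 16; y++) {
                for (int x = 0; x < 16; x++) {
                    // assumes the caller keeps the search window inside the image
                    int d = ref16x16[y][x] - image[startY + dy + y][startX + dx + x];
                    cost += (long) d * d;
                }
            }
            if (cost < bestCost) { bestCost = cost; bestDx = dx; bestDy = dy; }
        }
    }
    return new int[] { bestDx, bestDy };  // displacement of the most similar block
}
```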
I tried a few things to improve this, for example using these search algorithms (https://en.wikipedia.org/wiki/Block-matching_algorithm) to speed it up. The results, however, are not good, and when they are, they are not robust. They also remain computationally very intensive (which precludes usage on a mobile device, for example).
I looked into popular research-based algorithms like https://en.wikipedia.org/wiki/Lucas%E2%80%93Kanade_method, but they do not seem very suitable for big movements. With burst images taken by today's phones, which have sensors > 12 Mpix, even small movements easily result in a difference of 50-100 pixels. The Lucas-Kanade method seems more suitable for small amounts of motion.
It's a bit frustrating, as there seem to be hundreds of apps that do HDR, and they seem to be able to match pictures easily and reliably in a snap... I've tried looking into OpenCV, but all it seems to offer is the Lucas-Kanade method above. I've also seen projects like https://github.com/almalence/OpenCamera, which do this in pure Java, although the code is not easy to follow (one class has 5k lines doing it all). Does anyone have any pointers to reliable resources?
Take a look at the HDR+ paper by Google. It uses a hierarchical algorithm for alignment that is very fast but not fully robust; afterward it uses a merging algorithm that is robust to alignment failures.
But it may be a little tricky to use it for normal HDR, since it says:
We capture frames of constant exposure, which makes alignment more robust.
Here is another work that needs sub-pixel accurate alignment. It uses a refined version of the alignment introduced in the HDR+ paper.
HDR+ code
This is my first question on Stack Overflow, so first of all: hi everyone :)
I'm a newbie in image processing, but I have to write an app (in Java) to detect changes between images from a camera (or rather, to detect new objects in the images).
The camera takes a picture every minute, all day, so as input I have a sequence of color images in JPG.
The important things are:
the camera isn't moving, so the background doesn't change; I'm interested only in detecting objects (e.g. people, animals, cars, ...) between the lens and the background
it should be impervious to image noise from the camera or the weather (e.g. rain, snow, the sun moving, shadows)
the only thing I need as output is the information that something has changed between two images
I'm interested in as simple a solution as possible, but it has to be a working solution.
It doesn't need to be infallible, but it should work correctly in most normal cases.
Of course, I don't expect someone to give me a ready-to-use snippet of code
(although that would be great! ;) ), but if someone who knows the topic gives me some guidelines (steps to take, algorithms or articles to read), I'll be really grateful. I haven't found anything appropriate on Google, and unfortunately I don't have a year to read a few books and do a PhD to find a solution :)
You can compute an MD5 hash of parts of the image and compare them to check whether they are similar or not; you can refer to this.
You can use keypoint matching, which is almost the same as method 1; you can read about this.
Read about the histogram method.
As a simple solution, just subtract one image from the other and look at the differences. Ignore small changes, try to build areas of movement, and accept only the bigger areas.
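A minimal sketch of that idea, assuming equally sized grayscale frames and illustrative threshold values:

```java
import java.awt.image.BufferedImage;

// Subtract two frames pixel by pixel, ignore small per-pixel changes
// (noise), and report a change only if enough pixels differ, which is a
// crude stand-in for "accept only bigger areas".
static boolean somethingChanged(BufferedImage prev, BufferedImage curr,
                                int pixelThreshold, double minChangedFraction) {
    int changed = 0;
    for (int y = 0; y < prev.getHeight(); y++) {
        for (int x = 0; x < prev.getWidth(); x++) {
            int a = prev.getRGB(x, y) & 0xFF;   // assumes grayscale input
            int b = curr.getRGB(x, y) & 0xFF;
            if (Math.abs(a - b) > pixelThreshold) changed++;
        }
    }
    return changed > minChangedFraction * prev.getWidth() * prev.getHeight();
}
```

A real implementation would group the changed pixels into connected regions rather than just counting them, but the principle is the same.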
I'm trying to detect whether a scanned PDF form contains a signature (like making sure a check is signed).
The problem domain:
I will be receiving document packages (multi-page PDFs with multiple forms). I have already put together document package classifiers that check the package for all documents and scale the images to a common size. After that, I know where the signatures should be and can scan that area of the document specifically. What I'm looking for is the best approach to making sure a signature is present. I've considered just checking for a base threshold of dark pixels, but that seems so clumsy. The trouble with signatures is that they are not really writing; they are more of a personal mark.
The only thing I can come up with is a machine learning method to look for "loopiness". But I'm not all that familiar with machine learning and don't even know where to start with something like that. Any suggestions for practical approaches would be very much appreciated.
I'm coding this in Java, if that's helpful at all.
What you asked is very broad, so there isn't a lot of specific information we can give you. However, I can point you to some helpful links:
http://java-ml.sourceforge.net/ -- a library you can download that has lots of useful algorithms and other code to include in your program
https://www.youtube.com/playlist?list=PLiaHhY2iBX9hdHaRr6b7XevZtgZRa1PoU -- a series that explains neural networks (something you might want to look into for your machine learning)
A big tip for your algorithm: instead of looking at exactly how long all of the loops and lines are, look at their relative distances.
"Relative distances from what?" you say. Well this is where the next tip comes in handy: instead of keeping track of the lines, keep track of the tips of the loops and the order of these points. If you then take the distance between all of them (relatively of course which means to set one of the lengths to zero). Along to keeping track of the distances, you should also keep track of the angles. You would calculate the angle ABC by taking the distance between (A,B), (B,C), and (A,C) (A,B, and C being coordinates on the xy plane) which creates a triangle between the points which allows you to use trigonometry to calculate the angle.
(I am assuming that for all of these you are also trying to detect whose signature it is, since that doesn't really complicate things much at all.) When matching the detected signature against the stored signatures to see if they are the "same", don't require the distances and angles to be exact. Give a margin of error (for example, a percentage range above and below). Here is a tip: make the margin of error rather large, so that a poorly written signature is still detected. This raises the chance of more than one stored signature matching. Luckily, there is a simple solution: run the algorithm again on the signatures that matched, but with a smaller margin of error (the program does this, not you, of course). Keep decreasing the margin of error until only one signature remains.
I am hoping you already have ideas for detecting where the actual signature is, but do check the darkness of the pixels, and make sure the mark is fairly continuous. Also note that signatures are commonly signed in black or blue, and sometimes in red or other fancy colors.
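Even the "clumsy" dark-pixel threshold mentioned in the question works as a first pass before any fancier matching; a rough sketch, where the signature region and both thresholds are assumptions:

```java
import java.awt.image.BufferedImage;

// Count "ink" pixels inside the known signature box and treat the field
// as signed if the ink ratio exceeds a threshold.
static boolean looksSigned(BufferedImage region, int darkThreshold, double inkFraction) {
    int ink = 0;
    for (int y = 0; y < region.getHeight(); y++) {
        for (int x = 0; x < region.getWidth(); x++) {
            int rgb = region.getRGB(x, y);
            int gray = ((rgb >> 16 & 0xFF) + (rgb >> 8 & 0xFF) + (rgb & 0xFF)) / 3;
            if (gray < darkThreshold) ink++;   // dark enough to count as ink
        }
    }
    return ink > inkFraction * region.getWidth() * region.getHeight();
}
```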
I've recently begun trying to create a mobile app (iOS/Android) that will automatically beat match (http://en.wikipedia.org/wiki/Beatmatching) two songs.
I know that this exists out there, and there have been others who have had some success, but I'm running into issues related to the accuracy of the players.
Specifically, I run into "sync" issues where the "beats" don't line up. The various methods used to date are:
Calculate the BPM in advance, identify a "beat" (using something like sonicapi.com), try to line things up appropriately, and begin mixing in with the playback rate adjusted (tempo adjustment)
Utilize a bunch of metadata to trigger specific starts and stops
What does NOT work:
Leveraging echonest's API (it beat matches on the server, we want to do it on the client)
Something like pydub (does not do it in realtime)
Who uses this algorithm today:
iwebdj
Traktor
Does anyone have any suggestions on how to solve this problem? I've seen lots of people do it, but doing it in real time on a mobile device seems to be an issue.
There are lots of methods for solving this problem, some of which work better than others. Matthew Davies, among many others, has published several papers on the matter. Glancing at this article, it seems to break down some of the steps necessary for doing this. I built a beat tracker in Matlab (unfortunately...) with a fellow student, and our goal was to create an outro/intro between two songs so that the tempo was seamless between them. We wanted to do this for songs that varied in BPM by a small amount (+-7 or so BPM between the two). Our method went sort of like this:
Find two songs in our database that had an overlapping 'key center'. So let's say two songs, both in A minor.
Find the particular overlap of key centers between the two. Say 30 seconds into song 1 and 60 seconds into song 2.
Now create a beat map using an onset-detection algorithm with peak picking; also, this was helpful for us.
Pick the first 'beat' of each track, and overlap the two tracks at that point. Now, since their BPMs differ slightly, the beats won't really line up with each other.
From this, we created a sort of map that gave us the sample offsets between the beats of song A and the beats of song B. We then wanted to time-stretch the fade-in region of song B so that each of its onsets (beats, in this case) lined up at the correct sample index with the onsets from song A over its fade-out region. For example, if onset 2 of song B was 5,000 samples ahead of onset 2 of song A, we simply stretched that 5,000-sample region so that onset 2 matched exactly between both songs.
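The offset map itself is simple once the onset lists exist; a sketch assuming both lists are sample positions over the overlap region, with index k meaning the k-th beat:

```java
// For each beat k, how many samples song B's onset is ahead of (positive)
// or behind (negative) song A's onset; each such span of B is then
// time-stretched by that amount so the onsets coincide.
static int[] beatOffsets(int[] onsetsA, int[] onsetsB) {
    int n = Math.min(onsetsA.length, onsetsB.length);
    int[] offsets = new int[n];
    for (int k = 0; k < n; k++) {
        offsets[k] = onsetsB[k] - onsetsA[k];
    }
    return offsets;
}
```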
This stretching seems like it would sound weird, but it actually sounded pretty good. Although this was done entirely offline in Matlab, I am also looking for a way to do it in real time in a mobile app. I'm not entirely sure about libraries you can use for this in the Android world, but I imagine it would be most efficient in C++.
A couple of libraries I have come across would be good for prototyping something, or at least studying the source code to get a better understanding of how you could do this in a mobile app:
Essentia (great community, open-source)
Aubio (also seems to be maintained pretty well, open-source)
Additional things to read up on for doing this kind of stuff in iOS land:
vDSP Programming guide
This article may also help
I came across this project that does some beat detection. Although it unfortunately seems pretty outdated, it may offer some additional insights.
Unfortunately it isn't as simple as just 'pressing play' at the same time to align beats, unless you are assuming very specific aspects about them (exact tempos, etc.).
If you reallllly have some time on your hands, you should check out the thesis of Tristan Jehan (founder of Echonest); it is jam-packed with algorithms and methods for beat detection, etc.
As I have mentioned in previous questions, I am writing a maze-solving application to help me learn about more theoretical CS subjects. After some trouble, I've got a genetic algorithm working that can evolve a set of rules (encoded as boolean values) in order to find a good path through a maze.
That being said, the GA alone is okay, but I'd like to beef it up with a neural network, even though I have no real working knowledge of neural networks (no formal theoretical CS education). After doing a bit of reading on the subject, I found that a neural network could be used to train a genome in order to improve results. Let's say I have a genome (group of genes), such as
1 0 0 1 0 1 0 1 0 1 1 1 0 0...
How could I use a Neural Network (I'm assuming MLP?) to train and improve my genome?
In addition, since I know little about neural networks, I've been looking into implementing some form of reinforcement learning using my maze matrix (a 2-dimensional array), although I'm a bit stuck on what the following algorithm wants from me:
(from http://people.revoledu.com/kardi/tutorial/ReinforcementLearning/Q-Learning-Algorithm.htm)
1. Set the parameter γ (the discount factor), and the environment reward matrix R
2. Initialize matrix Q as the zero matrix
3. For each episode:
   * Select a random initial state
   * Do while the goal state has not been reached
     o Select one among all possible actions for the current state
     o Using this possible action, consider going to the next state
     o Get the maximum Q value of this next state, based on all possible actions
     o Compute Q(state, action) = R(state, action) + γ * max[Q(next state, all actions)]
     o Set the next state as the current state
   End Do
End For
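If I understand the tutorial correctly, in Java this loop would look roughly like the following sketch, numbering the maze cells 0..n-1 (the reward encoding, gamma, and episode count are my guesses):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// R[s][s'] is the reward for moving from cell s to cell s' (-1 marks an
// impossible move); Q has the same shape and starts at zero. Assumes every
// non-goal cell has at least one possible move.
static double[][] train(double[][] R, int goal, double gamma, int episodes) {
    int n = R.length;
    double[][] Q = new double[n][n];
    Random rnd = new Random();
    for (int e = 0; e < episodes; e++) {
        int state = rnd.nextInt(n);                    // random initial state
        while (state != goal) {
            List<Integer> actions = new ArrayList<>(); // possible moves from state
            for (int a = 0; a < n; a++) if (R[state][a] >= 0) actions.add(a);
            int next = actions.get(rnd.nextInt(actions.size()));
            double maxQ = 0;                           // max Q of the next state
            for (int a = 0; a < n; a++) maxQ = Math.max(maxQ, Q[next][a]);
            // Q(state, action) = R(state, action) + gamma * max[Q(next state, all)]
            Q[state][next] = R[state][next] + gamma * maxQ;
            state = next;                              // next state becomes current
        }
    }
    return Q;
}
```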
Even so, the big problems for me are how to fill in the reward matrix R for my maze, what the Q matrix exactly represents, and how to get the Q value out at the end. I use a multi-dimensional array for my maze and enum states for every move. How would these be used in a Q-learning algorithm?
If someone could help out by explaining what I would need to do to implement this, preferably in Java (although C# would be nice too), possibly with some source code examples, it'd be appreciated.
As noted in some comments, your question indeed involves a large set of background knowledge and topics that can hardly be covered eloquently on Stack Overflow. However, what we can try here is to suggest approaches to get around your problem.
First of all: what does your GA do? I see a set of binary values; what are they? I see them as either:
bad: a sequence of 'turn right' and 'turn left' instructions. Why is this bad? Because you're basically doing a random, brute-force attempt at solving your problem. You're not evolving a genotype: you're refining random guesses.
better: every gene (location in the genome) represents a feature that will be expressed in the phenotype. There should not be a 1-to-1 mapping between genome and phenotype!
Let me give you an example: in our brain there are on the order of 10^11 neurons, but our genome contains only on the order of 10^4 genes (these are not exact values; bear with me for a second). What does this tell us? That our genotype does not encode every neuron individually. Our genome encodes the proteins that then go on to build the components of our body.
Hence, evolution works on the genotype indirectly, by selecting features of the phenotype. If I were to have 6 fingers on each hand, and if that made me a better programmer, making me have more kids because I'm more successful in life, well, my genotype would then be selected by evolution because it contains the capability to give me a more fit body (yes, there is a pun there, given the average geekiness-to-reproduction ratio of most people around here).
Now, think about your GA: what is it that you are trying to accomplish? Are you sure that evolving rules would help? In other words, how would you perform in a maze? What is the most successful thing that can help you: having a different body, or having a memory of the right path to get out? Perhaps you might want to reconsider your genotype and have it encode memorization abilities: maybe encode in the genotype how much data can be stored and how fast your agents can access it, then measure fitness in terms of how fast they get out of the maze.
Another (weaker) approach could be to encode the rules that your agent uses to decide where to go. The take-home message is, encode features that, once expressed, can be selected by fitness.
Now, to the neural network issue. One thing to remember is that NNs are filters. They receive an input, perform operations on it, and return an output. What is this output? Maybe you just need to discriminate a true/false condition; for example, once you feed a maze map to a NN, it could tell you whether you can get out of the maze or not. How would you do such a thing? You will need to encode the data properly.
This is the key point about NNs: your input data must be encoded properly. Usually people normalize it, and maybe scale it; perhaps you can apply a sigmoid function to it to avoid values that are too large or too small. Those are details that deal with error measures and performance. What you need to understand now is what a NN is, and what you cannot use it for.
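As a tiny illustration of that encoding step (mapping into [0, 1] is just one common choice, not the only one):

```java
// Rescale raw input features into [0, 1] before feeding them to the NN;
// min and max are the known bounds of the feature in question.
static double[] normalize(double[] raw, double min, double max) {
    double[] scaled = new double[raw.length];
    for (int i = 0; i < raw.length; i++) {
        scaled[i] = (raw[i] - min) / (max - min);
    }
    return scaled;
}
```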
To your problem now. You mentioned you want to use NNs as well: what about:
using a neural network to guide the agent, and
using a genetic algorithm to evolve the neural network parameters?
Rephrased like so:
let's suppose you have a robot: your NN controls the left and right wheels, and as input it receives the distance to the next wall and how far it has traveled so far (it's just an example)
you start by generating a random genotype
make the genotype into a phenotype: the first gene is the network sensitivity; the second gene encodes the learning rate; the third gene... and so on and so forth
now that you have a neural network, run the simulation
see how it performs
generate a second random genotype, and evolve a second NN
see how this second individual performs
take the best individual, then either mutate its genotype or recombine it with the loser's
repeat
There is an excellent reading on the matter here: Inman Harvey's Microbial GA.
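As a rough sketch of that loop in the microbial style (the fitness function here is a placeholder; you would decode the genotype into NN parameters and run the agent in the maze):

```java
import java.util.Random;

// One microbial-GA tournament: two genotypes compete; the loser is
// overwritten with a mutated recombination of the winner.
static void microbialStep(double[] a, double[] b, Random rnd) {
    double[] winner = fitness(a) >= fitness(b) ? a : b;
    double[] loser  = (winner == a) ? b : a;
    for (int i = 0; i < loser.length; i++) {
        if (rnd.nextDouble() < 0.5)  loser[i] = winner[i];                 // recombination
        if (rnd.nextDouble() < 0.05) loser[i] += rnd.nextGaussian() * 0.1; // mutation
    }
}

// Placeholder: decode the genotype into NN parameters, run the simulation,
// and return e.g. negated time-to-exit as the fitness (to be implemented).
static double fitness(double[] genotype) {
    return 0.0;
}
```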
I hope I gave you some insight into these issues. NNs and GAs are no silver bullet that will solve all problems. In some cases they can do very much; in others they are just the wrong tool. It's (still!) up to us to pick the best one, and to do so we must understand them well.
Have fun with it! It's great to know such things; it makes everyday life a bit more entertaining :)
There is probably no 'maze gene' to find. Genetic algorithms set up a vector of properties and a 'filtering system', and decide by some kind of 'survival of the fittest' process which set of properties does the best job.
The easiest way to find a way out of a maze is to always move left (or right) along a wall.
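A sketch of the left-hand version (the grid encoding, coordinates, and exit handling are assumptions; it only works for simply connected mazes):

```java
// maze[y][x] is true where there is a wall; directions are N, E, S, W.
static final int[][] DIRS = { {0, -1}, {1, 0}, {0, 1}, {-1, 0} };

static void leftHandWalk(boolean[][] maze, int x, int y, int exitX, int exitY) {
    int d = 1;                                   // start facing east
    while (x != exitX || y != exitY) {
        int left = (d + 3) % 4;
        if (!blocked(maze, x, y, left)) {
            d = left;                            // left is open: turn left
        } else if (blocked(maze, x, y, d)) {
            d = (d + 1) % 4;                     // left and ahead blocked: turn right
            continue;                            // re-check before moving
        }
        x += DIRS[d][0];                         // step forward
        y += DIRS[d][1];
    }
}

static boolean blocked(boolean[][] maze, int x, int y, int d) {
    int nx = x + DIRS[d][0], ny = y + DIRS[d][1];
    return ny < 0 || ny >= maze.length || nx < 0 || nx >= maze[0].length || maze[ny][nx];
}
```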
The Q-algorithm seems to have a problem with local maxima; as I remember, this was worked around by 'kicking' (adding random values to the matrix) if the results didn't improve.
EDIT: As mentioned above, a backtracking algorithm suits this task better than a GA or an NN.
How to combine both algorithms is described here: NeuroGen describes how a GA is used to train an NN.
Try using the free, open-source NeuronDotNet C# library for your neural networks instead of implementing them yourself.
As for a reinforcement learning library, I am currently looking for one myself, especially for the .NET Framework.