I've been playing a bit with some image processing techniques to do HDR pictures and similar, and I find it very hard to align pictures taken in bursts. I tried some naïve motion search algorithms, based on comparing small blocks of pixels (like 16x16) between different pictures, that pretty much work like this:
- select one 16x16 block in the first picture, one with high contrast, then blur it to reduce noise
- compare it against blocks within a neighbouring radius in the other picture (also blurred to reduce noise), usually using the averaged squared difference
- select the most similar one.
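For reference, my current naive search looks roughly like this (just a sketch - the block position, search radius and bounds handling are simplified, and I'm using a plain sum of squared differences in place of the averaged version):

```java
import java.awt.image.BufferedImage;

public class BlockMatch {

    // Sum of squared differences between the blockSize x blockSize block at (rx, ry)
    // in ref and the block at (tx, ty) in target, using grey-level intensities.
    static long ssd(BufferedImage ref, int rx, int ry,
                    BufferedImage target, int tx, int ty, int blockSize) {
        long sum = 0;
        for (int y = 0; y < blockSize; y++) {
            for (int x = 0; x < blockSize; x++) {
                long d = grey(ref.getRGB(rx + x, ry + y)) - grey(target.getRGB(tx + x, ty + y));
                sum += d * d;
            }
        }
        return sum;
    }

    static int grey(int rgb) {
        return ((rgb >> 16 & 0xFF) + (rgb >> 8 & 0xFF) + (rgb & 0xFF)) / 3;
    }

    // Exhaustive search in a +/- radius window around (bx, by); returns {dx, dy} of
    // the best match. The caller must pick (bx, by) far enough from the borders
    // that the whole search window stays inside both images.
    static int[] bestShift(BufferedImage ref, BufferedImage target,
                           int bx, int by, int blockSize, int radius) {
        long best = Long.MAX_VALUE;
        int bestDx = 0, bestDy = 0;
        for (int dy = -radius; dy <= radius; dy++) {
            for (int dx = -radius; dx <= radius; dx++) {
                long cost = ssd(ref, bx, by, target, bx + dx, by + dy, blockSize);
                if (cost < best) { best = cost; bestDx = dx; bestDy = dy; }
            }
        }
        return new int[] { bestDx, bestDy };
    }
}
```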
I tried a few things to improve this, for example using the search algorithms described at https://en.wikipedia.org/wiki/Block-matching_algorithm to speed it up. The results, however, are not good, and when they are, they are not robust. The methods also remain computationally very intensive, which precludes use on a mobile device, for example.
I looked into popular research-based algorithms like the Lucas-Kanade method (https://en.wikipedia.org/wiki/Lucas%E2%80%93Kanade_method), but it does not seem well suited to large movements. With burst images taken by today's phones, which have sensors above 12 Mpix, even small hand movements easily produce shifts of 50-100 pixels. The Lucas-Kanade method seems better suited to small amounts of motion.
It's a bit frustrating, as there seem to be hundreds of apps that do HDR, and they seem to match pictures easily and reliably in a snap. I've tried looking into OpenCV, but all it seems to offer is the Lucas-Kanade method mentioned above. I've also seen projects like https://github.com/almalence/OpenCamera, which do this in pure Java, although the code is not easy to follow (one class with 5k lines does it all). Does anyone have any pointers to reliable resources?
Take a look at the HDR+ paper by Google. It uses a hierarchical alignment algorithm that is very fast but not robust enough on its own; afterwards it uses a merging algorithm that is robust to alignment failures.
But it may be a little tricky to use it for normal HDR, since it says:
We capture frames of constant exposure, which makes alignment more robust.
Here is another work that needs sub-pixel accurate alignment. It uses a refined version of the alignment introduced in the HDR+ paper.
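Very roughly, the hierarchical idea is to estimate the shift on heavily downscaled copies first and then refine it level by level, so every search only needs a small radius even if the true motion is 50-100 pixels. Here is a sketch of that coarse-to-fine scheme (my own simplification, not the paper's actual algorithm - it estimates a single global translation on grey levels):

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class CoarseToFineAlign {

    // Estimate the global (dx, dy) shift of target relative to ref.
    // levels = number of extra pyramid levels, radius = search radius per level.
    static int[] align(BufferedImage ref, BufferedImage target, int levels, int radius) {
        if (levels == 0) {
            return search(ref, target, 0, 0, radius);
        }
        // Solve the problem on half-resolution copies, then double that estimate
        // and refine it with a small search at the current resolution.
        int[] coarse = align(half(ref), half(target), levels - 1, radius);
        return search(ref, target, coarse[0] * 2, coarse[1] * 2, radius);
    }

    // Brute-force search around an initial guess, minimizing the mean squared difference.
    static int[] search(BufferedImage ref, BufferedImage target,
                        int guessDx, int guessDy, int radius) {
        double best = Double.MAX_VALUE;
        int bestDx = guessDx, bestDy = guessDy;
        for (int dy = -radius; dy <= radius; dy++) {
            for (int dx = -radius; dx <= radius; dx++) {
                double c = cost(ref, target, guessDx + dx, guessDy + dy);
                if (c < best) { best = c; bestDx = guessDx + dx; bestDy = guessDy + dy; }
            }
        }
        return new int[] { bestDx, bestDy };
    }

    // Mean squared grey-level difference over the region where both images overlap.
    static double cost(BufferedImage ref, BufferedImage target, int dx, int dy) {
        long sum = 0, count = 0;
        int y0 = Math.max(0, -dy), y1 = Math.min(ref.getHeight(), target.getHeight() - dy);
        int x0 = Math.max(0, -dx), x1 = Math.min(ref.getWidth(), target.getWidth() - dx);
        for (int y = y0; y < y1; y++) {
            for (int x = x0; x < x1; x++) {
                int d = grey(ref.getRGB(x, y)) - grey(target.getRGB(x + dx, y + dy));
                sum += (long) d * d;
                count++;
            }
        }
        return count == 0 ? Double.MAX_VALUE : (double) sum / count;
    }

    // Downscale by a factor of 2 with bilinear filtering (acts as a cheap blur too).
    static BufferedImage half(BufferedImage src) {
        int w = Math.max(1, src.getWidth() / 2), h = Math.max(1, src.getHeight() / 2);
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                           RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return dst;
    }

    static int grey(int rgb) {
        return ((rgb >> 16 & 0xFF) + (rgb >> 8 & 0xFF) + (rgb & 0xFF)) / 3;
    }
}
```

With, say, 4 levels and a radius of 4 per level, this covers motions on the order of 100 pixels at full resolution while only ever comparing a handful of candidate shifts per level.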
HDR+ code
I'm trying to detect whether a scanned PDF form contains a signature (like making sure a check is signed).
The problem domain:
I will be receiving document packages (multi-page PDFs with multiple forms). I have already put together document package classifiers that check the package for all documents and scale the images to a common size. After that, I know where the signatures should be and can scan that area of the document specifically. What I'm looking for is the best approach to making sure a signature is actually present. I've considered just checking for a base threshold of dark pixels, but that seems clumsy. The trouble with signatures is that they are not really writing, more of a personal mark.
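For reference, the clumsy baseline I have in mind is roughly this (just a sketch - the region coordinates, the darkness threshold and the coverage cut-off are made-up numbers I would have to tune):

```java
import java.awt.image.BufferedImage;

public class SignatureCheck {
    // Rough baseline: count "ink" pixels inside the expected signature box and
    // call it signed if the coverage exceeds some tuned fraction.
    static boolean looksSigned(BufferedImage page, int x, int y, int w, int h) {
        int dark = 0;
        for (int yy = y; yy < y + h; yy++) {
            for (int xx = x; xx < x + w; xx++) {
                int rgb = page.getRGB(xx, yy);
                int grey = ((rgb >> 16 & 0xFF) + (rgb >> 8 & 0xFF) + (rgb & 0xFF)) / 3;
                if (grey < 128) dark++;       // arbitrary darkness threshold
            }
        }
        double coverage = dark / (double) (w * h);
        return coverage > 0.02;               // ~2% ink coverage, would need tuning
    }
}
```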
Beyond that, the only thing I can come up with is a machine learning method to look for "loopiness"? But I'm not all that familiar with machine learning and don't even know where to start with something like that. Any suggestions for practical approaches would be very much appreciated.
I'm coding this in Java, if that's helpful at all.
What you asked is very broad, so there isn't a lot of specific information we can give you. However, I can point you to some helpful links:
- http://java-ml.sourceforge.net/ -- a library you can download that has lots of useful machine learning algorithms and other code to include in your program
- https://www.youtube.com/playlist?list=PLiaHhY2iBX9hdHaRr6b7XevZtgZRa1PoU -- a series that explains neural networks (something you might want to look into for your machine learning)
A big tip I have for your algorithm: instead of looking at exactly how long all of the loops and strokes are, look at their relative distances.
"Relative distances from what?" you ask. Well, this is where the next tip comes in handy: instead of keeping track of the lines themselves, keep track of the tips of the loops and the order of those points, then take the distances between all of them (relative distances, of course, which means fixing one of the lengths as a reference). Along with the distances, you should also keep track of the angles. You can calculate the angle ABC by taking the distances between (A,B), (B,C) and (A,C) (A, B and C being coordinates on the xy plane), which gives you a triangle between the points and lets you use trigonometry to calculate the angle.
(I am assuming for all of this that you are also trying to detect whose signature it is, because it doesn't really complicate things much.) When trying to match the detected signature against the stored signatures to see if they are the "same", don't require the distances and angles to be exact; give them a margin of error (for example, a percentage range above and below). A tip here: make the margin of error rather large, so that a poorly written signature is still detected. That raises the chance of more than one stored signature being matched, but luckily there is a simple solution: have the program run the algorithm again on the signatures that were found, with a smaller margin of error, and keep shrinking the margin until only one signature remains.
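For the angle part, the trigonometry is just the law of cosines applied to the three pairwise distances. A small sketch (the point names and helpers are arbitrary):

```java
import java.awt.geom.Point2D;

public class LoopGeometry {
    // Angle at vertex B of triangle ABC, in degrees, via the law of cosines:
    // cos(B) = (AB^2 + BC^2 - AC^2) / (2 * AB * BC)
    static double angleAt(Point2D a, Point2D b, Point2D c) {
        double ab = a.distance(b);
        double bc = b.distance(c);
        double ac = a.distance(c);
        double cos = (ab * ab + bc * bc - ac * ac) / (2 * ab * bc);
        return Math.toDegrees(Math.acos(Math.max(-1, Math.min(1, cos))));
    }

    // "Relative" distance: express a pairwise distance as a fraction of one
    // reference length so the comparison does not depend on signature size.
    static double relative(double distance, double referenceLength) {
        return distance / referenceLength;
    }
}
```

For example, angleAt(new Point2D.Double(0, 0), new Point2D.Double(1, 0), new Point2D.Double(1, 1)) gives 90 degrees.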
I am hoping you already have ideas for detecting where the actual signature is, but do check the difference in darkness of the pixels and make sure the mark is reasonably continuous. Also note that signatures are commonly signed in black or blue, and sometimes in red or other fancy colors.
I am implementing the sliding window technique to develop photo OCR, i.e., a rectangle of a specific size is cut from the picture and checked to see whether it contains text, then the rectangle is shifted by some pixels and checked again. This sliding window technique is taking a lot of time: for example, processing a 1366x768 picture takes 6 hours with a step size of 2 and a window size of 20x25. Is there any other technique that could help, or a way to speed up the process?
I am coding in Java.
It is hard to give a specific recommendation without knowing any details of your algorithm/code. There are several potential performance improvements you could consider:
- Minimize disk I/O and cache misses. You stated that a rectangle is "cut from the picture". If each "cut" is a separate read from disk, it is very inefficient and will contribute significantly to execution time. When you shift your window (by 2 pixels, it appears), most of the data in the new window is the same, so try to avoid re-reading that data as much as possible (see the sketch after this list).
- Decrease your window size or increase your step size. This obviously affects your results, but depending on the size of the characters you are trying to OCR, it might be an option.
- If you are applying a convolution filter to do the OCR, consider doing fast convolution via a 2D FFT of the image data.
- Multithread your application, if it isn't already. While your problem is not embarrassingly parallel, it could be fairly easily multithreaded.
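On the first point, the simplest fix is to read the image into memory once and take every window straight from the in-memory pixel array - something like the sketch below, where classifyWindow and the file name are placeholders for whatever your detector and input actually are:

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

public class SlidingWindow {
    public static void main(String[] args) throws Exception {
        BufferedImage img = ImageIO.read(new File("page.png")); // one disk read, not one per window
        int winW = 20, winH = 25, step = 2;

        // Pull all pixels into memory once; each window is then just index arithmetic.
        int w = img.getWidth(), h = img.getHeight();
        int[] pixels = img.getRGB(0, 0, w, h, null, 0, w);

        for (int y = 0; y + winH <= h; y += step) {
            for (int x = 0; x + winW <= w; x += step) {
                if (classifyWindow(pixels, w, x, y, winW, winH)) {
                    System.out.println("text candidate at " + x + "," + y);
                }
            }
        }
    }

    // Placeholder for the real text/no-text classifier; pixel (x, y) is pixels[y * stride + x].
    static boolean classifyWindow(int[] pixels, int stride, int x, int y, int winW, int winH) {
        return false;
    }
}
```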
Sliding window approaches are brute force and, therefore, terribly slow by their nature. Perhaps you should take a look at salience-based techniques that use filters to prioritize which areas of the image to process.
Here is a paper I am somewhat familiar with: B. Draper and A. Lionelle, "Evaluation of Selective Attention under Similarity Transformations," Computer Vision and Image Understanding, 100:152-171, 2005.
Finally, what ANN library are you using? Make sure your ANN code is doing matrix/vector operations and that they are as optimized as possible!
For the last week I've been researching and experimenting with facial recognition. The intended application is for a person to be able to look up someone's information in a database (SQL) simply by taking a picture of their face. The initial expectation was to be able to compress a face down to a key or hash and use this as the database lookup. This need not be extremely accurate, as the person looking up the information can, and most likely will, end up doing a final comparison between the original image on file and the person standing in front of them.
OpenCV/JavaCV seems to be the obvious starting point, and the face detection it provides works well. However, its Eigenfaces implementation for face recognition isn't ideal, because online training - re-running the training over hundreds of thousands of user faces every time a new face needs to be added to the training set - wouldn't work.
I am experimenting with SURF descriptors computed on a face extracted using OpenCV's Haar cascades, and this appears to get me closer to the intended result. However, I cannot think of a way to efficiently look up and compare roughly 30 descriptors (each a 64- or 128-dimensional vector) in a database. I've done some reading about LSH and spectral hashing algorithms, but there are no implementations to be found for Java and my math isn't strong enough to implement them myself.
Does anyone have any thoughts or ideas on how this might be accomplished, or if it is even possible?
Hashing isn't complicated, nor do you need a degree in maths.
Assuming that any two images will produce a fairly similar number of 'descriptors', you only need a reasonable match on enough of them to reach a high enough confidence factor.
How specific these descriptors are determines what level of collision you can accept in your hashing algorithm.
As you have several of them, I would suggest that you don't need anything too sophisticated - after all, you probably want a level of 'fuzziness' in your search?
Start with something simple - experiment and refine. You might even find that you'll need different hashing for different descriptors - i.e. some might be more specific than others?
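As a concrete starting point, "nothing too sophisticated" could literally be coarse quantisation of each descriptor dimension followed by a standard hash - a sketch, with the bucket width as the knob you would experiment with:

```java
import java.util.Arrays;

public class DescriptorHash {
    // Coarsely quantise each dimension of a SURF descriptor so that nearby
    // descriptors are likely to land in the same bucket, then hash the buckets.
    // bucketWidth controls how much fuzziness (collision) you accept.
    static int hash(float[] descriptor, float bucketWidth) {
        int[] buckets = new int[descriptor.length];
        for (int i = 0; i < descriptor.length; i++) {
            buckets[i] = (int) Math.floor(descriptor[i] / bucketWidth);
        }
        return Arrays.hashCode(buckets);
    }
}
```

You would store one such hash per descriptor against each enrolled face, and at lookup time count how many of the probe face's ~30 descriptors hit each stored face; the face with the most hits (above some minimum) becomes your candidate for the final manual comparison.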
Hopefully some food for thought.
In particular, I want to generate a tolerance interval, for which I would need the value of Z_x for a given x on the standard normal distribution.
Does the Java standard library have anything like this, or should I roll my own?
EDIT: Specifically, I'm looking to do something akin to linear regression on a set of images. I have two images, and I want to see what the degree of correlation is between their pixels. I suppose this might fall under computer vision as well.
Simply calculate the Pearson correlation coefficient between those two images.
You will have 3 coefficients because the R, G, B channels need to be analyzed separately.
Or you can calculate a single coefficient just for the intensity levels of the images, or you could calculate the correlation between the hue values of the images after converting them to the HSV or HSL color space.
Do whatever you see fit :-)
EDIT: The correlation coefficient may only be maximized after scaling and/or rotating one of the images. Whether this is a problem depends on your needs.
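A minimal sketch of the per-channel version, assuming the two BufferedImages are already the same size:

```java
import java.awt.image.BufferedImage;

public class PixelCorrelation {
    // Pearson correlation between one colour channel of two equally sized images.
    // channelShift: 16 = red, 8 = green, 0 = blue.
    static double pearson(BufferedImage a, BufferedImage b, int channelShift) {
        long n = (long) a.getWidth() * a.getHeight();
        double sumX = 0, sumY = 0, sumXX = 0, sumYY = 0, sumXY = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                double px = (a.getRGB(x, y) >> channelShift) & 0xFF;
                double py = (b.getRGB(x, y) >> channelShift) & 0xFF;
                sumX += px; sumY += py;
                sumXX += px * px; sumYY += py * py; sumXY += px * py;
            }
        }
        double cov  = sumXY - sumX * sumY / n;
        double varX = sumXX - sumX * sumX / n;
        double varY = sumYY - sumY * sumY / n;
        return cov / Math.sqrt(varX * varY); // NaN if either channel is constant
    }
}
```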
You can use the complete statistical power of R using rJava/JRI. This includes correlations between pixels and so on.
Another option is to look at ImageJ, which contains libraries for many image manipulations, mathematics and statistics. It's an application, all right, but the library is usable in development as well, and it comes with an extensive developer's manual. On a side note, ImageJ can be combined with R as well.
ImageJ lets you use proper methods for computing image similarity measures, based on Fourier transforms or other techniques. More info can be found in Digital Image Processing with Java and ImageJ. See also this paper.
Another option is Apache Commons Math, which also contains the basic statistical tools.
See also the answers on this question and this question.
It seems you want to compare two images to see how similar they are. In this case, the first two things to try are SSD (sum of squared differences) and normalized correlation (closely related to what 0x69 suggests, the Pearson correlation) between the two images.
You can also try normalized correlation over small (corresponding) windows in the two images and add up the results over several (all) small windows in the image.
These two are very simple methods which you can write in a few minutes.
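For instance, the windowed variant could be written roughly like this (a sketch over grey levels with non-overlapping windows, assuming both images are the same size):

```java
import java.awt.image.BufferedImage;

public class WindowedNcc {
    // Average zero-mean normalized correlation over a grid of small corresponding windows.
    static double similarity(BufferedImage a, BufferedImage b, int win) {
        double total = 0;
        int count = 0;
        for (int y = 0; y + win <= a.getHeight(); y += win) {
            for (int x = 0; x + win <= a.getWidth(); x += win) {
                total += ncc(a, b, x, y, win);
                count++;
            }
        }
        return count == 0 ? 0 : total / count;
    }

    // Normalized correlation of one window; 1 means identical up to brightness/contrast.
    static double ncc(BufferedImage a, BufferedImage b, int x0, int y0, int win) {
        int n = win * win;
        double sa = 0, sb = 0, saa = 0, sbb = 0, sab = 0;
        for (int y = y0; y < y0 + win; y++) {
            for (int x = x0; x < x0 + win; x++) {
                double ga = grey(a.getRGB(x, y)), gb = grey(b.getRGB(x, y));
                sa += ga; sb += gb; saa += ga * ga; sbb += gb * gb; sab += ga * gb;
            }
        }
        double cov = sab - sa * sb / n;
        double va = saa - sa * sa / n, vb = sbb - sb * sb / n;
        return (va <= 0 || vb <= 0) ? 0 : cov / Math.sqrt(va * vb);
    }

    static double grey(int rgb) {
        return ((rgb >> 16 & 0xFF) + (rgb >> 8 & 0xFF) + (rgb & 0xFF)) / 3.0;
    }
}
```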
I'm not sure, however, what this has to do with hypothesis testing or linear regression; you might want to edit your question to clarify that part.
I'm looking for several methods to compare two images to see how similar they are. Currently I plan to have percentages as the 'similarity index' end-result. My program outline is something like this:
User selects 2 images to compare.
With a button, the images are compared using several different methods.
At the end, each method will have a percentage next to it indicating how similar the images are based on that method.
I've done a lot of reading lately and some of the stuff I've read seems to be incredibly complex and advanced and not for someone like me with only about a year's worth of Java experience. So far I've read about:
- The Fourier transform - I'm finding this rather confusing to implement in Java, but apparently the Java Advanced Imaging API has a class for it, though I'm not sure how to convert its output into an actual result.
- The SIFT algorithm - seems incredibly complex.
- Histograms - probably the easiest of everything mentioned so far.
- Pixel grabbing - seems viable, but if there's a considerable amount of variation between the two images it doesn't look like it will produce any sort of accurate result. I might be wrong?
I also have the idea of pre-processing the images with a Sobel filter first and then comparing them, but the problem is still the actual comparison part.
So yeah, I'm looking to see if anyone has ideas for comparing images in Java, and hoping there are people here who have done similar projects before. I just want some input on viable comparison techniques that aren't too hard to implement in Java.
Thanks in advance
- Fourier transform - This can be used to efficiently compute the cross-correlation, which will tell you how to align the two images and how similar they are once optimally aligned.
- SIFT descriptors - These can be used to compare local features. They are often used for correspondence analysis and object recognition. (See also SURF.)
- Histograms - The normalized cross-correlation often yields good results for comparing images on a global level (a small sketch of a related histogram measure follows this list). But since you are just comparing color distributions, you could end up declaring an outdoor scene with lots of snow similar to an indoor scene with lots of white wallpaper...
- Pixel grabbing - No idea what this is...
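If you go the histogram route, one simple way to get a percentage out of it is histogram intersection on normalized grey-level histograms - a sketch of that measure (not the cross-correlation mentioned above, but even easier to implement):

```java
import java.awt.image.BufferedImage;

public class HistogramCompare {
    // Similarity in [0, 1]: intersection of the two normalized grey-level histograms.
    static double similarity(BufferedImage a, BufferedImage b) {
        double[] ha = histogram(a), hb = histogram(b);
        double overlap = 0;
        for (int i = 0; i < 256; i++) {
            overlap += Math.min(ha[i], hb[i]);
        }
        return overlap; // multiply by 100 for a percentage
    }

    // Normalized 256-bin histogram of grey-level intensities.
    static double[] histogram(BufferedImage img) {
        double[] h = new double[256];
        long n = (long) img.getWidth() * img.getHeight();
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int grey = ((rgb >> 16 & 0xFF) + (rgb >> 8 & 0xFF) + (rgb & 0xFF)) / 3;
                h[grey]++;
            }
        }
        for (int i = 0; i < 256; i++) h[i] /= n;
        return h;
    }
}
```

The same snow/wallpaper caveat applies, of course: identical histograms do not mean the images show the same thing.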
You can get a good overview from this paper. Another field you might want to look into is content-based image retrieval (CBIR).
Sorry for not being Java specific. HTH.
As a better alternative to simple pixel grabbing, try SSIM. It does require that your images are essentially of the same subject from the same angle, however. It's useful, for example, if you're comparing images that have been compressed with different algorithms (e.g. JPEG vs. JPEG 2000). It's also a fairly simple approach that you should be able to implement reasonably quickly and see some results.
I don't know of a Java implementation, but there's a C++ implementation using OpenCV. You could try to re-use that (through something like javacv) or just write it from scratch. The algorithm itself isn't that complicated anyway, so you should be able to implement it directly.
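If you do write it from scratch, a heavily simplified single-window version - global mean, variance and covariance instead of the usual sliding Gaussian window, so only a rough approximation of real SSIM - could look like this:

```java
import java.awt.image.BufferedImage;

public class SimpleSsim {
    // Single-window SSIM over grey levels. Real SSIM averages this statistic over
    // many small local windows; using global statistics is a crude shortcut.
    static double ssim(BufferedImage a, BufferedImage b) {
        long n = (long) a.getWidth() * a.getHeight();
        double sa = 0, sb = 0, saa = 0, sbb = 0, sab = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                double ga = grey(a.getRGB(x, y));
                double gb = grey(b.getRGB(x, y));
                sa += ga; sb += gb; saa += ga * ga; sbb += gb * gb; sab += ga * gb;
            }
        }
        double muA = sa / n, muB = sb / n;
        double varA = saa / n - muA * muA;
        double varB = sbb / n - muB * muB;
        double cov  = sab / n - muA * muB;
        double c1 = Math.pow(0.01 * 255, 2); // standard stabilizing constants for 8-bit data
        double c2 = Math.pow(0.03 * 255, 2);
        return ((2 * muA * muB + c1) * (2 * cov + c2))
             / ((muA * muA + muB * muB + c1) * (varA + varB + c2));
    }

    static double grey(int rgb) {
        return ((rgb >> 16 & 0xFF) + (rgb >> 8 & 0xFF) + (rgb & 0xFF)) / 3.0;
    }
}
```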