I have an image that I want to transform to the frequency domain using FFT. There seems to be a lack of Java libraries for this, but I have found two: JTransforms, and a less well-known one that doesn't have a name.
With the less well-known one, the 2D arrays could only have dimensions that are powers of two, but it had simple-to-use methods like FastFourierTransform.fastFT(real, imaginary, true); where real is the 2D array of doubles holding every pixel value and imaginary is a 2D array of the same size full of zeroes. The Boolean value selects a forward or reverse transform. This made sense to me and it worked, except for the power-of-two requirement, which ruined any transform I did (I initially added black space around the image to fit it to the closest power of two). What I am struggling with is working out how to use the equivalent methods in JTransforms, and I would appreciate any guidance in doing so. I will state what I am currently doing.
I believe the relevant class would be DoubleFFT_2D. Its constructor takes a number of rows and columns, which I would assume to be the width and height of my image. Because my image has no imaginary parts, I think I can use doubleFFT.realForwardFull(real); which treats the imaginary parts as zero, and pass it the real 2D array full of pixels. Unfortunately this doesn't work at all. The JavaDoc states the input array must be of size rows*2*columns, with only the first rows*columns elements filled with real data, but I don't see how this relates to my image or what I would have to do to meet this requirement.
Sorry about the lengthy and poor explanation, if any additional information is needed I would be happy to provide it.
JTransforms Library and Docs can be found here: https://sites.google.com/site/piotrwendykier/software/jtransforms
It's too bad the documentation for JTransforms isn't available online other than as a zipped download. It's very complete and helpful, so you should check it out!
To answer your question: DoubleFFT_2D.realForwardFull(double[][] a) takes an array of real numbers (your pixels). However, the result of the FFT has two output values for each input value: the real and the imaginary part of each frequency bin. This is why the input array needs twice as many columns as the image (rows by 2*columns): the pixel values go in the first columns entries of each row and the rest is left as zeroes.
Note that all the FFT functions use the array a not only for input but also for output. Any image data in it will be overwritten, so it might be desirable to copy the pixels to a different, larger array anyway!
The easy and obvious fix for your scenario would be to use DoubleFFT_2D.realForward(double[][] a) instead. This one will only calculate the positive spectrum, because the negative side will be symmetrical to it. This is because your input values are real.
Also, check out the RealFFTUtils_2D class in JTransforms, which will make it a lot easier for you to retrieve your results from the array afterwards :)
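A minimal sketch of the array layout described above, assuming the pixels are held in a double[rows][columns] array (rows being the image height, columns the width). The class and method names are hypothetical helpers, not part of JTransforms, and the library call itself appears only in a comment so that the block compiles without JTransforms on the classpath. The magnitude helper assumes the interleaved result layout (real part at index 2*c, imaginary part at 2*c+1 in each row).

```java
final class RealFullLayout {

    /** Copies a rows x columns image into a zero-padded rows x (2 * columns) buffer. */
    static double[][] toFullBuffer(double[][] pixels) {
        int rows = pixels.length;       // image height
        int columns = pixels[0].length; // image width
        double[][] buffer = new double[rows][2 * columns];
        for (int r = 0; r < rows; r++) {
            // Real data fills the first `columns` entries; the rest stays zero.
            System.arraycopy(pixels[r], 0, buffer[r], 0, columns);
        }
        return buffer;
    }

    /** Magnitude of frequency bin (r, c), assuming interleaved real/imaginary output. */
    static double magnitude(double[][] buffer, int r, int c) {
        return Math.hypot(buffer[r][2 * c], buffer[r][2 * c + 1]);
    }

    public static void main(String[] args) {
        double[][] pixels = {{1, 2, 3}, {4, 5, 6}};
        double[][] buffer = toFullBuffer(pixels);
        // With JTransforms available, the transform itself would be:
        //   new DoubleFFT_2D(pixels.length, pixels[0].length).realForwardFull(buffer);
        System.out.println(buffer.length + " x " + buffer[0].length);
    }
}
```

Because the transform overwrites the buffer, the original pixels array stays untouched and can still be used afterwards.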
This is supposed to be for an Android app, so the language in question is Java.
I'm trying to record some audio and get the dominant frequency. This is for a very specific purpose, and the frequencies I need to be detected are pure sounds made by another device. I have the recording part done, so the only thing that I need to do is calculate the frequency from the buffer it generates.
I know I'm supposed to use something called FFT, so I put these into my project: http://introcs.cs.princeton.edu/java/97data/FFT.java, and http://introcs.cs.princeton.edu/java/97data/Complex.java.html
I know there are many questions about this, but none of them give an answer that I can understand. Others have broken links.
Anyone know how to do this, and explain in a relatively simple manner?
Generally a DFT (FFT included) implementation will take N time-domain samples (your recording) and produce N/2 complex values in the frequency domain. The angle of the complex value represents the phase and the absolute value of it represents the amplitude. Usually the values output will be ordered from lowest frequency to highest frequency.
Some implementations may output N complex values, but the extra values are redundant unless your input contains complex values. It should not in your case. This is why many implementations input real values and output N/2 complex values, as this is the most common use of FFT.
So, you will want to calculate the absolute value of each output value, since the amplitude is what you are interested in. The absolute value of a complex number is the square root of the sum of the square of its real component and the square of its imaginary component.
The exact frequencies of each value will depend on the number of samples of input and the interval between the samples. The frequency of value at position i (assuming i goes from 0 to N/2 - 1) will be i * (sampling frequency) / N.
This assumes that N is even. Rather than trying to explain the case of odd N, I'll recommend you keep N even for simplicity. Radix-2 FFT implementations, such as the FFT.java you linked, require N to be a power of two, so N will be even anyway.
If you're looking for a tone over a minimum time T then I'd also recommend processing the input in blocks of T/2 size.
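The steps above (transform, take magnitudes, map the peak bin to i * (sampling frequency) / N) can be sketched as follows. This is a minimal illustration, not the linked Princeton class: the FFT is a small textbook radix-2 routine written for this sketch, and the class name, method names, and the 8000 Hz sample rate in main are assumptions. Bin 0 (the DC offset) is skipped when looking for the peak.

```java
final class DominantFrequency {

    /** In-place iterative radix-2 FFT; the array length must be a power of two. */
    static void fft(double[] re, double[] im) {
        int n = re.length;
        // Reorder the samples into bit-reversed index order.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Combine butterflies of length 2, 4, 8, ..., n.
        for (int len = 2; len <= n; len <<= 1) {
            double step = -2.0 * Math.PI / len;
            for (int start = 0; start < n; start += len) {
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(step * k);
                    double wi = Math.sin(step * k);
                    int a = start + k;
                    int b = a + len / 2;
                    double xr = re[b] * wr - im[b] * wi;
                    double xi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - xr;
                    im[b] = im[a] - xi;
                    re[a] += xr;
                    im[a] += xi;
                }
            }
        }
    }

    /** Returns the frequency (Hz) of the strongest bin among bins 1 .. N/2 - 1. */
    static double dominantFrequency(double[] samples, double sampleRate) {
        int n = samples.length;
        if (n < 4 || (n & (n - 1)) != 0) {
            throw new IllegalArgumentException("sample count must be a power of two >= 4");
        }
        double[] re = samples.clone();
        double[] im = new double[n]; // real input: imaginary parts are zero
        fft(re, im);
        int peak = 1;
        double peakMagnitude = 0.0;
        for (int i = 1; i < n / 2; i++) { // skip bin 0, the DC component
            double magnitude = Math.hypot(re[i], im[i]);
            if (magnitude > peakMagnitude) {
                peakMagnitude = magnitude;
                peak = i;
            }
        }
        return peak * sampleRate / n;
    }

    public static void main(String[] args) {
        double sampleRate = 8000.0; // assumed sample rate for the demo
        double[] samples = new double[1024];
        for (int t = 0; t < samples.length; t++) {
            samples[t] = Math.sin(2.0 * Math.PI * 1000.0 * t / sampleRate);
        }
        System.out.println(dominantFrequency(samples, sampleRate));
    }
}
```

The frequency resolution is (sampling frequency) / N, so a tone that falls between two bins is reported as the nearer bin's frequency.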
Fourier transforms are a mathematical technique that lets you go back and forth between time and frequency domains for time-dependent signals.
FFT is a computer algorithm for calculating discrete transforms quickly and efficiently.
You'll take a sample of your time signal and apply FFT to it to get the amplitude versus frequency for the sample.
It's not an easy topic if you don't have the mathematical background. It assumes a good knowledge of trigonometry (sines and cosines), functions, and calculus. If you don't have that, it'll be difficult to read and understand any reference you can find.
If you don't have that background, do your best to treat a library FFT function as a black box and use what it gives back.
I'm trying to find if a scanned pdf form contains a signature (like making sure a check is signed).
The problem domain:
I will be receiving document packages (multi page pdf's with multiple forms). I have already put together document package classifiers that will check the package for all documents and scale the images to a common size. After that I know where the signatures should be and can scan the area of the document specifically. What I'm looking for is the best approach to making sure there is a signature present. I've considered just checking for a base threshold of dark pixels but that seems so clumsy. The trouble with signatures is that they are not really writing, more of a personal mark.
The only thing I can come up with is a machine learning method to look for loopiness. But I'm not all that familiar with machine learning and don't even know where to start with something like that. Suggestions for practical approaches would be very much appreciated.
I'm coding this in Java if that's helpful at all
What you asked was very broad so there isn't a lot of information that we can give you. However, I can point you to some helpful links:
http://java-ml.sourceforge.net/ --This is a library that you can download that has lots of useful algorithms and other code to include in your program
https://www.youtube.com/playlist?list=PLiaHhY2iBX9hdHaRr6b7XevZtgZRa1PoU --this is a series that explains neural networks (something you might want to look into for your machine learning)
So a big tip I have for your algorithm: instead of looking at exactly how long all of the loops and strokes are, look at their relative distances.
"Relative distances from what?" you say. Well, this is where the next tip comes in handy: instead of keeping track of the lines, keep track of the tips of the loops and the order of these points. Then take the distances between all of them, measured relative to one of the lengths, which serves as the reference. Along with the distances, you should also keep track of the angles. You would calculate the angle ABC by taking the distances between (A,B), (B,C), and (A,C) (A, B, and C being coordinates on the xy plane), which form a triangle between the points and allow you to use trigonometry to calculate the angle.
(I am assuming that you are also trying to detect whose signature it is, because it doesn't really complicate things much.) When trying to match the detected signature against the stored signatures to see if they are the "same," don't require the distances and angles to be exact. Give a margin of error (for example, a % range above and below). Here is a tip: make the margin of error rather large. That way, if the signature is written poorly, it will still be detected. This raises the chance of more than one signature being picked up. Luckily, there is a simple solution to this. Just have the program run the algorithm again on the signatures that were found, but with a smaller margin of error. Continue decreasing the margin of error until only one signature remains.
I am hoping you already have ideas for detecting where the actual signature is, but check for the difference in darkness of the pixels. Make sure it is fairly continuous. Also note that signatures are commonly signed in black or blue, and sometimes in red and other fancy colors.
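The angle calculation described above can be sketched with the law of cosines. This is only an illustration under stated assumptions: points are given as {x, y} arrays, the three points are distinct, and the class and method names are hypothetical.

```java
final class LoopTipGeometry {

    /** Euclidean distance between two points given as {x, y}. */
    static double distance(double[] p, double[] q) {
        return Math.hypot(p[0] - q[0], p[1] - q[1]);
    }

    /** Angle ABC (at vertex B) in degrees, from the three side lengths. */
    static double angleAtB(double[] a, double[] b, double[] c) {
        double ab = distance(a, b);
        double bc = distance(b, c);
        double ac = distance(a, c);
        // Law of cosines: ac^2 = ab^2 + bc^2 - 2 * ab * bc * cos(angle ABC)
        double cos = (ab * ab + bc * bc - ac * ac) / (2.0 * ab * bc);
        // Clamp against rounding error before taking the arc cosine.
        cos = Math.max(-1.0, Math.min(1.0, cos));
        return Math.toDegrees(Math.acos(cos));
    }

    public static void main(String[] args) {
        double[] a = {1, 0};
        double[] b = {0, 0};
        double[] c = {0, 1};
        System.out.println(angleAtB(a, b, c));
    }
}
```

Angles have the convenient property that they do not change when the signature is scaled, which fits the idea of comparing relative rather than absolute measurements.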
I have an image file, and I need to determine if a specified area of this image contains a signature. Or to put it in end-user terms, "Has this document been signed?"
What I have done so far is to examine all the pixels contained in the area, to calculate an average "darkness", and compare that to a reference value. If the difference in darkness exceeds some threshold, then I consider it signed.
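For concreteness, the check just described might look like the sketch below. It assumes the region is available as 8-bit grayscale values in an int[row][column] array (0 = black, 255 = white); the names, the reference value, and the threshold are all hypothetical.

```java
final class DarknessCheck {

    /** Mean darkness of a rectangular region: 0 = all white, 255 = all black. */
    static double averageDarkness(int[][] gray, int top, int left, int height, int width) {
        long sum = 0;
        for (int r = top; r < top + height; r++) {
            for (int c = left; c < left + width; c++) {
                sum += 255 - gray[r][c]; // invert so that darker pixels count more
            }
        }
        return (double) sum / ((long) height * width);
    }

    /** True if the region is darker than the reference by more than the threshold. */
    static boolean looksSigned(int[][] gray, int top, int left, int height, int width,
                               double referenceDarkness, double threshold) {
        double darkness = averageDarkness(gray, top, left, height, width);
        return darkness - referenceDarkness > threshold;
    }

    public static void main(String[] args) {
        int[][] region = {{255, 255}, {255, 0}}; // one black pixel out of four
        System.out.println(averageDarkness(region, 0, 0, 2, 2));
    }
}
```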
The problem with this (admittedly simplistic) approach is that because the pixels of the signature itself are such a small fraction of area, I have to use a very low darkness threshold, which results in a large number of false positives. I can't distinguish a real signature from stray markings, smudges, fax artifacts, etc.
To be clear...I'm not trying to match any specific signature or set of signatures. That is, I don't care who signed it, only whether it is signed.
Is anyone aware of a Java library that can do this, or of a better approach to this problem than what I am currently doing?
EDIT:
This is an example of the kinds of images I am working with. This document would be faxed to the recipient, signed and faxed back. It won't be this clean-looking by the time I need to look for a signature.
I do not know of any simple solutions. You can wrap over queXF or write something similar in Java. This paper talks about color code algorithm to recognize signatures.
This is what I believe can be done. It is not a very good solution, but it may still work, and it would involve a bit of machine learning. I am assuming that your image does not contain handwritten text and that it's just an image.
First thing to do would be to create a dataset of images which contain a signature and those which do not. The positive samples of the dataset should only contain signatures (you can learn a classifier for multiple aspect ratios) and negative samples should contain random images of the same aspect ratio/dimension. Now, you can compute some feature over these samples (HoG can be used as a feature, although I do not claim it is the best one to use for this application) and learn a SVM for each aspect ratio.
The next step would be to slide a detection window (of different aspect ratios) throughout the image and use the multiple SVMs you have learnt and check if any of them gives a positive response.
Although this approach may not always work, it should give decent accuracy. The more data you use for training, the better the results will get (and if you can come up with a good feature vector to represent a signature, it would help your case even further).
I'm trying to write a high/low pass image filter using jtransforms. Everything is working very nicely in the sense that I can transform an image using the complexForward method of the FloatFFT_2D class, and then come back to exactly the same picture using the complexInverse method. I'm using a float[] input rather than a float[][].
However, to apply the filter I need to remove some of the frequency components in between these two stages. My problem is that I don't know what the output looks like, or in other words, where within the output array the different spatial frequencies are stored. Is a[0] the DC value, for example?
The documentation isn't particularly forthcoming on this, so I'd be grateful if anyone knew the answer!
Figured it out - the low frequency components are in the corners. So if you need the low frequency components to be at the centre, a java version of fftshift needs to be implemented as explained in the link below:
http://www.mathworks.co.uk/help/matlab/ref/fftshift.html
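A minimal Java sketch of such a shift, assuming one value per frequency bin in a double[rows][columns] array (for example, magnitudes already extracted from the interleaved JTransforms output). The class and method names are hypothetical. Each element moves by half the size along each dimension (rounded down), wrapping around, which brings the low frequencies from the corners to the centre.

```java
final class FftShift {

    /** Returns a copy with each element circularly shifted by half the size along both axes. */
    static double[][] fftShift(double[][] in) {
        int rows = in.length;
        int cols = in[0].length;
        double[][] out = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                // Move (r, c) by (rows / 2, cols / 2), wrapping at the edges.
                out[(r + rows / 2) % rows][(c + cols / 2) % cols] = in[r][c];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] spectrum = {{1, 2}, {3, 4}};
        System.out.println(java.util.Arrays.deepToString(fftShift(spectrum)));
    }
}
```

For even dimensions, applying the shift twice restores the original arrangement, so a filter mask can be designed in the centred layout and the data shifted back before the inverse transform.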
The question is a bit broad.
Here is what I have done:
I have a method for applying the fft. I'm not going to post it because whether it is correct or incorrect is not really the point here.
I run an image through the method and then try to display what comes out as two images of the same size, one for the real part and one for the imaginary part.
This seems to work fine except that the grayscale values that come out of my method are usually much larger than 255 and therefore I'm not sure what I'm seeing.
I then take the raw result (not whatever the pixel values I display are, since I assume they are modified somehow to fit between 0 and 255) and run it through the same method as before but with a sign change to achieve the ifft.
I then try to display this as well. Again, the raw values are much larger than 255 for the most part.
My question boils down to:
a.) Do I have to do some scaling on the fft to get it to fit between 0 and 255?
b.) Do I have to reverse this scaling when I do the ifft?
c.) Is there any translation I have to do on the fft before I apply the ifft?
Part c arises from the fact that I have read some things which talk about centering the corners of the fft but I'm not really certain what this means.
A lesser question, part d: if I apply the 2d fft on the original image by first applying the 1d fft to all the rows and then to all the columns, do I need to apply the ifft in the same order, or do I need to reverse the order?
I think that's all for now. I have been doing a lot of looking for answers but can't seem to find much, so any help is appreciated.
EDIT: I added some images, maybe they will help. The first is the original image, the second is the result of my fft method (magnitude and imaginary component) and the third is the result of the ifft on the intermediate image.
EDIT2: Updated the images to ones from newer method.
People usually don't find it very useful to view the real and imaginary parts separately, but instead view the magnitude, and possibly the phase, but usually just the magnitude.
a) In general, yes, you will need to apply a scaling regardless of which components you're viewing. There are scaling relations between the total power of the image and its FFT, but not between the individual components. Also, you'll often want to do something like take the log of the data, or ignore the zero component, etc., so it's best just to do the scaling on your own.
b) The scaling in part a is for visualization only; don't scale the actual FFT. You should take the IFFT of the original FFT.
c) Depending on your FFT routines, you may need to divide by a factor of 2pi or the number of points in the sample, but this depends on how your FFT routines work. The docs should clarify this. As a start, just see if there's a factor of 2pi between what you start with and end with.
Answers to your four questions:
a. Do you have to scale the results of FFT to view them? Yes. You need to take magnitude then scale down to values between 0 and 255.
b. Do you have to reverse the scaling before the IFFT? The scaling in (a) is needed only if you want to view the results of the FFT. You cannot IFFT the scaled numbers. Use the original numbers.
c. Do translation between FFT and IFFT? No.
d. Does the order of Row vs Col during FFT matter? No. The results of FFT are a set of real and imaginary numbers. It is a deterministic result. You can IFFT in either order.
One of the key aspects that you may be having trouble with is the difference between the math and the visualization. The IFFT works on float or double real and imaginary numbers. The image expects integers between 0 and 255. You have to handle this conversion in code. You indicated that you thought it was "modified somehow". It is safer to perform this conversion yourself.
Finally, ditto on the tom10 answer. You may have to scale the results of the IFFT. It depends on the implementation of the FFT and IFFT.
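A minimal sketch of the display conversion both answers describe: take the magnitude, compress it with a logarithm, and scale to 0..255. The log compression and the normalization by the maximum are one common choice rather than the only one, and the class and method names are hypothetical. The real and imaginary parts are assumed to be in two separate double[rows][columns] arrays, which are left unmodified so they can still be passed to the IFFT.

```java
final class SpectrumDisplay {

    /** Maps a complex spectrum to 0..255 gray levels using log(1 + magnitude). */
    static int[][] toGray(double[][] re, double[][] im) {
        int rows = re.length;
        int cols = re[0].length;
        double[][] logMagnitude = new double[rows][cols];
        double max = 0.0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                // log1p(x) = log(1 + x) keeps zero magnitudes at zero.
                logMagnitude[r][c] = Math.log1p(Math.hypot(re[r][c], im[r][c]));
                max = Math.max(max, logMagnitude[r][c]);
            }
        }
        int[][] gray = new int[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                // An all-zero spectrum maps to black instead of dividing by zero.
                gray[r][c] = max == 0.0 ? 0 : (int) Math.round(255.0 * logMagnitude[r][c] / max);
            }
        }
        return gray;
    }

    public static void main(String[] args) {
        double[][] re = {{0, 3}, {1000, 0}};
        double[][] im = {{0, 4}, {0, 0}};
        System.out.println(java.util.Arrays.deepToString(toGray(re, im)));
    }
}
```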