I know this question has been asked before, but it never got a clear answer.
I found an example here: FFT spectrum analysis
It lets me transform my array of doubles with an FFT class:
RealDoubleFFT transformer;
int blockSize = 2048;
short[] buffer = new short[blockSize];
double[] toTransform = new double[blockSize];
bufferReadResult = audioRecord.read(buffer, 0, blockSize);
for (int i = 0; i < blockSize && i < bufferReadResult; i++) {
toTransform[i] = (double) buffer[i] / 32768.0; // signed 16 bit
}
transformer.ft(toTransform);
So now I don't know how to get a frequency out of the result.
I wrote this method:
public static int calculateFFTFrequency(double[] audioData){
float sampleRate = 44100;
int numSamples = audioData.length;
double max = Double.MIN_VALUE;
int index = 0;
for (int i = 0; i< numSamples -1; i++){
if (audioData[i] > max) {
max = audioData[i];
index = i;
}
}
float freq = (sampleRate / (float) numSamples * (float) index) * 2F;
return (int)freq;
}
I tried to implement a formula, but it doesn't return anything sensible, just wild numbers.
I tried counting zero crossings as well:
public static int calculateFrequency(short [] audioData){
int sampleRate = 44100;
int numSamples = audioData.length;
int numCrossing = 0;
for (int p = 0; p < numSamples-1; p++)
{
if ((audioData[p] > 0 && audioData[p + 1] <= 0) ||
(audioData[p] < 0 && audioData[p + 1] >= 0))
{
numCrossing++;
}
}
float numSecondsRecorded = (float)numSamples/(float)sampleRate;
float numCycles = numCrossing / 2f; // float division: two crossings per cycle
float frequency = numCycles/numSecondsRecorded;
return (int)frequency;
}
But with the zero-crossing method, if I play an "A" note on the piano it shows 430 for a moment (which is close to A) and then starts showing wild numbers as the sound fades: 800+, 1000+, etc.
Can somebody help me get a more or less accurate frequency from the mic?
You should first test your solution with a generated stream rather than the mic and check that the detected frequency is what you expect. Then you can do real-life tests with the mic, and if anything looks wrong, analyze the data the mic collected yourself. There could be inaudible sounds in your environment that cause strange results. When the sound fades, its harmonics can become louder than the fundamental. There are a lot of things to consider when processing sound from a real environment.
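One way to follow this advice is to synthesize a sine wave at a known frequency and run the detector on it before touching the microphone. The sketch below assumes 16-bit mono samples at 44100 Hz. The names `ToneTest`, `sineWave` and `zeroCrossingFrequency` are made up for the illustration, and the estimator is a variant of the zero-crossing method from the question, not a recommended detector.

```java
final class ToneTest {
    /** Generates numSamples of a sine wave at frequencyHz as signed 16-bit PCM. */
    static short[] sineWave(double frequencyHz, int sampleRate, int numSamples) {
        short[] out = new short[numSamples];
        for (int i = 0; i < numSamples; i++) {
            double phase = 2 * Math.PI * frequencyHz * i / sampleRate;
            // 0.8 of full scale leaves some headroom
            out[i] = (short) Math.round(Math.sin(phase) * 0.8 * Short.MAX_VALUE);
        }
        return out;
    }

    /** Estimates frequency by counting sign changes; two crossings make one cycle. */
    static double zeroCrossingFrequency(short[] audioData, int sampleRate) {
        int crossings = 0;
        for (int p = 0; p < audioData.length - 1; p++) {
            boolean down = audioData[p] > 0 && audioData[p + 1] <= 0;
            boolean up = audioData[p] < 0 && audioData[p + 1] >= 0;
            if (down || up) {
                crossings++;
            }
        }
        double seconds = (double) audioData.length / sampleRate;
        return (crossings / 2.0) / seconds;
    }

    public static void main(String[] args) {
        int sampleRate = 44100;
        short[] tone = sineWave(440.0, sampleRate, sampleRate); // one second of A4
        System.out.println("estimated Hz: " + zeroCrossingFrequency(tone, sampleRate));
    }
}
```

If the detector is already off on a clean synthetic tone, the bug is in the code; if it only misbehaves on mic input, the cause is more likely noise or harmonics in the recording.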
What you hear from a piano is a pitch, not just a spectral frequency. They are not the same thing. Pitch is a psychoacoustic phenomenon that depends more on periodicity than on the spectral peak. A bare FFT reports spectral frequency magnitudes, which can be made up of overtones, harmonics, and other artifacts, and may or may not include the fundamental pitch frequency.
So what you may want to use instead of an FFT is a pitch detection/estimation algorithm, which is a bit more complicated than just picking the peak magnitude out of an FFT.
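As a rough illustration of periodicity-based estimation, here is a minimal sketch of one common approach, plain time-domain autocorrelation. It is not the only pitch detection algorithm and it can still make octave errors on real recordings. The class name, the method name and the 150 to 1000 Hz search range are assumptions chosen for the example. The test signal deliberately has a second harmonic louder than its 220 Hz fundamental, the case where a spectral peak would point at 440 Hz.

```java
final class AutocorrelationPitch {
    /**
     * Returns the frequency whose period (in samples) is the lag, within the
     * range implied by [minHz, maxHz], at which the signal best matches a
     * shifted copy of itself. Returns -1 if no positive correlation is found.
     */
    static double estimatePitch(double[] x, int sampleRate, double minHz, double maxHz) {
        int minLag = (int) Math.floor(sampleRate / maxHz);
        int maxLag = Math.min((int) Math.ceil(sampleRate / minHz), x.length - 1);
        int bestLag = -1;
        double best = 0.0;
        for (int lag = minLag; lag <= maxLag; lag++) {
            // Unnormalised sum: longer lags have fewer terms, which mildly
            // favours the shortest period over its multiples.
            double sum = 0.0;
            for (int i = 0; i + lag < x.length; i++) {
                sum += x[i] * x[i + lag];
            }
            if (sum > best) {
                best = sum;
                bestLag = lag;
            }
        }
        return bestLag > 0 ? (double) sampleRate / bestLag : -1.0;
    }

    public static void main(String[] args) {
        int sampleRate = 44100;
        double[] x = new double[4096];
        for (int i = 0; i < x.length; i++) {
            double t = (double) i / sampleRate;
            // weak 220 Hz fundamental plus a louder 440 Hz harmonic
            x[i] = 0.5 * Math.sin(2 * Math.PI * 220 * t) + Math.sin(2 * Math.PI * 440 * t);
        }
        System.out.println("estimated pitch: " + estimatePitch(x, sampleRate, 150.0, 1000.0));
    }
}
```

The resolution is limited to whole-sample lags, so the result is only approximate; interpolating around the best lag is a usual refinement.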
I have written a Java program for peer-to-peer voice chat, but in order to keep the traffic as low as possible I would like to analyze the captured data and make the low-volume parts completely silent.
The problem is that I have no idea how to get the volume from the byte array or how to make parts silent.
You have to slide a time window over your data (say 0.25 seconds' worth) and compute the root mean square (RMS) to decide whether that period is silent. How many bytes make up 0.25 seconds depends on the sample rate and sample size of your audio.
So assuming your data is in byte[] audioData and is signed 8-bit PCM, you'd compute the RMS as below and compare it with a silence threshold. A value like 1000 only makes sense for 16-bit samples; with 8-bit samples the RMS can never exceed 128, so the threshold has to be much smaller and is best tuned by ear.
/** Computes the RMS volume of a window of samples */
public double volumeRMS(int start, int length) {
long sum = 0;
int end = start + length;
int len = length;
if (end > audioData.length) {
end = audioData.length;
len = end - start;
}
if (len == 0) {
return 0;
}
for (int i=start; i<end; i++) {
sum += audioData[i];
}
double average = (double)sum/len;
double sumMeanSquare = 0;
for (int i=start; i<end; i++) {
double f = audioData[i] - average;
sumMeanSquare += f * f;
}
double averageMeanSquare = sumMeanSquare/len;
double rootMeanSquare = Math.sqrt(averageMeanSquare);
return rootMeanSquare;
}
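The question also asks how to make the quiet parts silent. A minimal sketch, assuming signed 8-bit PCM in a byte[] and fixed, non-overlapping windows (a simplification of the sliding window described above); the names `SilenceGate`, `rms` and `muteQuietWindows` and the threshold value are made up for the example:

```java
import java.util.Arrays;

final class SilenceGate {
    /** RMS of samples[start, end) around their mean, for signed 8-bit PCM. */
    static double rms(byte[] samples, int start, int end) {
        int len = end - start;
        if (len <= 0) {
            return 0.0;
        }
        long sum = 0;
        for (int i = start; i < end; i++) {
            sum += samples[i];
        }
        double mean = (double) sum / len;
        double sumSquares = 0.0;
        for (int i = start; i < end; i++) {
            double d = samples[i] - mean;
            sumSquares += d * d;
        }
        return Math.sqrt(sumSquares / len);
    }

    /** Zeroes every window whose RMS is below threshold; returns the number of muted windows. */
    static int muteQuietWindows(byte[] samples, int windowSize, double threshold) {
        int muted = 0;
        for (int start = 0; start < samples.length; start += windowSize) {
            int end = Math.min(start + windowSize, samples.length);
            if (rms(samples, start, end) < threshold) {
                Arrays.fill(samples, start, end, (byte) 0);
                muted++;
            }
        }
        return muted;
    }

    public static void main(String[] args) {
        byte[] samples = new byte[4000];
        for (int i = 0; i < 2000; i++) samples[i] = (byte) (i % 2 == 0 ? 1 : -1);        // faint noise
        for (int i = 2000; i < 4000; i++) samples[i] = (byte) (i % 2 == 0 ? 100 : -100); // loud signal
        System.out.println("muted windows: " + muteQuietWindows(samples, 2000, 5.0));
    }
}
```

Runs of zero bytes compress well, but to actually save traffic the sender would probably skip muted windows altogether rather than transmit zeros.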
I am trying to make a simple pitch detection application for an Android phone. I have gotten the phone to display a graph of the autocorrelation values I have computed, which are stored in a one dimensional array of doubles. Now I need to figure out how to detect repeating patterns within the array. Here is a screenshot of the autocorrelation graph with me humming a steady pitch:
I tried implementing the recursive peak-finding algorithm for 1D arrays given in this slide deck: http://courses.csail.mit.edu/6.006/spring11/lectures/lec02.pdf but I got out of memory errors on the Android.
Next I tried implementing something like this algorithm for finding the second derivative: https://stackoverflow.com/a/3869172 but the autocorrelation values coming from the phone are so jittery that it finds too many minima and maxima.
What I need to figure out how to do is to apply some kind of filter to the autocorrelation data to smooth it out but I suck at math and have no idea what to do. I tried rounding the autocorrelation values to only a few decimal places but I didn't get the results I was looking for.
Basically I need help in figuring out how I can find the overall maxima (actually just the first one would probably be ok) of a repeating pattern. In the screenshot above, the pattern is a tall peak followed by two shorter peaks. I need to know when the second tall peak happens so that I can calculate the pitch.
You are trying to estimate the frequency from the spacing of the amplitude peaks in the sample data. You can do this without finding the peaks by hand and then working out the frequency. Instead you can use a Fast Fourier Transform, which transforms a graph of amplitude against time into a graph of amplitude against frequency. There is a good general description of the concept here: http://en.wikipedia.org/wiki/Fast_Fourier_transform
...and there are several Java libraries that implement the transform, including
Apache Commons Math - http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math3/transform/FastFourierTransformer.html
and
JTransforms - https://sites.google.com/site/piotrwendykier/software/jtransforms
To answer my own question, this is what I ended up doing. (Sorry it took me so long to come back to this question to post the answer.)
double frequency = findFrequency(lowPassFilter(signal));
private double findFrequency(double[] signal) {
int[] signs = new int[signal.length];
for (int i = 0; i < signal.length - 1; i++) {
double diff = signal[i+1] - signal[i];
if (diff < 0) {
signs[i] = -1;
} else if (diff == 0) {
signs[i] = 0;
} else {
signs[i] = 1;
}
}
int[] secondDerivatives = new int[signs.length];
for (int i = 0; i < signs.length - 1; i++) {
secondDerivatives[i] = signs[i+1] - signs[i];
}
double biggestSoFar = 0.0;
int indexOfBiggestSoFar = 0;
for (int i = 0; i < secondDerivatives.length; i++) {
if (secondDerivatives[i] == -2 && signal[i] > biggestSoFar) {
biggestSoFar = signal[i];
indexOfBiggestSoFar = i;
}
}
return 1 / (double)indexOfBiggestSoFar * AudioListener.SAMPLE_RATE;
}
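private double[] lowPassFilter(double[] signal) {
double alpha = 0.15;
// Exponential smoothing in place: move each sample a fraction alpha
// from the previous filtered value towards the new raw value.
for (int i = 1; i < signal.length; i++ ) {
signal[i] = signal[i-1] + alpha * (signal[i] - signal[i-1]);
}
return signal;
}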
I've modulated a carrier frequency signal with my data using FSK like this:
double SAMPLING_TIME = 1.0 / 44100; // 44.1 kHz
int SAMPLES_PER_BIT = 136;
int ENCODING_SAMPLES_PER_BIT = SAMPLES_PER_BIT / 2;
double duration = ENCODING_SAMPLES_PER_BIT * SAMPLING_TIME; // seconds per tone
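public double[] encode(int[] bits) {
double[][] tones = new double[bits.length][];
int total = 0;
for (int i = 0; i < bits.length; i++) {
// FREQUENCY_LOW encodes a 0 bit, FREQUENCY_HIGH a 1 bit
int freq = (bits[i] == 1) ? FREQUENCY_HIGH : FREQUENCY_LOW;
tones[i] = generateTone(freq, duration);
total += tones[i].length;
}
// Append the tones back to back instead of overwriting the message each time
double[] message = new double[total];
int pos = 0;
for (double[] tone : tones) {
System.arraycopy(tone, 0, message, pos, tone.length);
pos += tone.length;
}
return message;
}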
private double[] generateTone(int frequency, double duration) {
double samplingRate = 1 / SAMPLING_TIME; // Hz
int numberOfSamples = (int) Math.round(duration * samplingRate);
samplingTime = 2 * SAMPLING_TIME;
double[] tone = new double[numberOfSamples];
for (int i = 0; i < numberOfSamples; i++) {
double y = Math.sin(2 * Math.PI * frequency * i * SAMPLING_TIME);
tone[i] = y * CARRIER_AMPLITUDE;
}
return tone;
}
Clearly, I'm sending FREQUENCY_LOW for ZERO and FREQUENCY_HIGH for 1.
Now how do I demodulate it using FFT? I'm interested in sampling magnitudes (presence and absence) of FREQUENCY_LOW, FREQUENCY_HIGH throughout the time.
I only know basics of FFT, I was starting to write this but it doesn't make sense:
private void decode(byte[] tone, int length) {
float[] input = new float[FFT_SIZE*2]; // not sure what size? shouldn't this be buffer?
for(int i=0;i<length;i++){
input[i]=tone[i];
}
FloatFFT_1D fft = new FloatFFT_1D(FFT_SIZE);
fft.realForward(input);
}
Can someone help with code?
You can use overlapping sliding windows for your FFTs, with the window and FFT the same length as that of your data bits. Then look for magnitude peaks for your 1's and 0's in the appropriate FFT result bins across these windows. You will also need some synchronization logic for runs of 1's and 0's.
Another DSP technique that may be less compute-intensive is quadrature demodulation at your two frequencies, low-pass filtering the result before feeding it to the synchronization logic and bit detector. Yet another possibility is two sliding Goertzel filters.
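To make the Goertzel suggestion concrete, here is a minimal sketch that measures the power of each of the two tones over one bit period and picks the stronger one. It assumes the bit boundaries are already known, so it leaves out the synchronization logic mentioned above. It also uses one block per bit, not a sliding filter. The class name and the values 147 samples per bit, 1200 Hz and 2400 Hz are hypothetical, chosen so that each tone completes a whole number of cycles per bit at 44100 Hz.

```java
import java.util.Arrays;

final class GoertzelFsk {
    /** Power of one frequency in samples[offset, offset + length), via the Goertzel recurrence. */
    static double goertzelPower(double[] samples, int offset, int length,
                                double freqHz, double sampleRate) {
        double coeff = 2.0 * Math.cos(2.0 * Math.PI * freqHz / sampleRate);
        double s1 = 0.0; // s[n-1]
        double s2 = 0.0; // s[n-2]
        for (int i = 0; i < length; i++) {
            double s0 = samples[offset + i] + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }

    /** Decodes one bit per block by comparing the power at the two tone frequencies. */
    static int[] decode(double[] signal, int samplesPerBit,
                        double lowHz, double highHz, double sampleRate) {
        int[] bits = new int[signal.length / samplesPerBit];
        for (int b = 0; b < bits.length; b++) {
            int offset = b * samplesPerBit;
            double low = goertzelPower(signal, offset, samplesPerBit, lowHz, sampleRate);
            double high = goertzelPower(signal, offset, samplesPerBit, highHz, sampleRate);
            bits[b] = high > low ? 1 : 0;
        }
        return bits;
    }

    /** Test helper: one sine burst per bit, lowHz for 0 and highHz for 1. */
    static double[] encode(int[] bits, int samplesPerBit,
                           double lowHz, double highHz, double sampleRate) {
        double[] out = new double[bits.length * samplesPerBit];
        for (int b = 0; b < bits.length; b++) {
            double f = bits[b] == 1 ? highHz : lowHz;
            for (int i = 0; i < samplesPerBit; i++) {
                out[b * samplesPerBit + i] = Math.sin(2 * Math.PI * f * i / sampleRate);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] bits = {1, 0, 1, 1, 0, 0, 1, 0};
        double[] signal = encode(bits, 147, 1200.0, 2400.0, 44100.0);
        System.out.println(Arrays.toString(decode(signal, 147, 1200.0, 2400.0, 44100.0)));
    }
}
```

Compared with a full FFT per window, this computes only the two bins of interest, which is why it is often suggested for tone detection.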
I'm trying to calculate the mean difference of a set of data. I have two (supposedly equivalent) formulas which calculate this, with one being more efficient (O(n)) than the other (O(n^2)).
The problem is that while the inefficient formula gives correct output, the efficient one does not. Just by looking at both formulas I had a hunch that they weren't equivalent, but wrote it off because the derivation was made by a statistician in a scientific journal. So I'm assuming the problem is my translation. Can anyone help me translate the efficient function properly?
Inefficient formula:
Inefficient formula translation (Java):
public static double calculateMeanDifference(ArrayList<Integer> valuesArrayList)
{
int valuesArrayListSize = valuesArrayList.size();
int sum = 0;
for(int i = 0; i < valuesArrayListSize; i++)
{
for(int j = 0; j < valuesArrayListSize; j++)
sum += (i != j ? Math.abs(valuesArrayList.get(i) - valuesArrayList.get(j)) : 0);
}
return new Double( (sum * 1.0)/ (valuesArrayListSize * (valuesArrayListSize - 1)));
}
Efficient derived formula:
where (sorry, don't know how to use MathML on here):
x(subscript i) = the ith order statistic of the data set
x(bar) = the mean of the data set
Efficient derived formula translation (Java):
public static double calculateMean(ArrayList<Integer> valuesArrayList)
{
double sum = 0;
int valuesArrayListSize = valuesArrayList.size();
for(int i = 0; i < valuesArrayListSize; i++)
sum += valuesArrayList.get(i);
return sum / (valuesArrayListSize * 1.0);
}
public static double calculateMeanDifference(ArrayList<Integer> valuesArrayList)
{
double sum = 0;
double mean = calculateMean(valuesArrayList);
int size = valuesArrayList.size();
double rightHandTerm = mean * size * (size + 1);
double denominator = (size * (size - 1)) / 2.0;
Collections.sort(valuesArrayList);
for(int i = 0; i < size; i++)
sum += (i * valuesArrayList.get(i) - rightHandTerm);
double meanDifference = (2 * sum) / denominator;
return meanDifference;
}
My data set consists of a collection of integers, each with a value in the range [0, 5].
Randomly generating such sets and using the two functions on them gives different results. The inefficient one seems to be the one producing results in line with what is being measured: the absolute average difference between any two values in the set.
Can anyone tell me what's wrong with my translation?
EDIT: I created a simpler implementation that is O(N), provided all your data values are limited to a relatively small set. It follows the methodology of the first method and thus gives identical results to it (unlike the derived formula). If it fits your use case, I suggest using this instead of the derived efficient formula, especially since the latter seems to give negative values when N is small.
Efficient, non-derived translation (Java):
public static double calculateMeanDifference3(ArrayList<Integer> valuesArrayList)
{
HashMap<Integer, Double> valueCountsHashMap = new HashMap<Integer, Double>();
double size = valuesArrayList.size();
for(int i = 0; i < size; i++)
{
int currentValue = valuesArrayList.get(i);
if(!valueCountsHashMap.containsKey(currentValue))
valueCountsHashMap.put(currentValue, new Double(1));
else
valueCountsHashMap.put(currentValue, valueCountsHashMap.get(currentValue)+ 1);
}
double sum = 0;
for(Map.Entry<Integer, Double> valueCountKeyValuePair : valueCountsHashMap.entrySet())
{
int currentValue = valueCountKeyValuePair.getKey();
Double currentCount = valueCountKeyValuePair.getValue();
for(Map.Entry<Integer, Double> valueCountKeyValuePair1 : valueCountsHashMap.entrySet())
{
int loopValue = valueCountKeyValuePair1.getKey();
Double loopCount = valueCountKeyValuePair1.getValue();
sum += (currentValue != loopValue ? Math.abs(currentValue - loopValue) * loopCount * currentCount : 0);
}
}
return new Double( sum/ (size * (size - 1)));
}
Your translation of sum += (i * valuesArrayList.get(i) - rightHandTerm); is wrong. It should be sum += i * valuesArrayList.get(i);, and then, after the for loop, double meanDifference = ((2 * sum) - rightHandTerm) / denominator;
With only that change both functions yield about the same value, but they are still not equal: the order-statistic index in the formula starts at 1, while the loop index i starts at 0, so the weight should be (i + 1).
You subtract rightHandTerm on each iteration, so it ends up multiplied by N.
The big sigma in the numerator applies only to (i x_i), not to the right-hand term.
One more note: mean * size == sum. You don't have to divide the sum by N and then multiply it back.
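Putting these corrections together, a sketch of the sorted version might look like the following. It relies on the identity that, for sorted values with 1-based rank i, the sum of all pairwise differences with i < j equals the sum of (2i - n - 1) * x_i. Since the original formula image is not available here, treat the exact arrangement of terms as an assumption; the class and method names are made up. The brute-force method is included only to check the result, and both need at least two values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class MeanDifference {
    /** O(n^2) reference: average of |x_i - x_j| over all ordered pairs with i != j. */
    static double bruteForce(List<Integer> values) {
        int n = values.size();
        long sum = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                sum += Math.abs(values.get(i) - values.get(j)); // zero when i == j
            }
        }
        return (double) sum / ((double) n * (n - 1));
    }

    /** O(n log n) version: sort once, then weight each value by its 1-based rank. */
    static double viaOrderStatistics(List<Integer> values) {
        List<Integer> sorted = new ArrayList<>(values); // leave the caller's list untouched
        Collections.sort(sorted);
        int n = sorted.size();
        double weighted = 0.0; // sum of rank * x_(rank)
        double total = 0.0;    // plain sum, equal to mean * n
        for (int k = 0; k < n; k++) {
            int rank = k + 1;  // the order-statistic index starts at 1
            weighted += (double) rank * sorted.get(k);
            total += sorted.get(k);
        }
        double numerator = 2.0 * weighted - (n + 1) * total;
        double denominator = n * (n - 1) / 2.0;
        return numerator / denominator;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(3, 0, 5, 1, 1, 4);
        System.out.println(bruteForce(data) + " vs " + viaOrderStatistics(data));
    }
}
```

Note that the sort makes this O(n log n), not O(n), unless the data is already sorted or can be counted into buckets as in the edit to the question.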
I am currently trying to implement some code using Android to detect when a number of specific audio frequency ranges are played through the phone's microphone. I have set up the class using the AudioRecord class:
int channel_config = AudioFormat.CHANNEL_CONFIGURATION_MONO;
int format = AudioFormat.ENCODING_PCM_16BIT;
int sampleSize = 8000;
int bufferSize = AudioRecord.getMinBufferSize(sampleSize, channel_config, format);
AudioRecord audioInput = new AudioRecord(AudioSource.MIC, sampleSize, channel_config, format, bufferSize);
The audio is then read in:
short[] audioBuffer = new short[bufferSize];
audioInput.startRecording();
audioInput.read(audioBuffer, 0, bufferSize);
Performing an FFT is where I become stuck, as I have very little experience in this area. I have been trying to use this class:
FFT in Java and Complex class to go with it
I am then sending the following values:
Complex[] fftTempArray = new Complex[bufferSize];
for (int i=0; i<bufferSize; i++)
{
fftTempArray[i] = new Complex(audioBuffer[i], 0);
}
Complex[] fftArray = fft(fftTempArray);
This could easily be me misunderstanding how this class is meant to work, but the values returned jump all over the place and aren't representative of a consistent frequency even in silence. Is anyone aware of a way to perform this task, or am I overcomplicating matters to try and grab only a small number of frequency ranges rather than to draw it as a graphical representation?
First you need to ensure that the result you are getting is correctly converted to a float/double. I'm not sure how the short[] version works, but the byte[] version only returns the raw bytes. This byte array then needs to be properly converted to floating-point numbers. The code for the conversion should look something like this:
double[] micBufferData = new double[<insert-proper-size>];
final int bytesPerSample = 2; // As it is 16bit PCM
final double amplification = 100.0; // choose a number as you like
for (int index = 0, floatIndex = 0; index < bytesRecorded - bytesPerSample + 1; index += bytesPerSample, floatIndex++) {
double sample = 0;
for (int b = 0; b < bytesPerSample; b++) {
int v = bufferData[index + b];
if (b < bytesPerSample - 1 || bytesPerSample == 1) {
v &= 0xFF;
}
sample += v << (b * 8);
}
double sample32 = amplification * (sample / 32768.0);
micBufferData[floatIndex] = sample32;
}
Then you use micBufferData[] to create your input complex array.
Once you get the results, use the magnitudes of the complex numbers in the results. Most of the magnitudes should be close to zero except the frequencies that have actual values.
You need the sampling frequency to convert the array indices of those magnitudes to frequencies:
private double ComputeFrequency(int arrayIndex) {
return ((1.0 * sampleRate) / (1.0 * fftOutWindowSize)) * arrayIndex;
}
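To show the magnitude step and the index-to-frequency conversion end to end, here is a small self-contained sketch. It uses a naive O(n^2) DFT so that it does not depend on any particular FFT library; in a real app you would substitute the FFT class you already have. The names `DftPeak`, `magnitudes` and `peakFrequency` are made up for the example, and the input is assumed to be real-valued samples already converted to double.

```java
final class DftPeak {
    /** Magnitudes of the first n/2 DFT bins of a real signal (naive O(n^2) transform). */
    static double[] magnitudes(double[] x) {
        int n = x.length;
        double[] mag = new double[n / 2];
        for (int k = 0; k < mag.length; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = 2 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im -= x[t] * Math.sin(angle);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }

    /** Frequency of the strongest bin, ignoring bin 0 (the DC offset). */
    static double peakFrequency(double[] x, double sampleRate) {
        double[] mag = magnitudes(x);
        int peak = 1;
        for (int k = 2; k < mag.length; k++) {
            if (mag[k] > mag[peak]) {
                peak = k;
            }
        }
        // Same conversion as ComputeFrequency: bin index * sampleRate / window size
        return peak * sampleRate / x.length;
    }

    public static void main(String[] args) {
        double sampleRate = 8000.0;
        double[] x = new double[256];
        for (int t = 0; t < x.length; t++) {
            x[t] = Math.sin(2 * Math.PI * 1000.0 * t / sampleRate); // 1 kHz test tone
        }
        System.out.println("peak at " + peakFrequency(x, sampleRate) + " Hz");
    }
}
```

With a window of 256 samples at 8000 Hz each bin is 31.25 Hz wide, so to watch a few frequency ranges it should be enough to sum the magnitudes of the bins that fall inside each range.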