I'm making a simple sound equalizer that operates in the frequency domain and lets the user adjust frequencies in the sound using 4 sliders. The first one is responsible for 0-5 kHz, the fourth one for 15-20 kHz.
Steps are as follows:
I read the wav file and store it in a float array
I perform a complex FFT on that array (separately for the left and right channels)
I multiply the real and imaginary parts of the bins representing the 0-5 kHz frequencies (both positive and negative) by 3.981 to boost these low frequencies by 12 dB in the final sound
I perform an IFFT on the array
I interleave the real parts of the left and right channels (returned by the IFFT) to create the final audio
The problem is that after this process the sound is distorted. It sounds as if the speakers were not plugged in correctly. I found that if I divide the values returned by the IFFT by an arbitrary constant, the final sound is fine, but much quieter. I make the division in the time domain, on the results from the IFFT.
The problem doesn't occur if I multiply the frequencies by a number less than 1, so if the frequencies are attenuated, no further division in the time domain is needed.
I suppose there is a mistake somewhere in the process. But if all the steps are fine, how should I deal with the distorted sound? Is dividing in the time domain a proper solution, and what number should I divide the results by so the sound is not distorted?
EDIT
This is the code I use to perform the steps above. I use the Apache Commons Math implementation of the FFT and the SimpleAudioConversion class taken from http://stackoverflow.com/a/26824664/2891664
// read file and store playable content in byte array
File file = new File("/home/kamil/Downloads/Glory.wav");
AudioInputStream in = AudioSystem.getAudioInputStream(file);
AudioFormat fmt = in.getFormat();
byte[] bytes = new byte[in.available()];
int result = in.read(bytes);
// convert bytes to float array
float[] samples = new float[bytes.length * 8 / fmt.getSampleSizeInBits()];
int validSamples = SimpleAudioConversion.decode(bytes, samples, result, fmt);
// find nearest power of 2 to zero-pad array in order to use fft
int power = 0;
while (Math.pow(2, power) < samples.length / 2)
power++;
// divide data into left and right channels
double[][] left = new double[2][(int) Math.pow(2, power)];
double[][] right = new double[2][(int) Math.pow(2, power)];
for (int i = 0; i < samples.length / 2; i++) {
left[0][i] = samples[2 * i];
right[0][i] = samples[2 * i + 1];
}
//fft
FastFourierTransformer.transformInPlace(left, DftNormalization.STANDARD, TransformType.FORWARD);
FastFourierTransformer.transformInPlace(right, DftNormalization.STANDARD, TransformType.FORWARD);
// here I amplify the 0-4kHz frequencies by 12dB
// 0-4kHz is 1/5 of whole spectrum, and since there are negative frequencies in the array
// I iterate over 1/10 and multiply frequencies on both sides of the array
for (int i = 1; i < left[0].length / 10; i++) {
double factor = 3.981d; // ratio = 10^(12dB/20)
//positive frequencies 0-4kHz
left[0][i] *= factor;
right[0][i] *= factor;
left[1][i] *= factor;
right[1][i] *= factor;
// negative frequencies 0-4kHz
left[0][left[0].length - i] *= factor;
right[0][left[0].length - i] *= factor;
left[1][left[0].length - i] *= factor;
right[1][left[0].length - i] *= factor;
}
//ifft
FastFourierTransformer.transformInPlace(left, DftNormalization.STANDARD, TransformType.INVERSE);
FastFourierTransformer.transformInPlace(right, DftNormalization.STANDARD, TransformType.INVERSE);
// put left and right channel into array
float[] samples2 = new float[(left[0].length) * 2];
for (int i = 0; i < samples2.length / 2; i++) {
samples2[2 * i] = (float) left[0][i];
samples2[2 * i + 1] = (float) right[0][i];
}
// convert back to byte array which can be played
byte[] bytes2 = new byte[bytes.length];
int validBytes = SimpleAudioConversion.encode(samples2, bytes2, validSamples, fmt);
You may listen to the sound here
https://vocaroo.com/i/s095uOJZiewf
If you amplify in either domain, you can potentially end up clipping the signal (which can sound horrible).
So you might need to check your IFFT results to see if any sample values exceed the range your audio system allows (usually -32768 to 32767, or -1.0 to 1.0). The way to avoid any clipping you find is to reduce the gain applied to the FFT bins, or to reduce the amplitude of the original input signal or of the total IFFT result.
The search term for a dynamic gain control process is AGC (Automatic Gain Control), which is non-trivial to do well.
e.g. if the volume for any particular frequency bin is already at "10", your computer's knob doesn't have an "11".
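As a minimal sketch of the last option (scaling down the total IFFT result), assuming the samples2 float array from the edited question above, you could scan for the peak and only rescale when it exceeds full scale. This is essentially the asker's "divide by a constant", but with the smallest constant that still prevents clipping:

// Sketch: find the largest absolute sample after the inverse FFT and,
// only if it exceeds full scale (1.0 for float samples), rescale the
// whole buffer so the peak sits exactly at 1.0.
float peak = 0f;
for (float s : samples2) {
    peak = Math.max(peak, Math.abs(s));
}
if (peak > 1.0f) {
    float scale = 1.0f / peak;
    for (int i = 0; i < samples2.length; i++) {
        samples2[i] *= scale;
    }
}

This keeps the relative boost of the low frequencies but lowers the overall level, which is why the result sounds quieter than the original.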
Related
I know this question has been asked before, but it has no definitive answer.
So what I've found is an example here: FFT spectrum analysis
where I can transform my array of doubles with an FFT class:
RealDoubleFFT transformer;
int blockSize = 2048;
short[] buffer = new short[blockSize];
double[] toTransform = new double[blockSize];
int bufferReadResult = audioRecord.read(buffer, 0, blockSize);
for (int i = 0; i < blockSize && i < bufferReadResult; i++) {
toTransform[i] = (double) buffer[i] / 32768.0; // signed 16 bit
}
transformer.ft(toTransform);
So now I don't know how to get a frequency.
I wrote this method:
public static int calculateFFTFrequency(double[] audioData){
float sampleRate = 44100;
int numSamples = audioData.length;
double max = Double.MIN_VALUE;
int index = 0;
for (int i = 0; i< numSamples -1; i++){
if (audioData[i] > max) {
max = audioData[i];
index = i;
}
}
float freq = (sampleRate / (float) numSamples * (float) index) * 2F;
return (int)freq;
}
I tried to implement a formula, but it doesn't return anything good, just some wild numbers.
I tried zero crossing as well:
public static int calculateFrequency(short [] audioData){
int sampleRate = 44100;
int numSamples = audioData.length;
int numCrossing = 0;
for (int p = 0; p < numSamples-1; p++)
{
if ((audioData[p] > 0 && audioData[p + 1] <= 0) ||
(audioData[p] < 0 && audioData[p + 1] >= 0))
{
numCrossing++;
}
}
float numSecondsRecorded = (float)numSamples/(float)sampleRate;
float numCycles = numCrossing/2;
float frequency = numCycles/numSecondsRecorded;
return (int)frequency;
}
But with the zero-crossing method, if I play an "A" note on the piano it shows 430 for a moment (which is close to A) and then starts to show wild numbers as the sound fades: 800+, 1000+, etc.
Can somebody help me get a more or less accurate frequency from the mic?
You should test your solution using a generated stream rather than the mic, checking whether the detected frequency matches what you expect. Then you can do real-life tests with the mic, analyzing the data it collects yourself in case of any issues. There could be inaudible sounds in your environment causing strange results, and when the sound fades there can be harmonics that become louder than the fundamental. There is a lot to consider when processing sound from a real environment.
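For example, as a rough sketch (hypothetical values, reusing the zero-crossing calculateFrequency method from the question), you could feed the detector a synthetic 440 Hz tone and check the result before trusting microphone input:

// Sketch: generate a clean 440 Hz (A4) test tone as signed 16-bit samples
// and run it through the detector; the result should be close to 440.
int sampleRate = 44100;
double testFreq = 440.0;
short[] testTone = new short[8192];
for (int i = 0; i < testTone.length; i++) {
    testTone[i] = (short) (0.5 * 32767 * Math.sin(2 * Math.PI * testFreq * i / sampleRate));
}
System.out.println("Detected: " + calculateFrequency(testTone) + " Hz (expected ~440)");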
What you hear from a piano is a pitch, not just a spectral frequency. They are not the same thing. Pitch is a psycho-acoustic phenomenon, depending more on periodicity than on the spectral peak. A bare FFT reports spectral frequency magnitudes, which can be composed of overtones, harmonics, and other artifacts, and may or may not include the fundamental pitch frequency.
So what you may want to use instead of an FFT is a pitch detection/estimation algorithm, which is a bit more complicated than just picking a peak magnitude out of an FFT.
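As an illustration of what such an algorithm can look like, here is a minimal, non-production sketch of a naive autocorrelation pitch estimator (real trackers such as YIN add normalization, peak interpolation, and voicing detection):

// Sketch: naive autocorrelation pitch estimate. Finds the lag with the
// strongest self-similarity and converts it to a frequency.
public static double estimatePitch(double[] samples, int sampleRate) {
    int minLag = sampleRate / 1000;    // ~1000 Hz upper bound
    int maxLag = sampleRate / 50;      // ~50 Hz lower bound
    double best = 0;
    int bestLag = -1;
    for (int lag = minLag; lag <= maxLag && lag < samples.length; lag++) {
        double sum = 0;
        for (int i = 0; i + lag < samples.length; i++) {
            sum += samples[i] * samples[i + lag];
        }
        if (sum > best) {
            best = sum;
            bestLag = lag;
        }
    }
    return bestLag > 0 ? (double) sampleRate / bestLag : 0;
}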
I can't for the life of me wrap my head around this seemingly easy problem.
I am trying to create a sine wave with upper and lower bounds for the amplitude (i.e. the highest point is 3 and the lowest point is 0.4).
Using regular math I am able to get a sine wave into an array with values from -1 to 1, but I don't know how to change those bounds.
static int MAX_POINTS = 100;
static int CYCLES = 1;
static double[] list = new double[100];
public static void SineCurve()
{
double phaseMultiplier = 2 * Math.PI * CYCLES / MAX_POINTS;
for (int i = 0; i < MAX_POINTS; i++)
{
double cycleX = i * phaseMultiplier;
double sineResult = Math.sin(cycleX);
list[i]= sineResult;
}
for(int i=0;i<list.length;i++){
System.out.println(list[i]);
}
}
Any tips would be greatly appreciated.
The amplitude (multiplier of sin(x) value) is half the difference between the highest and lowest values you want. In your case
amplitude = (3 - 0.4)/2
which is 1.3. The offset is then the lowest value plus the amplitude, which makes it 1.7 in your case.
The equation you want to graph is then
1.3 * sin(x) + 1.7
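Applied to the loop from the question, a minimal sketch of the same thing in code would be:

// Sketch: same loop as above, scaled and offset so the wave swings
// between 0.4 and 3.0 instead of -1 and 1.
double amplitude = (3.0 - 0.4) / 2;    // 1.3
double offset = 0.4 + amplitude;       // 1.7
for (int i = 0; i < MAX_POINTS; i++) {
    double cycleX = i * phaseMultiplier;
    list[i] = amplitude * Math.sin(cycleX) + offset;
}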
I am currently using the gdx library com.badlogic.gdx.audio.analysis.FFT and the method:
private float[] fft(int N, int fs, float[] array) {
float[] fft_cpx, tmpr, tmpi;
float[] res = new float[N / 2];
// float[] mod_spec =new float[array.length/2];
float[] real_mod = new float[N];
float[] imag_mod = new float[N];
double[] real = new double[N];
double[] imag = new double[N];
double[] mag = new double[N];
double[] phase = new double[N];
float[] new_array = new float[N];
// Zero Pad signal
for (int i = 0; i < N; i++) {
if (i < array.length) {
new_array[i] = array[i];
}
else {
new_array[i] = 0;
}
}
FFT fft = new FFT(N, fs); // use the method's sample-rate parameter
fft.forward(new_array);
fft_cpx = fft.getSpectrum();
tmpi = fft.getImaginaryPart();
tmpr = fft.getRealPart();
for (int i = 0; i < new_array.length; i++) {
real[i] = (double) tmpr[i];
imag[i] = (double) tmpi[i];
mag[i] = Math.sqrt((real[i] * real[i]) + (imag[i] * imag[i]));
phase[i] = Math.atan2(imag[i], real[i]);
/**** Reconstruction ****/
real_mod[i] = (float) (mag[i] * Math.cos(phase[i]));
imag_mod[i] = (float) (mag[i] * Math.sin(phase[i]));
}
fft.inverse(real_mod, imag_mod, res);
return res;
}
How then do I use this method to find the frequency (and then note) of sound recorded from the microphone?
Your goal is to take the magnitudes of the individual frequencies in mag[i] and find the largest one. To start, you can just loop over them and find the maximum mag[i]. Then you have to recalculate its corresponding frequency from the index i.
Frequency is determined by this equation:
freq = i * Fs / N;
where Fs is the sampling frequency of your time-domain data (the input wave data), N is the number of samples you computed the FFT from, and i is the index into your frequency-domain data (the computed magnitudes and phases).
In your case you can add a line like this to your for loop to debug it:
double freq = (double)i*(double)fs/(double)N;
System.out.println("Frequency: "+ Double.toString(freq) + "Magnitude: "+ Double.toString(mag[i]));
Check this link for more information:
How to get frequency from fft result?
Nyquist theorem
... states that you can reconstruct a frequency only if you have at least twice as many samples per second: to reconstruct 1000 Hz you need at least 2000 samples per second (and even then the reconstructed wave will be very distorted).
If you have a sample rate of 22000 Hz, you can measure frequencies up to about 11000 Hz. Your data in mag and phase will only be meaningful in the first half of the array, 0..N/2; after that you'll just see a mirror image of the previous data (see the linked Wikipedia page for a picture).
If you want to determine your N, check this answer or search further. Start with arbitrary numbers, for example one tenth of the sample rate fs. The larger N is, the slower your algorithm will be.
Table of note frequencies
The simplest way is to make a table of all the frequencies you want to detect and then compare the frequency with the maximum magnitude against the values in the table, with a small tolerance, for example ±2% of the table value. Make sure those tolerances do not overlap for two consecutive notes.
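As a minimal sketch of such a table lookup (the note names and frequencies below are standard equal-temperament values, chosen here just for illustration):

// Sketch: match the detected peak frequency against a note table,
// accepting it only if it falls within +-2% of a table entry.
static final String[] NOTE_NAMES = { "A4", "B4", "C5", "D5", "E5" };
static final double[] NOTE_FREQS = { 440.00, 493.88, 523.25, 587.33, 659.25 };

static String matchNote(double freq) {
    for (int i = 0; i < NOTE_FREQS.length; i++) {
        if (Math.abs(freq - NOTE_FREQS[i]) <= NOTE_FREQS[i] * 0.02) {
            return NOTE_NAMES[i];
        }
    }
    return null; // no note within tolerance
}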
Microphone input
Google up keywords like java microphone input library tutorial, or check this answer.
I've modulated a carrier frequency signal with my data using FSK like this:
double SAMPLING_TIME = 1.0 / 44100; // 44.1 kHz
int SAMPLES_PER_BIT = 136;
int ENCODING_SAMPLES_PER_BIT = SAMPLES_PER_BIT / 2;
double duration = ENCODING_SAMPLES_PER_BIT * SAMPLING_TIME;
public double[] encode(int[] bits) {
for (int i = 0; i < bits.length; i++) {
int freq = FREQUENCY_LOW;
if (bits[i] > 1)
freq = FREQUENCY_HIGH;
bitArray = generateTone(freq, duration);
message = bitArray;
}
return message;
}
private double[] generateTone(int frequency, double duration) {
int samplingRate = (int) (1 / SAMPLING_TIME); // Hz
int numberOfSamples = (int) (duration * samplingRate);
samplingTime = 2 * SAMPLING_TIME;
double[] tone = new double[numberOfSamples];
for (int i = 0; i < numberOfSamples; i++) {
double y = Math.sin(2 * Math.PI * frequency * i * SAMPLING_TIME);
tone[i] = y * CARRIER_AMPLITUDE;
}
return tone;
}
Clearly, I'm sending FREQUENCY_LOW for ZERO and FREQUENCY_HIGH for 1.
Now how do I demodulate it using an FFT? I'm interested in sampling the magnitudes (presence or absence) of FREQUENCY_LOW and FREQUENCY_HIGH over time.
I only know the basics of FFT; I started writing this, but it doesn't make sense:
private void decode(byte[] tone, int length) {
float[] input = new float[FFT_SIZE*2]; // not sure what size? shouldn't this be buffer?
for(int i=0;i<length;i++){
input[i]=tone[i];
}
FloatFFT_1D fft = new FloatFFT_1D(FFT_SIZE);
fft.realForward(input);
}
Can someone help with code?
You can use overlapping sliding windows for your FFTs, with the window and FFT the same length as that of your data bits. Then look for magnitude peaks for your 1's and 0's in the appropriate FFT result bins across these windows. You will also need some synchronization logic for runs of 1's and 0's.
Another DSP technique that may be less compute-intensive is to do quadrature demodulation for your two frequencies and low-pass filter the result before feeding it to the synchronization logic and bit detector. Yet another possibility is two sliding Goertzel filters.
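As a rough sketch of the Goertzel option (not the FFT approach above), the following computes the power of a single target frequency over one bit-length window; running it once per FSK frequency and comparing the two results gives a bit decision:

// Sketch: Goertzel power of one target frequency over one window of samples.
// Run it twice per bit window (once per FSK frequency) and compare.
static double goertzelPower(double[] window, double targetFreq, double sampleRate) {
    double omega = 2 * Math.PI * targetFreq / sampleRate;
    double coeff = 2 * Math.cos(omega);
    double s0 = 0, s1 = 0, s2 = 0;
    for (double x : window) {
        s0 = x + coeff * s1 - s2;
        s2 = s1;
        s1 = s0;
    }
    // power of the target bin; larger means the frequency is present
    return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

A per-window bit decision would then be something like bit = (goertzelPower(window, FREQUENCY_HIGH, fs) > goertzelPower(window, FREQUENCY_LOW, fs)) ? 1 : 0, still subject to the synchronization logic mentioned above.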
I am currently trying to implement some code using Android to detect when a number of specific audio frequency ranges are played through the phone's microphone. I have set up the class using the AudioRecord class:
int channel_config = AudioFormat.CHANNEL_CONFIGURATION_MONO;
int format = AudioFormat.ENCODING_PCM_16BIT;
int sampleSize = 8000;
int bufferSize = AudioRecord.getMinBufferSize(sampleSize, channel_config, format);
AudioRecord audioInput = new AudioRecord(AudioSource.MIC, sampleSize, channel_config, format, bufferSize);
The audio is then read in:
short[] audioBuffer = new short[bufferSize];
audioInput.startRecording();
audioInput.read(audioBuffer, 0, bufferSize);
Performing an FFT is where I become stuck, as I have very little experience in this area. I have been trying to use this class:
FFT in Java and Complex class to go with it
I am then sending the following values:
Complex[] fftTempArray = new Complex[bufferSize];
for (int i=0; i<bufferSize; i++)
{
fftTempArray[i] = new Complex(audio[i], 0);
}
Complex[] fftArray = fft(fftTempArray);
This could easily be me misunderstanding how this class is meant to work, but the values returned jump all over the place and don't represent a consistent frequency even in silence. Is anyone aware of a way to perform this task, or am I overcomplicating matters by trying to grab only a small number of frequency ranges rather than drawing it as a graphical representation?
First you need to ensure that the result you are getting is correctly converted to a float/double. I'm not sure how the short[] version works, but the byte[] version only returns the raw bytes. This byte array then needs to be properly converted to floating-point samples. The code for the conversion should look something like this:
double[] micBufferData = new double[<insert-proper-size>];
final int bytesPerSample = 2; // As it is 16bit PCM
final double amplification = 100.0; // choose a number as you like
for (int index = 0, floatIndex = 0; index < bytesRecorded - bytesPerSample + 1; index += bytesPerSample, floatIndex++) {
double sample = 0;
for (int b = 0; b < bytesPerSample; b++) {
int v = bufferData[index + b];
if (b < bytesPerSample - 1 || bytesPerSample == 1) {
v &= 0xFF;
}
sample += v << (b * 8);
}
double sample32 = amplification * (sample / 32768.0);
micBufferData[floatIndex] = sample32;
}
Then you use micBufferData[] to create your input complex array.
Once you get the results, use the magnitudes of the complex numbers. Most of the magnitudes should be close to zero except at the frequencies that are actually present.
You need the sampling frequency to convert the array indices of those magnitudes to frequencies:
private double ComputeFrequency(int arrayIndex) {
return ((1.0 * sampleRate) / (1.0 * fftOutWindowSize)) * arrayIndex;
}
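Putting those pieces together, a minimal sketch of checking whether a given frequency band is present might look like this (assuming the linked Complex class provides an abs() magnitude method, that fftArray is the full-length FFT result from above, and that the threshold is an arbitrary value you would tune against real input):

// Sketch: sum the magnitudes of the bins that fall inside a target band
// and compare the total against a tunable threshold.
boolean bandPresent(Complex[] fftArray, int sampleRate, double lowHz, double highHz, double threshold) {
    int n = fftArray.length;
    double energy = 0;
    for (int i = 0; i < n / 2; i++) {               // only the first half is meaningful
        double freq = (double) sampleRate * i / n;  // bin index -> frequency
        if (freq >= lowHz && freq <= highHz) {
            energy += fftArray[i].abs();            // magnitude of bin i
        }
    }
    return energy > threshold;
}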