Wrong values in calculating Frequency using FFT - java

I'm getting wrong frequency, I don't understand why i'm getting wrong values.since i have calculating as per instructions followed by stackoverflow.
I've used FFT from
http://introcs.cs.princeton.edu/java/97data/FFT.java.html
and complex from
http://introcs.cs.princeton.edu/java/97data/Complex.java.html
audioRec.startRecording();
audioRec.read(bufferByte, 0,bufferSize);
for(int i=0;i<bufferSize;i++){
bufferDouble[i]=(double)bufferByte[i];
}
Complex[] fftArray = new Complex[bufferSize];
for(int i=0;i<bufferSize;i++){
fftArray[i]=new Complex(bufferDouble[i],0);
}
FFT.fft(fftArray);
double[] magnitude=new double[bufferSize];
for(int i=0;i<bufferSize;i++){
magnitude[i] = Math.sqrt((fftArray[i].re()*fftArray[i].re()) + (fftArray[i].im()*fftArray[i].im()));
}
double max = 0.0;
int index = -1;
for(int j=0;j<bufferSize;j++){
if(max < magnitude[j]){
max = magnitude[j];
index = j;
}
}
final int peak=index * sampleRate/bufferSize;
Log.v(TAG2, "Peak Frequency = " + index * sampleRate/bufferSize);
handler.post(new Runnable() {
public void run() {
textView.append("---"+peak+"---");
}
});
i'm getting values like 21000,18976,40222,30283 etc...
Please help me.....
Thank you..

Your source code is almost fine. The only problem is that you search for the peaks through the full spectrum, i.e. from 0 via Fs/2 to Fs.
For any real-valued input signal (which you have) the spectrum between Fs/2 and Fs (=sample frequency) is an exact mirror of the spectrum between 0 and Fs/2 (I found this nice background explanation). Thus, for each frequency there exist two peaks with almost identical amplitude. I'm writing 'almost' because due to limited machine precision they are not necessarily exactly identical. So, you randomly find the peak in the first half of the spectrum which contains the frequencies below the Nyquist frequency (=Fs/2) or in the second half of the spectrum with the frequencies above the Nyquist frequency.
If you want to correct the mistake yourself, stop reading here. Otherwise continue:
Just replace
for(int j=0;j<bufferSize;j++){
with
for(int j=0;j<=bufferSize/2;j++){
in the source code you presented.
P.S.: Typically, it is better to apply a window function to the analysis buffer (e.g. a Hamming window) but for your application of peak picking it won't change results very much.

Related

Neural networks and large data sets

I have a basic framework for a neural network to recognize numeric digits, but I'm having some problems with training it. My back-propogation works for small data sets, but when I have more than 50 data points, the return value starts converging to 0. And when I have data sets in the thousands, I get NaN's for costs and returns.
Basic structure: 3 layers: 784 : 15 : 1
784 is the number of pixels per data set, 15 neurons in my hidden layer, and one output neuron which returns a value from 0 to 1 (when you multiply by 10 you get a digit).
public class NetworkManager {
int inputSize;
int hiddenSize;
int outputSize;
public Matrix W1;
public Matrix W2;
public NetworkManager(int input, int hidden, int output) {
inputSize = input;
hiddenSize = hidden;
outputSize = output;
W1 = new Matrix(inputSize, hiddenSize);
W2 = new Matrix(hiddenSize, output);
}
Matrix z2, z3;
Matrix a2;
public Matrix forward(Matrix X) {
z2 = X.dot(W1);
a2 = sigmoid(z2);
z3 = a2.dot(W2);
Matrix yHat = sigmoid(z3);
return yHat;
}
public double costFunction(Matrix X, Matrix y) {
Matrix yHat = forward(X);
Matrix cost = yHat.sub(y);
cost = cost.mult(cost);
double returnValue = 0;
int i = 0;
while (i < cost.m.length) {
returnValue += cost.m[i][0];
i++;
}
return returnValue;
}
Matrix yHat;
public Matrix[] costFunctionPrime(Matrix X, Matrix y) {
yHat = forward(X);
Matrix delta3 = (yHat.sub(y)).mult(sigmoidPrime(z3));
Matrix dJdW2 = a2.t().dot(delta3);
Matrix delta2 = (delta3.dot(W2.t())).mult(sigmoidPrime(z2));
Matrix dJdW1 = X.t().dot(delta2);
return new Matrix[]{dJdW1, dJdW2};
}
}
There's the code for network framework. I pass double arrays of length 784 into the forward method.
int t = 0;
while (t < 10000) {
dJdW = Nn.costFunctionPrime(X, y);
Nn.W1 = Nn.W1.sub(dJdW[0].scalar(3));
Nn.W2 = Nn.W2.sub(dJdW[1].scalar(3));
t++;
}
I call this to adjust the weights. With small sets, the cost converges to 0 pretty well, but larger sets don't (the cost associated with 100 characters converges to 13, always). And if the set is too large, the first adjustment works (and costs go down) but after the second all I can get is NaN.
Why does this implementation fail with larger data sets (specifically training) and how can I fix this? I tried a similar structure with 10 outputs instead of 1 where each would return a value near 0 or 1 acting like boolean values, but the same thing was happening.
I'm also doing this in java by the way, and I'm wondering if that has something to do with the problem. I was wondering if it was a problem with running out of space but I haven't been getting any heap space messages. Is there a problem with how I'm back-propogating or is something else happening?
EDIT: I think I know what's happening. I think my backpropogation function is getting caught in local minimums. Sometimes the training succeeds and sometimes it fails for large data sets. Because I'm starting with random weights, I get random initial costs. What I've noticed is that when the cost initially exceeds a certain amount (it depends on the number of datasets involved), the costs converge to a clean number (sometimes 27, others 17.4) and the outputs converge to 0 (which makes sense).
I was warned about relative minimums in the cost function when I began, and I'm beginning to realize why. So now the question becomes, how do I go about my gradient descent so that I'll actually find the global minimum? I'm working in Java by the way.
This seems like a problem with weight initialization.
As far as i can see you never initialize the weights to any specific value. Therefore the network diverges. You should at least use random initialization.
If your backprop works on small dataset is there really good assumtion that there isn't problem. When you're suspicious about it you can try your BP on XOR problem.
Are units biased?
I once discuss with guy who doing exactly same thing. Hand digit recognition and 15 units in hidden layer. I saw a network who doing this task well. Her topology was:
Input: 784
First hidden: 500
Second hidden: 500
Third hidden: 2000
Output: 10
You have a sets of images and you nonlinear transform 784 pixels of image into the 15 numbers from <0, 1> interval and you doing this for all images of your set. You hope that you can right separate digit based on these 15 numbers. From my point of view is 15 hidden unit too little for such a task when I assumed you have dataset with thousands of example. Please try for example 500 hidden units.
And learning rate has influence on backprop and can caused problem with convergence.

K-means|| aka K-Means++ Scalable - Problems with implementation

EDIT: Code updated, Comments, performance info
I'm trying to code K-means|| in Java. (http://vldb.org/pvldb/vol5/p622_bahmanbahmani_vldb2012.pdf)
However, it doesn't work well. I wasn't surprised that the running time increases compared to standard K-means. I'm much more wondering why the detection rate of my program trained with K-means|| is lower compared to a training with standard K-means. How could choosen clusterpoints be worse than ones picked by chance?
UPDATE: If've found some errors while my internet was down, k-means|| performs now as good as k-means standard - however not a bit better.
I'm quite sure that my code is wrong, but after hours of searching I've got no idea where I've done a mistake (frankly, I'm quite new to this topic).
So I hope you see what I've done wrong. Here is my code for both seeding options:
public void training(int stop, int numberIt, double epsilon, boolean advanced){
double d=Double.MAX_VALUE,s=0;
int nearestprototype=0;
int [] myprototype=new int[trainingsSet.size()];
Random random=new Random();
//
long t1=System.currentTimeMillis();
if(!advanced){//standard random k-means seeding; random datapoints are choosen as prototypes
for(int i=0; i<k; i++){
int rand = random.nextInt(trainingsSet.size());
prototypes[i]=trainingsSet.getVectorAtIndex(rand);
}
}else{ //state-of-the-art k-means|| a.k.a k-means++ scalable seeding; explanation here: http://vldb.org/pvldb/vol5/p622_bahmanbahmani_vldb2012.pdf
prototypes[0]=trainingsSet.getVectorAtIndex(random.nextInt(trainingsSet.size())); //first protoype, chosen randomly
Vector<DataVector>kproto=new Vector<DataVector>(); //saves the prototypes
kproto.add(prototypes[0]);
for(int i=0;i<trainingsSet.size();i++){ //gets distance to all data points, sum it up
s+=trainingsSet.getVectorAtIndex(i).distance2(kproto.elementAt(0));
}
double it=Math.floor(Math.log(s)); // calculates how often the loop for step 4 and 5 is executed
for(int c=0; c<it; c++){
int[]psi=new int[trainingsSet.size()];//saves minimum distance to a protoype for every element
for(int i=0; i<trainingsSet.size();i++){
double min=Double.POSITIVE_INFINITY;
for(int j=0;j<kproto.size();j++){
double dist=trainingsSet.getVectorAtIndex(i).distance2(kproto.elementAt(j));
if(min>dist){
min=dist;
}
}
psi[i]=(int) min;
}
double phi_c=0;
for(int i=0; i<trainingsSet.size();i++)
phi_c+=psi[i]; //sums up squared distances
for(int f=0; f<trainingsSet.size();f++){
double p_x=5*psi[f]/phi_c; //oversampling factor 0.1*k (k is 50 in my case)
if(p_x>random.nextDouble()){
kproto.addElement(trainingsSet.getVectorAtIndex(f));//adds data point to the prototype set with a probability
//depending on its distance to the next prototype
}
}
}
int[]w=new int[kproto.size()]; //every prototype gets a value in w; the value is increased if the prototype has a minimum distance to a data point
for(int i=0; i<trainingsSet.size();i++){
double min=trainingsSet.getVectorAtIndex(i).distance2(kproto.elementAt(0));
if(min==0)
continue;
int index=0;
for(int j=1; j<kproto.size();j++){
double save=trainingsSet.getVectorAtIndex(i).distance2(kproto.elementAt(j));
if(min>save){
min=save;
index=j;
}
}
w[index]++;
}
int[]wtotal=new int[kproto.size()]; //wtotal sums the w values up
for(int i=0;i<kproto.size();i++){
for(int st=0; st<=i;st++){
wtotal[i]+=w[st];
}
}
int[]cselect=new int[k];//cselect saves the final prototypes
int stoppoint=0;
boolean repeat=false; //repeat lets the choosing process repeat if the prototype has already been selected
for(int kk=0;kk<k;kk++){
do{
repeat=false;
int stopper=random.nextInt(wtotal[kproto.size()-1]);//randomly choose a int and check in which interval it lies
for(int st=wtotal.length-1;st>=0;st--){
if(stopper>=wtotal[st]){
stoppoint=wtotal.length-st-1;
break;
}
}
for(int i=0; i<kk;i++){
if(cselect[i]==stoppoint)
repeat=true;
}
}while(repeat);
//are all prototypes overwritten?
prototypes[kk]=kproto.get(stoppoint);//the number of the interval is connected to a prototype; the prototype is added to the final set of prototypes "prototypes"
cselect[kk]=stoppoint;
}
}
long t2=System.currentTimeMillis();
System.out.println(advanced+" Init time: "+(t2-t1));
The performance shows that both options (standard, k-means||) reach the same level of correct clustering (around 85%). However, the running time for initalisation differs.
The seeding is quasi-immediatly for standard k-means, whereas k-means|| needs 600-900ms (for 1000 data points). The convergence afterwards with standard maximazation/expectation needs the same time for both (around 1900-2500ms). This is irritation because k-means|| should converge much faster.
I hope you spot some error or maybe explain me if I expect something else than k-means|| can deliver.
Thanks for your help!

Minimization of number of itererations by adjusting input parameters in Java

I was inspired by this question XOR Neural Network in Java
Briefly, a XOR neural network is trained and the number of iterations required to complete the training depends on seven parameters (alpha, gamma3_min_cutoff, gamma3_max_cutoff, gamma4_min_cutoff, gamma4_max_cutoff, gamma4_min_cutoff, gamma4_max_cutoff). I would like to minimize number of iterations required for training by tweaking these parameters.
So, I want to rewrite program from
private static double alpha=0.1, g3min=0.2, g3max=0.8;
int iteration= 0;
loop {
do_something;
iteration++;
if (error < threshold){break}
}
System.out.println( "iterations: " + iteration)
to
for (double alpha = 0.01; alpha < 10; alpha+=0.01){
for (double g3min = 0.01; g3min < 0.4; g3min += 0.01){
//Add five more loops to optimize other parameters
int iteration = 1;
loop {
do_something;
iteration++;
if (error < threshold){break}
}
System.out.println( inputs );
//number of iterations, alpha, cutoffs,etc
//Close five more loops here
}
}
But this brute forcing method is not going to be efficient. Given 7 parameters and hundreds of iterations for each calculation even with 10 points for each parameter translates in billions of operations. Nonlinear fit should do, but those typically require partial derivatives which I wouldn't have in this case.
Is there a Java package for this sort of optimizations?
Thank you in advance,
Stepan
You have some alternatives - depending on the equations that govern the error parameter.
Pick a point in parameter space and use an iterative process to walk towards a minimum. Essentially, add a delta to each parameter and pick whichever reduces the error by the most - rince - repeat.
Pick each pareameter and perform a binary-chop search between its limits to find it's minimum. Will only work if the parameter's effect is linear.
Solve the system using some form of Operations-Research technique to track down a minimum.

Echo/delay algorithm just causes noise/static?

There have been other questions and answers on this site suggesting that, to create an echo or delay effect, you need only add one audio sample with a stored audio sample from the past. As such, I have the following Java class:
public class DelayAMod extends AudioMod {
private int delay = 500;
private float decay = 0.1f;
private boolean feedback = false;
private int delaySamples;
private short[] samples;
private int rrPointer;
#Override
public void init() {
this.setDelay(this.delay);
this.samples = new short[44100];
this.rrPointer = 0;
}
public void setDecay(final float decay) {
this.decay = Math.max(0.0f, Math.min(decay, 0.99f));
}
public void setDelay(final int msDelay) {
this.delay = msDelay;
this.delaySamples = 44100 / (1000/this.delay);
System.out.println("Delay samples:"+this.delaySamples);
}
#Override
public short process(short sample) {
System.out.println("Got:"+sample);
if (this.feedback) {
//Delay should feed back into the loop:
sample = (this.samples[this.rrPointer] = this.apply(sample));
} else {
//No feedback - store base data, then add echo:
this.samples[this.rrPointer] = sample;
sample = this.apply(sample);
}
++this.rrPointer;
if (this.rrPointer >= this.samples.length) {
this.rrPointer = 0;
}
System.out.println("Returning:"+sample);
return sample;
}
private short apply(short sample) {
int loc = this.rrPointer - this.delaySamples;
if (loc < 0) {
loc += this.samples.length;
}
System.out.println("Found:"+this.samples[loc]+" at "+loc);
System.out.println("Adding:"+(this.samples[loc] * this.decay));
return (short)Math.max(Short.MIN_VALUE, Math.min(sample + (int)(this.samples[loc] * this.decay), (int)Short.MAX_VALUE));
}
}
It accepts one 16-bit sample at a time from an input stream, finds an earlier sample, and adds them together accordingly. However, the output is just horrible noisy static, especially when the decay is raised to a level that would actually cause any appreciable result. Reducing the decay to 0.01 barely allows the original audio to come through, but there's certainly no echo at that point.
Basic troubleshooting facts:
The audio stream sounds fine if this processing is skipped.
The audio stream sounds fine if decay is 0 (nothing to add).
The stored samples are indeed stored and accessed in the proper order and the proper locations.
The stored samples are being decayed and added to the input samples properly.
All numbers from the call of process() to return sample are precisely what I would expect from this algorithm, and remain so even outside this class.
The problem seems to arise from simply adding signed shorts together, and the resulting waveform is an absolute catastrophe. I've seen this specific method implemented in a variety of places - C#, C++, even on microcontrollers - so why is it failing so hard here?
EDIT: It seems I've been going about this entirely wrong. I don't know if it's FFmpeg/avconv, or some other factor, but I am not working with a normal PCM signal here. Through graphing of the waveform, as well as a failed attempt at a tone generator and the resulting analysis, I have determined that this is some version of differential pulse-code modulation; pitch is determined by change from one sample to the next, and halving the intended "volume" multiplier on a pure sine wave actually lowers the pitch and leaves volume the same. (Messing with the volume multiplier on a non-sine sequence creates the same static as this echo algorithm.) As this and other DSP algorithms are intended to work on linear pulse-code modulation, I'm going to need some way to get the proper audio stream first.
It should definitely work unless you have significant clipping.
For example, this is a text file with two columns. The leftmost column is the 16 bit input. The second column is the sum of the first and a version delayed by 4001 samples. The sample rate is 22KHz.
Each sample in the second column is the result of summing x[k] and x[k-4001] (e.g. y[5000] = x[5000] + x[999] = -13840 + 9181 = -4659) You can clearly hear the echo signal when playing the samples in the second column.
Try this signal with your code and see if you get identical results.

Java algorithm for normalizing audio

I'm trying to normalize an audio file of speech.
Specifically, where an audio file contains peaks in volume, I'm trying to level it out, so the quiet sections are louder, and the peaks are quieter.
I know very little about audio manipulation, beyond what I've learnt from working on this task. Also, my math is embarrassingly weak.
I've done some research, and the Xuggle site provides a sample which shows reducing the volume using the following code: (full version here)
#Override
public void onAudioSamples(IAudioSamplesEvent event)
{
// get the raw audio byes and adjust it's value
ShortBuffer buffer = event.getAudioSamples().getByteBuffer().asShortBuffer();
for (int i = 0; i < buffer.limit(); ++i)
buffer.put(i, (short)(buffer.get(i) * mVolume));
super.onAudioSamples(event);
}
Here, they modify the bytes in getAudioSamples() by a constant of mVolume.
Building on this approach, I've attempted a normalisation modifies the bytes in getAudioSamples() to a normalised value, considering the max/min in the file. (See below for details). I have a simple filter to leave "silence" alone (ie., anything below a value).
I'm finding that the output file is very noisy (ie., the quality is seriously degraded). I assume that the error is either in my normalisation algorithim, or the way I manipulate the bytes. However, I'm unsure of where to go next.
Here's an abridged version of what I'm currently doing.
Step 1: Find peaks in file:
Reads the full audio file, and finds this highest and lowest values of buffer.get() for all AudioSamples
#Override
public void onAudioSamples(IAudioSamplesEvent event) {
IAudioSamples audioSamples = event.getAudioSamples();
ShortBuffer buffer =
audioSamples.getByteBuffer().asShortBuffer();
short min = Short.MAX_VALUE;
short max = Short.MIN_VALUE;
for (int i = 0; i < buffer.limit(); ++i) {
short value = buffer.get(i);
min = (short) Math.min(min, value);
max = (short) Math.max(max, value);
}
// assign of min/max ommitted for brevity.
super.onAudioSamples(event);
}
Step 2: Normalize all values:
In a loop similar to step1, replace the buffer with normalized values, calling:
buffer.put(i, normalize(buffer.get(i));
public short normalize(short value) {
if (isBackgroundNoise(value))
return value;
short rawMin = // min from step1
short rawMax = // max from step1
short targetRangeMin = 1000;
short targetRangeMax = 8000;
int abs = Math.abs(value);
double a = (abs - rawMin) * (targetRangeMax - targetRangeMin);
double b = (rawMax - rawMin);
double result = targetRangeMin + ( a/b );
// Copy the sign of value to result.
result = Math.copySign(result,value);
return (short) result;
}
Questions:
Is this a valid approach for attempting to normalize an audio file?
Is my math in normalize() valid?
Why would this cause the file to become noisy, where a similar approach in the demo code doesn't?
I don't think the concept of "minimum sample value" is very meaningful, since the sample value just represents the current "height" of the sound wave at a certain time instant. I.e. its absolute value will vary between the peak value of the audio clip and zero. Thus, having a targetRangeMin seems to be wrong and will probably cause some distortion of the waveform.
I think a better approach might be to have some sort of weight function that decreases the sample value based on its size. I.e. bigger values are decreased by a large percentage than smaller values. This would also introduce some distortion, but probably not very noticeable.
Edit: here is a sample implementation of such a method:
public short normalize(short value) {
short rawMax = // max from step1
short targetMax = 8000;
//This is the maximum volume reduction
double maxReduce = 1 - targetMax/(double)rawMax;
int abs = Math.abs(value);
double factor = (maxReduce * abs/(double)rawMax);
return (short) Math.round((1 - factor) * value);
}
For reference, this is what your algorithm did to a sine curve with an amplitude of 10000:
This explains why the audio quality becomes much worse after being normalized.
This is the result after running with my suggested normalize method:
"normalization" of audio is the process of increasing the level of the audio such that the maximum is equal to some given value, usually the maximum possible value. Today, in another question, someone explained how to do this (see #1): audio volume normalization
However, you go on to say "Specifically, where an audio file contains peaks in volume, I'm trying to level it out, so the quiet sections are louder, and the peaks are quieter." This is called "compression" or "limiting" (not to be confused with the type of compression such as that used in encoding MP3s!). You can read more about that here: http://en.wikipedia.org/wiki/Dynamic_range_compression
A simple compressor is not particularly hard to implement, but you say your math "is embarrassingly weak." So you might want to find one that's already built. You might be able to find a compressor implemented in http://sox.sourceforge.net/ and convert that from C to Java. The only java implementation of compressor I know of who's source is available (and it's not very good) is in this book
As an alternative to solve your problem, you might be able to normalize your file in segments of say 1/2 a second each, and then connect the gain values you use for each segment using linear interpolation. You can read about linear interpolation for audio here: http://blog.bjornroche.com/2010/10/linear-interpolation-for-audio-in-c-c.html
I don't know if the source code is available for the levelator, but that's something else you can try.

Categories

Resources