I was inspired by this question: XOR Neural Network in Java.
Briefly, an XOR neural network is trained, and the number of iterations required to complete the training depends on seven parameters: alpha plus three pairs of min/max cutoffs (gamma3_min_cutoff, gamma3_max_cutoff, gamma4_min_cutoff, gamma4_max_cutoff, and so on). I would like to minimize the number of iterations required for training by tweaking these parameters.
So, I want to rewrite the program from

private static double alpha = 0.1, g3min = 0.2, g3max = 0.8;

int iteration = 0;
while (true) {
    do_something();
    iteration++;
    if (error < threshold) { break; }
}
System.out.println("iterations: " + iteration);
to
for (double alpha = 0.01; alpha < 10; alpha += 0.01) {
    for (double g3min = 0.01; g3min < 0.4; g3min += 0.01) {
        // Add five more nested loops to optimize the other parameters
        int iteration = 1;
        while (true) {
            do_something();
            iteration++;
            if (error < threshold) { break; }
        }
        System.out.println(inputs); // number of iterations, alpha, cutoffs, etc.
        // Close five more loops here
    }
}
But this brute-force method is not going to be efficient. With 7 parameters and hundreds of training iterations per evaluation, even a grid of just 10 points per parameter translates into billions of operations. A nonlinear fit should do, but those methods typically require partial derivatives, which I wouldn't have in this case.
Is there a Java package for this sort of optimization?
Thank you in advance,
Stepan
You have some alternatives, depending on the equations that govern the error parameter.

Pick a point in parameter space and use an iterative process to walk towards a minimum. Essentially, add a delta to each parameter and keep whichever change reduces the error by the most; rinse, repeat (see the sketch after this list).

Pick each parameter in turn and perform a binary-chop search between its limits to find its minimum. This will only work if the parameter's effect is linear.

Solve the system using some form of Operations Research technique to track down a minimum.
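As an illustration of the first approach, here is a minimal sketch of greedy coordinate-wise hill climbing. It is a sketch only: trainAndCountIterations is a hypothetical stand-in for your training loop, and the starting point and step sizes are placeholder assumptions.

import java.util.Arrays;

public class HillClimb {
    // Hypothetical objective: run the XOR training with the given seven
    // parameters and return the number of iterations it took to converge.
    static int trainAndCountIterations(double[] p) {
        return 0; // placeholder: plug in the actual training loop here
    }

    public static void main(String[] args) {
        double[] p = {0.1, 0.2, 0.8, 0.2, 0.8, 0.2, 0.8}; // assumed starting point
        double step = 0.05;                               // initial step size
        int best = trainAndCountIterations(p);

        while (step > 1e-4) {
            boolean improved = false;
            for (int i = 0; i < p.length; i++) {
                for (double delta : new double[]{+step, -step}) {
                    double old = p[i];
                    p[i] = old + delta;
                    int cost = trainAndCountIterations(p);
                    if (cost < best) {
                        best = cost;   // keep any change that improves the objective
                        improved = true;
                    } else {
                        p[i] = old;    // otherwise revert it
                    }
                }
            }
            if (!improved) step /= 2;  // no parameter helped: refine the step
        }
        System.out.println("best: " + best + " iterations at " + Arrays.toString(p));
    }
}

This evaluates the objective a few times per parameter per round instead of exhaustively gridding all seven dimensions.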
I wanted to get some numbers on how fast multiplication is compared to addition. I wrote a simple program where I multiply two numbers and measure the time taken, then add the two numbers repeatedly and measure the time taken. The results are a bit disturbing. Before I show the results, here is the code:
package com.np.fun;

import java.util.Scanner;
import org.apache.commons.lang3.time.StopWatch;

public class HowSlowIsMultiplication {

    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        int x = scanner.nextInt();
        int y = scanner.nextInt();
        long z;

        // Time a single multiplication.
        StopWatch stopWatchMultiply = new StopWatch();
        stopWatchMultiply.start();
        z = x * y;
        stopWatchMultiply.stop();
        System.out.println("Time taken for multiplication is : " + stopWatchMultiply.getNanoTime());

        // Time repeated addition: min(x, y) iterations of adding max(x, y).
        StopWatch stopWatchAdd_1 = new StopWatch();
        stopWatchAdd_1.start();
        for (int i = 0; i < Math.min(x, y); i++) {
            z = z + Math.max(x, y);
        }
        stopWatchAdd_1.stop();
        System.out.println("Time taken for adding in less for loops is : " + stopWatchAdd_1.getNanoTime());

        // Time repeated addition: max(x, y) iterations of adding min(x, y).
        StopWatch stopWatchAdd_2 = new StopWatch();
        stopWatchAdd_2.start();
        for (int i = 0; i < Math.max(x, y); i++) {
            z = z + Math.min(x, y);
        }
        stopWatchAdd_2.stop();
        // Note: the original printed stopWatchAdd_1 here, which is why both
        // addition timings in the output below are identical.
        System.out.println("Time taken for adding in more for loops is : " + stopWatchAdd_2.getNanoTime());
    }
}
I tried this with varying values of x and y. Here is the output for x=10000 and y=5000 (all times are in nanoseconds):
Time taken for multiplication is : 61593
Time taken for adding in less for loops is : 1622599
Time taken for adding in more for loops is : 1622599
As you can see, multiplication is well over an order of magnitude faster than addition.
Any reasons for this?
First off, it's not clear what you're trying to measure, as you appear to be comparing a single multiplication against doing multiple additions in a loop. You shouldn't be surprised that the latter is slower.
That being said, microbenchmarks like this are essentially random noise. A multiplication is just one instruction at the machine code level. It's likely to be swamped by any sort of branch or function call you do to measure the timing. In order to get accurate measurements at that level, you have to use special assembly instructions. You also have to account for out of order execution and pipelining, since modern CPUs execute multiple instructions simultaneously, especially in simple cases like integer multiplication.
Add on top of that all the abstractions of the JVM and of Java, and the enterprise is even more hopeless. Java compiles to bytecode, which is in turn interpreted or JIT compiled by the JVM. The JVM has wide latitude to do whatever optimizations it sees fit, so even operations that take different amounts of bytecode instructions may have little or no relation to the ultimate performance.
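The practical way to get meaningful numbers despite all of this is to use a benchmarking harness such as JMH, which handles warm-up, JIT compilation, and dead-code elimination for you. Here is a minimal sketch, assuming the JMH annotations are on the classpath; the operand values are arbitrary.

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class MulVsAdd {
    int x = 10000;
    int y = 5000;

    @Benchmark
    public long multiply() {
        return (long) x * y; // returning the result keeps the JIT from eliminating it
    }

    @Benchmark
    public long add() {
        return (long) x + y;
    }
}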
I have a basic framework for a neural network to recognize numeric digits, but I'm having some problems with training it. My back-propagation works for small data sets, but when I have more than 50 data points, the return value starts converging to 0. And when I have data sets in the thousands, I get NaNs for costs and returns.
Basic structure: 3 layers: 784 : 15 : 1
784 is the number of pixels per input image, 15 is the number of neurons in my hidden layer, and there is one output neuron which returns a value from 0 to 1 (multiply by 10 and you get a digit).
public class NetworkManager {
    int inputSize;
    int hiddenSize;
    int outputSize;
    public Matrix W1; // weights: input -> hidden
    public Matrix W2; // weights: hidden -> output

    public NetworkManager(int input, int hidden, int output) {
        inputSize = input;
        hiddenSize = hidden;
        outputSize = output;
        W1 = new Matrix(inputSize, hiddenSize);
        W2 = new Matrix(hiddenSize, outputSize);
    }

    Matrix z2, z3; // pre-activation values of the hidden and output layers
    Matrix a2;     // hidden-layer activations

    public Matrix forward(Matrix X) {
        z2 = X.dot(W1);
        a2 = sigmoid(z2);
        z3 = a2.dot(W2);
        Matrix yHat = sigmoid(z3);
        return yHat;
    }

    // Sum of squared errors over the batch.
    public double costFunction(Matrix X, Matrix y) {
        Matrix yHat = forward(X);
        Matrix cost = yHat.sub(y);
        cost = cost.mult(cost);
        double returnValue = 0;
        int i = 0;
        while (i < cost.m.length) {
            returnValue += cost.m[i][0];
            i++;
        }
        return returnValue;
    }

    Matrix yHat;

    // Gradients of the cost with respect to W1 and W2.
    public Matrix[] costFunctionPrime(Matrix X, Matrix y) {
        yHat = forward(X);
        Matrix delta3 = (yHat.sub(y)).mult(sigmoidPrime(z3));
        Matrix dJdW2 = a2.t().dot(delta3);
        Matrix delta2 = (delta3.dot(W2.t())).mult(sigmoidPrime(z2));
        Matrix dJdW1 = X.t().dot(delta2);
        return new Matrix[]{dJdW1, dJdW2};
    }
}
That's the code for the network framework. I pass double arrays of length 784 into the forward method.
Matrix[] dJdW;
int t = 0;
while (t < 10000) {
    dJdW = Nn.costFunctionPrime(X, y);
    Nn.W1 = Nn.W1.sub(dJdW[0].scalar(3)); // gradient descent step, learning rate 3
    Nn.W2 = Nn.W2.sub(dJdW[1].scalar(3));
    t++;
}
I call this to adjust the weights. With small sets, the cost converges to 0 pretty well, but larger sets don't (the cost associated with 100 characters always converges to 13). And if the set is too large, the first adjustment works (and costs go down), but after the second all I can get is NaN.
Why does this implementation fail with larger data sets (specifically training), and how can I fix it? I tried a similar structure with 10 outputs instead of 1, where each would return a value near 0 or 1, acting like boolean values, but the same thing happened.
I'm also doing this in Java, by the way, and I'm wondering if that has something to do with the problem. I wondered if it was a problem with running out of space, but I haven't been getting any heap-space messages. Is there a problem with how I'm back-propagating, or is something else happening?
EDIT: I think I know what's happening. I think my back-propagation function is getting caught in local minima. Sometimes the training succeeds and sometimes it fails for large data sets. Because I'm starting with random weights, I get random initial costs. What I've noticed is that when the cost initially exceeds a certain amount (it depends on the number of data sets involved), the costs converge to a clean number (sometimes 27, others 17.4) and the outputs converge to 0 (which makes sense).
I was warned about relative minima in the cost function when I began, and I'm beginning to realize why. So now the question becomes: how do I go about my gradient descent so that I'll actually find the global minimum? I'm working in Java, by the way.
This seems like a problem with weight initialization.
As far as I can see, you never initialize the weights to any specific value, and therefore the network diverges. You should at least use random initialization.
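For example, here is a minimal sketch of Xavier-style random initialization. It assumes the Matrix class exposes its backing double[][] as the field m (as the costFunction in the question suggests); the 1/sqrt(fan-in) scale is the usual choice for sigmoid units.

import java.util.Random;

public class WeightInit {
    // Fill an in x out weight matrix with small random values, scaled by fan-in.
    static double[][] randomWeights(int in, int out, Random rng) {
        double scale = 1.0 / Math.sqrt(in);
        double[][] w = new double[in][out];
        for (int i = 0; i < in; i++) {
            for (int j = 0; j < out; j++) {
                w[i][j] = (rng.nextDouble() * 2 - 1) * scale; // uniform in [-scale, scale]
            }
        }
        return w;
    }
}

// Usage sketch inside the NetworkManager constructor (field name m assumed):
// W1.m = randomWeights(inputSize, hiddenSize, new Random());
// W2.m = randomWeights(hiddenSize, outputSize, new Random());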
If your backprop works on a small dataset, that is a reasonably good sign that there isn't a problem with it. If you're suspicious about it, you can try your BP on the XOR problem.
Do your units have bias terms?
I once discussed this with someone doing exactly the same thing: hand-written digit recognition with 15 units in the hidden layer. I saw a network that did this task well. Its topology was:
Input: 784
First hidden: 500
Second hidden: 500
Third hidden: 2000
Output: 10
You have a set of images, and you nonlinearly transform the 784 pixels of each image into 15 numbers from the [0, 1] interval, and you do this for every image in your set. You hope that you can correctly separate the digits based on these 15 numbers. From my point of view, 15 hidden units is too few for such a task, assuming you have a dataset with thousands of examples. Please try, for example, 500 hidden units.
Also, the learning rate influences backprop and can cause problems with convergence.
I used IntelliJ IDEA to build a Maven project; the code is as follows:
System.out.println("Load data....");
SentenceIterator iter = new LineSentenceIterator(new File("/home/zs/programs/deeplearning4j-master/dl4j-test-resources/src/main/resources/raw_sentences.txt"));
iter.setPreProcessor(new SentencePreProcessor() {
#Override
return sentence.toLowerCase();
}
});
System.out.println("Build model....");
int batchSize = 1000;
int iterations = 30;
int layerSize = 300;
com.sari.Word2Vec vec= new com.sari.Word2Vec.Builder()
.batchSize(batchSize) //# words per minibatch.
.sampling(1e-5) // negative sampling. drops words out
.minWordFrequency(5) //
.useAdaGrad(false) //
.layerSize(layerSize) // word feature vector size
.iterations(iterations) // # iterations to train
.learningRate(0.025) //
.minLearningRate(1e-2) // learning rate decays wrt # words. floor learning
.negativeSample(10) // sample size 10 words
.iterate(iter) //
.tokenizerFactory(tokenizer)
.build();
vec.fit();
System.out.println("Evaluate model....");
double cosSim = vec.similarity("day" , "night");
System.out.println("Similarity between day and night: "+cosSim);
This code is based on the word2vec example in deeplearning4j, but the result is unstable: the results of each experiment are very different. For example, with the cosine similarity between 'day' and 'night', sometimes the result is as high as 0.98, sometimes as low as 0.4.
Here are the results of two experiments:
Evaluate model....
Similarity between day and night: 0.706292986869812
Evaluate model....
Similarity between day and night: 0.5550910234451294
Why does the result vary like this? Since I have just started learning word2vec, there is a lot I don't understand yet; I hope more experienced users can help me. Thanks!
You have set the following line:
.minLearningRate(1e-2) // learning rate decays wrt # words. floor learning
But that is an extremely high floor for the learning rate. Such a high learning rate keeps the model from 'settling' into any state; instead, a few updates can significantly change the learned representation. That is not a problem during the first few updates, but it is bad for convergence.
Solution: allow the learning rate to decay.
You can leave this line out completely, or, if you must set it, use a more appropriate value such as 1e-15.
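Applied to the builder from the question, the change might look like this; only the relevant lines are shown, and 1e-15 is just an illustrative floor:

com.sari.Word2Vec vec = new com.sari.Word2Vec.Builder()
        // ... other settings as before ...
        .learningRate(0.025)
        .minLearningRate(1e-15) // a floor low enough that the rate can actually decay
        // ... other settings as before ...
        .build();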
I'm getting the wrong frequency, and I don't understand why I'm getting wrong values, since I calculated it following the instructions from Stack Overflow.
I've used the FFT from
http://introcs.cs.princeton.edu/java/97data/FFT.java.html
and the Complex class from
http://introcs.cs.princeton.edu/java/97data/Complex.java.html
audioRec.startRecording();
audioRec.read(bufferByte, 0, bufferSize);

// Convert the raw bytes to doubles for the FFT.
for (int i = 0; i < bufferSize; i++) {
    bufferDouble[i] = (double) bufferByte[i];
}

// Wrap the samples as complex numbers (imaginary part 0) and transform.
// Note: the Princeton FFT returns a new array rather than transforming in place.
Complex[] fftArray = new Complex[bufferSize];
for (int i = 0; i < bufferSize; i++) {
    fftArray[i] = new Complex(bufferDouble[i], 0);
}
fftArray = FFT.fft(fftArray);

// Magnitude spectrum.
double[] magnitude = new double[bufferSize];
for (int i = 0; i < bufferSize; i++) {
    magnitude[i] = Math.sqrt((fftArray[i].re() * fftArray[i].re()) + (fftArray[i].im() * fftArray[i].im()));
}

// Find the bin with the largest magnitude.
double max = 0.0;
int index = -1;
for (int j = 0; j < bufferSize; j++) {
    if (max < magnitude[j]) {
        max = magnitude[j];
        index = j;
    }
}

// Convert the bin index to a frequency in Hz and display it.
final int peak = index * sampleRate / bufferSize;
Log.v(TAG2, "Peak Frequency = " + index * sampleRate / bufferSize);
handler.post(new Runnable() {
    public void run() {
        textView.append("---" + peak + "---");
    }
});
I'm getting values like 21000, 18976, 40222, 30283, etc.
Please help me.
Thank you.
Your source code is almost fine. The only problem is that you search for the peaks through the full spectrum, i.e. from 0 via Fs/2 to Fs.
For any real-valued input signal (which you have) the spectrum between Fs/2 and Fs (=sample frequency) is an exact mirror of the spectrum between 0 and Fs/2 (I found this nice background explanation). Thus, for each frequency there exist two peaks with almost identical amplitude. I'm writing 'almost' because due to limited machine precision they are not necessarily exactly identical. So, you randomly find the peak in the first half of the spectrum which contains the frequencies below the Nyquist frequency (=Fs/2) or in the second half of the spectrum with the frequencies above the Nyquist frequency.
If you want to correct the mistake yourself, stop reading here. Otherwise continue:
Just replace

for (int j = 0; j < bufferSize; j++) {

with

for (int j = 0; j <= bufferSize / 2; j++) {

in the source code you presented.
P.S.: Typically, it is better to apply a window function to the analysis buffer (e.g. a Hamming window), but for your application of peak picking it won't change the results very much.
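If you want to try it anyway, a minimal sketch of applying a Hamming window before the FFT might look like this (the buffer and size names are taken from the question):

// Multiply each sample by the Hamming window coefficient before the FFT.
for (int i = 0; i < bufferSize; i++) {
    double w = 0.54 - 0.46 * Math.cos(2 * Math.PI * i / (bufferSize - 1));
    bufferDouble[i] *= w;
}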
Problem: move an object along a straight line at a constant speed in the Cartesian coordinate system (x and y only). The rate of update is unstable. The movement speed must be close to exact, and the object must arrive very close to the destination. The line's source and destination may be anywhere.
Given: the source and destination coordinates (x0, y0, x1, y1), and a speed of arbitrary value.
An aside: there is an answer on SO regarding this, and it's good; however, it presumes that the total time spent traveling is given.
Here's what I've got:
x0 = 127;
y0 = 127;
x1 = 257;
y1 = 188;
speed = 127;

ostrich.x = x0; // plus some distance along the line
ostrich.y = y0; // plus some distance along the line

// An arbitrarily large value so that each iteration increments the distance a minute amount.
SPEED_VAR = 1000;
xDistPerIteration = (x1 - x0) / SPEED_VAR;
yDistPerIteration = (y1 - y0) / SPEED_VAR;

// Pythagorean theorem.
distanceToTravel = Math.sqrt((x1 - x0) * (x1 - x0) + (y1 - y0) * (y1 - y0));
limitX = limitY = 0; // determines when to stop the while loop

// Gets called 40-60 times per second.
void update() {
    // Keep incrementing the ostrich's location.
    while (limitX < speed && limitY < speed) {
        limitX += Math.abs(xDistPerIteration);
        limitY += Math.abs(yDistPerIteration);
        ostrich.x += xDistPerIteration;
        ostrich.y += yDistPerIteration;
    }
    distanceToTravel -= Math.sqrt(Math.pow(limitX, 2) + Math.pow(limitY, 2));
    if (distanceToTravel <= 0) {
        // ostrich arrived safely at the factory
    }
}
This code gets the job done; however, it alone takes up 18% of the run time of a CPU-intensive program. It's garbage, both programmatically and in terms of performance. Any ideas on what to do here?
An aside: There is an answer on the SO regarding this, and it's good, however it presumes that total time spent traveling is given.
Basic physics to the rescue:

total time spent traveling = distance / speed

By the way, Math.hypot(limitX, limitY) is faster than Math.sqrt(Math.pow(limitX, 2) + Math.pow(limitY, 2)), though really it's that while loop you should refactor out; see the sketch below.
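Here is a minimal sketch of what that refactor might look like: compute the unit direction vector once, then move by speed times the elapsed frame time on each update. The dtSeconds parameter is an assumption, since the original update() takes no arguments; everything else reuses the question's names.

// Precomputed once, when the movement starts:
double dx = x1 - x0, dy = y1 - y0;
double distanceToTravel = Math.hypot(dx, dy);
double dirX = dx / distanceToTravel; // unit direction vector
double dirY = dy / distanceToTravel;

// Called 40-60 times per second; dtSeconds is the time since the last call.
void update(double dtSeconds) {
    double step = speed * dtSeconds; // distance to cover this frame
    if (step >= distanceToTravel) {
        ostrich.x = x1; // snap to the destination rather than overshoot
        ostrich.y = y1;
        distanceToTravel = 0; // arrived
    } else {
        ostrich.x += dirX * step;
        ostrich.y += dirY * step;
        distanceToTravel -= step;
    }
}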
One thing to improve: there is no need to compute the square root in each call to the update function; you may use the squared distanceToTravel instead.
Similarly, Math.abs(xDistPerIteration) and Math.abs(yDistPerIteration) do not change between calls to update, so you can save those values once and get rid of the calls to the absolute-value function in order to save a bit more computing time.
Update gets called 40-60 times per second, right? In other words, once per frame. So why is there a while loop inside it?
Also, doing sqrt once, and pow twice, per frame is unnecessary.
Just let d2 be the distance squared, and stop when limitX*limitX+limitY*limitY exceeds it.
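In code, that suggestion might look like this (a sketch reusing the question's variable names; d2 is computed once up front):

// Once, when the movement starts:
double d2 = (x1 - x0) * (x1 - x0) + (y1 - y0) * (y1 - y0); // distance squared

// Each frame: test for arrival without any sqrt or pow.
if (limitX * limitX + limitY * limitY >= d2) {
    // ostrich arrived safely at the factory
}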