When do you implement the Sigmoid function in a neural network? - java

I am getting into neural networks because they seemed fun. I translated the Python code to Java, and I think it works as it should: it gives me the correct values every time. However, I want to know where you are supposed to implement the Sigmoid function in the code. I implemented it after calculating the output, but even without the Sigmoid function the program behaves the same way.
Website I learned from: https://towardsdatascience.com/first-neural-network-for-beginners-explained-with-code-4cfd37e06eaf
This is my Perceptron function:
public void Perceptron(int input1, int input2, int output) {
    double outputP = input1 * weights[0] + input2 * weights[1] + bias * weights[2];
    outputP = Math.floor((1 / (1 + Math.exp(-outputP))));
    if (outputP > 0) {
        outputP = 1;
    } else {
        outputP = 0;
    }
    double error = output - outputP;
    weights[0] += error * input1 * learningRate;
    weights[1] += error * input2 * learningRate;
    weights[2] += error * bias * learningRate;
    System.out.println("Output:" + outputP);
}
Also, if I don't add the Math.floor(), it just gives me a number with a lot of decimals.

Not an expert, but it is used instead of your conditional where you output 1 or 0. That's your threshold function. In that case, you are using a step function; you could replace the whole conditional with your sigmoid function.
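For example, here is a minimal sketch (reusing the question's field names weights, bias, and learningRate, which are assumed to exist) in which the sigmoid output is used directly for the weight update and a hard 0/1 decision is only made when printing a prediction:
public void perceptron(int input1, int input2, int output) {
    double sum = input1 * weights[0] + input2 * weights[1] + bias * weights[2];
    double outputP = 1.0 / (1.0 + Math.exp(-sum)); // sigmoid activation, a value in (0, 1)
    double error = output - outputP;               // train against the continuous sigmoid output
    weights[0] += error * input1 * learningRate;
    weights[1] += error * input2 * learningRate;
    weights[2] += error * bias * learningRate;
    // Round only when a hard 0/1 prediction is needed, e.g. for printing:
    System.out.println("Output: " + (outputP >= 0.5 ? 1 : 0) + " (raw " + outputP + ")");
}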

How come my program freezes up when I click calculate?

Here is my code:
private void btnCalculateActionPerformed(java.awt.event.ActionEvent evt) {
    int intInitialInvest = Integer.parseInt(this.txtInputInitialInvest.getText());
    int intAnnualInterest = Integer.parseInt(this.txtInputAnnualInterest.getText());
    int intEndingValue = Integer.parseInt(this.txtInputEndingValue.getText());
    double dblAnnualPercent = intAnnualInterest/100;
    int count = 0;
    while (intInitialInvest < intEndingValue){
        intInitialInvest += (intInitialInvest * dblAnnualPercent);
        count += 1;
    }
    this.lblOutputYears.setText("The number of years required is " + count);
}
This program is supposed to calculate how many years (count) it takes, for example, for a CD worth $2000 to grow to $5000 at an annual interest rate of 8%; in that case it should return 12. What I did was create a while loop that runs until the $2000 turns into $5000 or more from interest, which is expressed by intInitialInvest += (intInitialInvest * dblAnnualPercent);
Every time I run the program by clicking the "Calculate" button, it freezes and doesn't do anything, and I have to go into Task Manager to close it.
Be careful with integer divisions:
double dblAnnualPercent = intAnnualInterest/100;
causes the value of dblAnnualPercent to be 0.0, and thus you run into an infinite loop. You perform an integer division (e.g. 8/100 = 0) and only then convert to double (giving 0.0, not the 0.08 you would have expected).
double dblAnnualPercent = intAnnualInterest/100.;
should fix your bug.
Hint: add assertions, and run your program with assertions enabled.
assert(dblAnnualPercent > 0.);
would have saved you (assuming you run your program with -ea).
But also try to solve the problem without a loop. There is a closed-form solution that uses math instead of iteration; it is only one line, as sketched below.
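Reusing the question's variable names (assumed to be in scope), the closed form solves initial × (1 + rate)^n ≥ ending for n:
double rate = intAnnualInterest / 100.0; // floating-point division this time
int years = (int) Math.ceil(Math.log((double) intEndingValue / intInitialInvest)
        / Math.log(1.0 + rate));
// For $2000 -> $5000 at 8% this yields ceil(11.9...) = 12.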
If intInitialInvest=0 or dblAnnualPercent=0 and intEndingValue > 0 you'll loop forever.
while (intInitialInvest < intEndingValue){
intInitialInvest += (intInitialInvest * dblAnnualPercent);
count += 1;
}
You have to validate your values before you enter the loop, especially as you seem to read them from user input. This is a possible attack vector even if you assert on these values: your algorithm breaks when someone feeds input that makes intInitialInvest = 0 or intAnnualInterest < 100 (which, with the integer division above, makes the rate 0).
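A small guard along these lines (a sketch reusing the question's names) would reject such input before the loop is ever entered:
if (intInitialInvest <= 0 || intEndingValue <= 0 || intAnnualInterest <= 0) {
    this.lblOutputYears.setText("Please enter positive values for all three fields.");
    return; // never start a loop that cannot terminate
}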

(Java) Partial Derivatives for Back Propagation of Hidden Layer

Yesterday I posted a question about the first piece of the backpropagation algorithm.
Today I'm working to understand the hidden layer.
Sorry for all the questions; I've read several websites and papers on the subject, but no matter how much I read, I still have a hard time applying it to actual code.
This is the code that I'm analyzing (I'm working in Java, so it's nice to look at a Java example):
// update weights for the hidden layer
for (Neuron n : hiddenLayer) {
    ArrayList<Connection> connections = n.getAllInConnections();
    for (Connection con : connections) {
        double output = n.getOutput();
        double ai = con.leftNeuron.getOutput();
        double sumKoutputs = 0;
        int j = 0;
        for (Neuron out_neu : outputLayer) {
            double wjk = out_neu.getConnection(n.id).getWeight();
            double desiredOutput = (double) expectedOutput[j];
            double ak = out_neu.getOutput();
            j++;
            sumKoutputs = sumKoutputs
                    + (-(desiredOutput - ak) * ak * (1 - ak) * wjk);
        }
        double partialDerivative = output * (1 - output) * ai * sumKoutputs;
        double deltaWeight = -learningRate * partialDerivative;
        double newWeight = con.getWeight() + deltaWeight;
        con.setDeltaWeight(deltaWeight);
        con.setWeight(newWeight + momentum * con.getPrevDeltaWeight());
    }
}
One real problem here is that I don't know exactly how all of the methods work.
This code goes through all the neurons in the hidden layer, and goes through each connection to each hidden-layer neuron one by one. It grabs each connection's output? So, is that the summation of the incoming connections (probably run through a sigmoid function) and then multiplied by a connection weight? Then "double ai" is getting the input-connection values to this particular node? Is it getting just one, or the sum of the inputs to the neuron?
Then a third for loop pretty much sums up "out_neu.getConnection(n.id).getWeight()", which I don't quite understand. Then, the desired output is the desiredOutput for the final-layer node? And is ak the actual output (summation and activation function) of each node, or is it the summation + activation * weight?
EDIT
I started working on my own code, can anyone take a look at it?
public class BackProp {
    public int layers = 3;
    public int hiddenNeuronsNum = 5;
    public int outputNeuronsNum = 1;
    public static final double eta = .1;
    public double[][][] weights; // holds the network -- weights[layer][neuron][forwardConnection]

    public void Back() {
        for (int neuron = 0; neuron < outputNeuronsNum; neuron++) {
            for (int connection = 0; connection < hiddenNeuronsNum; connection++) {
                double expOutput = expectedOutput[neuron]; // the expected output from the neuron we're on
                double actOutput = actualOutput[neuron];
                double previousLayerOutput = holdNeuronValues[layers - 1][neuron];
                double delta = eta * (actOutput * (1 - actOutput) * (expOutput - actOutput) * previousLayerOutput);
                weights[layers - 1][neuron][connection] += delta; // OKAY M&M said YOU HAD THIS MESSED UP, 3rd index means end neuron, 2nd means start.. moving from left to right
            }
        }
        // Hidden Layer..
        for (int neuron = 0; neuron < outputNeuronsNum; neuron++) {
            for (int connection = 0; connection < hiddenNeuronsNum; connection++) {
                double input = holdNeuronValues[layers - 3][connection]; // what this neuron sends on, -2 for the next layer
                double output = holdNeuronValues[layers - 2][connection];
                double sumKoutputs = 0;
                // for the output layer
                for (int outputNeurons = 0; outputNeurons < weights[layers].length; outputNeurons++) {
                    double wjk = weights[layers - 2][neuron][outputNeurons]; // get the weight
                    double expOutput = expectedOutput[outputNeurons];
                    double out = actualOutput[outputNeurons];
                    sumKoutputs += (-(expOutput - out) * wjk);
                }
                double partialDerivative = -eta * output * (1 - output) * input * sumKoutputs;
            }
        }
    }
}
This is the standard backpropagation algorithm, which backpropagates the error through all the hidden layers.
Unless we are in the output layer, the error for a neuron in a hidden layer depends on the succeeding layer. Let's assume that we have a particular neuron a with synapses that connect it to neurons i, j, and k in the next layer. Let us also assume that the output of neuron a is o_a. Then the error for neuron a is equal to the following expression (assuming we are using the logistic function as the activation function):
δ_a = o_a(1 - o_a) × (δ_i·w_ai + δ_j·w_aj + δ_k·w_ak)
Here, o_a(1 - o_a) is the value of the derivative of the activation function, δ_i is the error of neuron i, and w_ai is the weight assigned to the synapse (connection) between a and i; the same applies to the remaining terms.
Notice how we take into account the error of each neuron in the next layer that a is connected to. Also notice that we take into account the weight accorded to each synapse. Without going into the math, it makes intuitive sense that the error for a depends not only on the errors of the neurons that a connects to, but also on the weights of the synapses (connections) between a and the neurons in the next layer.
Once we have the errors, we need to update the weights of the synapses (connections) of every neuron in the previous layer that connects to a (i.e., we backpropagate the error). Let us assume that we have a single neuron z that connects to a. Then we have to adjust w_za as follows:
w_za = w_za + (α × δ_a × o_z)
If there are other neurons (and there probably are) in the previous layer that connect to a, we will update their weights using the same formula as well. Now if you look at your code, you will see that this is exactly what is happening.
You are doing the following for each neuron in the hidden layer:
You are getting a list of synapses (connections) that connect this neuron to the previous layer. This is the connections = n.getAllInConnections() part.
For each connection, the code then does the following:
It gets the output of the neuron (this is the o_a term in the formulas above).
It gets the output of the neuron that connects to this neuron (this is the o_z term).
Then, for each neuron in the output layer, it accumulates the sum of the error of each output neuron times the weight from our hidden-layer neuron to that output neuron. Here, sumKoutputs corresponds to the expression (δ_i·w_ai + δ_j·w_aj + δ_k·w_ak). The value of δ_i comes from -(desiredOutput - ak) * ak * (1 - ak), since this is how you calculate the error of the output layer: you simply multiply the derivative of the activation function for the output-layer neuron by the difference between the expected and actual output. Finally, you can see that the whole thing is multiplied by wjk; this corresponds to the w_ai term in our formula.
We now have all the values we need to plug into our formula to adjust the weights of every synapse that connects to our neuron from the preceding layer. The problem with the code is that it calculates some things a little differently:
In our formula we have o_a(1 - o_a) × (δ_i·w_ai + δ_j·w_aj + δ_k·w_ak) for the error of neuron a. But the code calculates partialDerivative with ai included, which in our terms is equivalent to o_a(1 - o_a) × o_z × (δ_i·w_ai + δ_j·w_aj + δ_k·w_ak). Mathematically it works out the same, because the update multiplies by o_z anyway (α × δ_a × o_z); the difference is just that the code performs the multiplication by o_z earlier.
It then calculates deltaWeight, which is (α × δ_a × o_z) in our formula. In the code, α is learningRate.
We then update the weight by adding the delta to the current weight. This is the same as w_za + (α × δ_a × o_z).
Now things are a little different. You can see that the code doesn't set the weight directly, but instead deals with momentum: using momentum, we add a fraction of the previous delta to the new weight. This is a technique used in neural networks to keep the network from getting stuck in a local minimum. The momentum term gives us a little "push" to get out of a local minimum (a "well" in the error surface; with a neural network we are traversing the error surface looking for the point with the lowest error, but we could get stuck in a "well" that isn't as "deep" as the optimal solution) and helps us converge on a solution. But you have to be careful: if you set it too high, you can overshoot your optimal solution. Once the code calculates the new weight using the momentum, it sets it on the connection (synapse).
I hope this explanation made it clearer for you. The math is a little hard to get into, but once you figure it out, it makes sense. I think the main problem here is that the code is written in a slightly different manner. You can take a look at some code here that I wrote that implements the backpropagation algorithm; I did this as a class project. It runs pretty much along the same lines as the formulas I described above and so you should be able to follow through it easily. You can also take a look at this video I made where I explain the backpropagation algorithm.
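To tie the formulas to code, here is a minimal sketch of the hidden-layer update described above, following the same sign convention as the quoted code; all array and variable names (outputs, expected, hiddenOutputs, inputs, the weight matrices, learningRate, momentum) are this sketch's assumptions, not the original classes:
// delta_k for each output neuron k: -(expected - actual) * derivative of the logistic function
double[] outputErrors = new double[outputs.length];
for (int k = 0; k < outputs.length; k++) {
    outputErrors[k] = -(expected[k] - outputs[k]) * outputs[k] * (1 - outputs[k]);
}

for (int h = 0; h < hiddenOutputs.length; h++) {
    // Backpropagated error term for hidden neuron h: o_h * (1 - o_h) * sum_k(delta_k * w_hk)
    double sumK = 0;
    for (int k = 0; k < outputs.length; k++) {
        sumK += outputErrors[k] * hiddenToOutputWeights[h][k];
    }
    double deltaH = hiddenOutputs[h] * (1 - hiddenOutputs[h]) * sumK;

    // Adjust every weight from input neuron z into hidden neuron h, with momentum
    for (int z = 0; z < inputs.length; z++) {
        double deltaWeight = -learningRate * deltaH * inputs[z];
        inputToHiddenWeights[z][h] += deltaWeight + momentum * prevDeltaWeights[z][h];
        prevDeltaWeights[z][h] = deltaWeight;
    }
}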

Java: calculating velocity of a skydiver

In Java, I am trying to implement the following equation for calculating the current velocity of a skydiver not neglecting air resistance.
v(t) = v(t-∆t) + (g - [(drag x crossArea x airDensity) / (2*mass)] * v[(t-∆t)^2]) * (∆t)
My problem is that I am not sure how to translate "v(t - ∆t)" into code. Right now I have the method below, where, as you can see, I am using the method within itself to find the previous velocity. This has continued to result in a stack overflow error, understandably.
(timeStep = ∆t)
public double calculateVelocity(double time){
    double velocity;
    velocity = calculateVelocity(time - timeStep)
            + (acceleration - ((drag * crossArea * airDensity)
            / (2 * massOfPerson))
            * (calculateVelocity(time - timeStep) * (time * timeStep)))
            * timeStep;
    return velocity;
}
I am calling the above method in the method below. Assume that endingTime is an int; it will come from user input, but it is written this way to stay dynamic.
public void assignVelocitytoArrays(){
    double currentTime = 0;
    while(currentTime <= endingTime){
        this.vFinal = calculateVelocity(currentTime);
        currentTime += timeStep;
    }
}
I would like to figure this out on my own, could someone give me a general direction? Is using a method within itself the right idea or am I completely off track?
The formula you want to implement is the recursive representation of a sequence, mathematically speaking.
Recursive sequences need a starting point, e.g.
v(0) = 0 (because a negative time does not make sense)
and a rule to calculate the next elements, e.g.
v(t) = v(t-∆t) + (g - [(drag x crossArea x airDensity) / (2*mass)] * v[(t-∆t)^2] ) * (∆t)
(btw: are you sure it has to be v([t-∆t]^2) instead of v([t-∆t])^2?)
So your approach to use recursion (calling a function within itself) to calculate a recursive sequence is correct.
In your implementation, you only forgot one detail: the starting point. How should your program know that v(0) is not defined by the rule, but by a definite value? You must include it:
if(input value == starting point){
return starting point
}
else{
follow the rule
}
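In Java that guard might look roughly like this (a sketch reusing the question's field names; it also computes v(t - ∆t) only once and squares the velocity, per the remark about the rule above):
public double calculateVelocity(double time) {
    if (time <= 0) {
        return 0; // starting point: v(0) = 0
    }
    double vPrev = calculateVelocity(time - timeStep); // v(t - ∆t), computed only once
    return vPrev
            + (acceleration - (drag * crossArea * airDensity) / (2 * massOfPerson) * vPrev * vPrev)
            * timeStep;
}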
On a side note: you seem to be creating an ascending array of velocities. It would make sense to use the already calculated values in the array instead of recursion, so you don't have to calculate every step again and again.
This only works if you did indeed make a mistake in the rule (i.e., the square applies to the velocity, not to the time argument).
int maxSteps = (int) (maxTime / timeStep);
double[] v = new double[maxSteps + 1];
v[0] = 0; // starting point
for (int t = 1; t <= maxSteps; t++) {
    v[t] = v[t - 1] + (g - (drag * crossArea * airDensity) / (2 * mass) * v[t - 1] * v[t - 1]) * timeStep;
}

Java - Computation of Derivations with Apache Commons Mathematic Library

I have a problem using the Apache Commons Math library.
I just want to create functions like f(x) = 4x^2 + 2x and compute the derivative of such a function: f'(x) = 8x + 2
I read the article about Differentiation (http://commons.apache.org/proper/commons-math/userguide/analysis.html, section 4.7).
There is an example which I don't understand:
int params = 1;
int order = 3;
double xRealValue = 2.5;
DerivativeStructure x = new DerivativeStructure(params, order, 0, xRealValue);
DerivativeStructure y = f(x); //COMPILE ERROR
System.out.println("y = " + y.getValue());
System.out.println("y' = " + y.getPartialDerivative(1));
System.out.println("y'' = " + y.getPartialDerivative(2));
System.out.println("y''' = " + y.getPartialDerivative(3));
In line 5 a compile error occurs, of course: the function f(x) is called but never defined. What am I getting wrong?
Has anyone any experience with the differentiation/derivation with the apache commons math library or does anyone know another library/framework which can help me?
Thanks
In the paragraph below that example, the author describes ways to create DerivativeStructures. It isn't magic. In the example you quoted, someone was supposed to write the function f. Well, that wasn't very clear.
There are several ways a user can create an implementation of the UnivariateDifferentiableFunction interface. The first method is to simply write it directly using the appropriate methods from DerivativeStructure to compute addition, subtraction, sine, cosine... This is often quite straightforward and there is no need to remember the rules for differentiation: the user code only represents the function itself, and the differentials will be computed automatically under the hood. The second method is to write a classical UnivariateFunction and to pass it to an existing implementation of the UnivariateFunctionDifferentiator interface to retrieve a differentiated version of the same function. The first method is more suited to small functions for which the user already controls all the underlying code. The second method is more suited to either large functions that would be cumbersome to write using the DerivativeStructure API, or functions for which the user does not have control over the full underlying code (for example functions that call external libraries).
Use the first idea.
// Function of 1 variable, keep track of 3 derivatives with respect to that variable,
// use 2.5 as the current value. Basically, the identity function.
DerivativeStructure x = new DerivativeStructure(1, 3, 0, 2.5);
// Basically, x --> x^2.
DerivativeStructure x2 = x.pow(2);
//Linear combination: y = 4x^2 + 2x
DerivativeStructure y = new DerivativeStructure(4.0, x2, 2.0, x);
System.out.println("y = " + y.getValue());
System.out.println("y' = " + y.getPartialDerivative(1));
System.out.println("y'' = " + y.getPartialDerivative(2));
System.out.println("y''' = " + y.getPartialDerivative(3));
The following thread from the Apache mailing list seems to illustrate the two possible ways in which the derivative of a UnivariateDifferentiableFunction can be defined. I am adding a new answer as I'm unable to comment on the previous one (insufficient reputation).
The used sample specification of the function is f(x) = x^2.
(1) Using a DerivativeStructure:
public DerivativeStructure value(DerivativeStructure t) {
return t.multiply(t);
}
(2) By writing a classical UnivariateFunction:
public UnivariateRealFunction derivative() {
    return new UnivariateRealFunction() {
        public double value(double x) {
            // example derivative
            return 2. * x;
        }
    };
}
If I understand correctly, the advantage of the first case is that the derivative does not need to be obtained manually, as it must be in the second case. If the derivative is already known, there should thus be no advantage to defining a DerivativeStructure, right? The application I have in mind is a Newton-Raphson solver, for which both the function value and its derivative generally need to be known.
The full example is provided on the aforementioned web site (authors are Thomas Neidhart and Franz Simons). Any further comments are most welcome!
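For completeness, here is a minimal sketch of the second method described in the quoted documentation: write the function as a plain UnivariateFunction and let a differentiator wrap it. This assumes Commons Math 3.x; the FiniteDifferencesDifferentiator settings (5 points, step 0.01) are illustrative choices, not values from the thread:
import org.apache.commons.math3.analysis.UnivariateFunction;
import org.apache.commons.math3.analysis.differentiation.DerivativeStructure;
import org.apache.commons.math3.analysis.differentiation.FiniteDifferencesDifferentiator;
import org.apache.commons.math3.analysis.differentiation.UnivariateDifferentiableFunction;

public class DerivativeDemo {
    public static void main(String[] args) {
        // f(x) = 4x^2 + 2x written as a plain function of one variable
        UnivariateFunction f = x -> 4 * x * x + 2 * x;

        // Differentiate numerically: 5 sample points, step size 0.01 (illustrative values)
        UnivariateDifferentiableFunction fDiff =
                new FiniteDifferencesDifferentiator(5, 0.01).differentiate(f);

        // 1 free parameter, derivatives up to order 2, evaluated at x = 2.5
        DerivativeStructure x = new DerivativeStructure(1, 2, 0, 2.5);
        DerivativeStructure y = fDiff.value(x);

        System.out.println("y   = " + y.getValue());              // 4*2.5^2 + 2*2.5 = 30
        System.out.println("y'  = " + y.getPartialDerivative(1)); // ~22 (8*2.5 + 2)
        System.out.println("y'' = " + y.getPartialDerivative(2)); // ~8
    }
}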

Echo/delay algorithm just causes noise/static?

There have been other questions and answers on this site suggesting that, to create an echo or delay effect, you need only add one audio sample to a stored audio sample from the past. As such, I have the following Java class:
public class DelayAMod extends AudioMod {
    private int delay = 500;
    private float decay = 0.1f;
    private boolean feedback = false;
    private int delaySamples;
    private short[] samples;
    private int rrPointer;

    @Override
    public void init() {
        this.setDelay(this.delay);
        this.samples = new short[44100];
        this.rrPointer = 0;
    }

    public void setDecay(final float decay) {
        this.decay = Math.max(0.0f, Math.min(decay, 0.99f));
    }

    public void setDelay(final int msDelay) {
        this.delay = msDelay;
        this.delaySamples = 44100 / (1000 / this.delay);
        System.out.println("Delay samples:" + this.delaySamples);
    }

    @Override
    public short process(short sample) {
        System.out.println("Got:" + sample);
        if (this.feedback) {
            // Delay should feed back into the loop:
            sample = (this.samples[this.rrPointer] = this.apply(sample));
        } else {
            // No feedback - store base data, then add echo:
            this.samples[this.rrPointer] = sample;
            sample = this.apply(sample);
        }
        ++this.rrPointer;
        if (this.rrPointer >= this.samples.length) {
            this.rrPointer = 0;
        }
        System.out.println("Returning:" + sample);
        return sample;
    }

    private short apply(short sample) {
        int loc = this.rrPointer - this.delaySamples;
        if (loc < 0) {
            loc += this.samples.length;
        }
        System.out.println("Found:" + this.samples[loc] + " at " + loc);
        System.out.println("Adding:" + (this.samples[loc] * this.decay));
        return (short) Math.max(Short.MIN_VALUE, Math.min(sample + (int) (this.samples[loc] * this.decay), (int) Short.MAX_VALUE));
    }
}
It accepts one 16-bit sample at a time from an input stream, finds an earlier sample, and adds them together accordingly. However, the output is just horrible noisy static, especially when the decay is raised to a level that would actually cause any appreciable result. Reducing the decay to 0.01 barely allows the original audio to come through, but there's certainly no echo at that point.
Basic troubleshooting facts:
The audio stream sounds fine if this processing is skipped.
The audio stream sounds fine if decay is 0 (nothing to add).
The stored samples are indeed stored and accessed in the proper order and the proper locations.
The stored samples are being decayed and added to the input samples properly.
All numbers from the call of process() to return sample are precisely what I would expect from this algorithm, and remain so even outside this class.
The problem seems to arise from simply adding signed shorts together, and the resulting waveform is an absolute catastrophe. I've seen this specific method implemented in a variety of places - C#, C++, even on microcontrollers - so why is it failing so hard here?
EDIT: It seems I've been going about this entirely wrong. I don't know if it's FFmpeg/avconv, or some other factor, but I am not working with a normal PCM signal here. Through graphing of the waveform, as well as a failed attempt at a tone generator and the resulting analysis, I have determined that this is some version of differential pulse-code modulation; pitch is determined by change from one sample to the next, and halving the intended "volume" multiplier on a pure sine wave actually lowers the pitch and leaves volume the same. (Messing with the volume multiplier on a non-sine sequence creates the same static as this echo algorithm.) As this and other DSP algorithms are intended to work on linear pulse-code modulation, I'm going to need some way to get the proper audio stream first.
It should definitely work unless you have significant clipping.
For example, this is a text file with two columns. The leftmost column is the 16-bit input. The second column is the sum of the first and a version delayed by 4001 samples. The sample rate is 22 kHz.
Each sample in the second column is the result of summing x[k] and x[k-4001] (e.g. y[5000] = x[5000] + x[999] = -13840 + 9181 = -4659). You can clearly hear the echo signal when playing the samples in the second column.
Try this signal with your code and see if you get identical results.
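If you want to reproduce that check in Java, a minimal sketch along these lines (method and variable names are illustrative) sums each sample with a copy delayed by a fixed number of samples and clamps the result to the 16-bit range:
// Mix a 16-bit PCM buffer with a copy of itself delayed by delaySamples samples.
static short[] mixWithDelay(short[] x, int delaySamples, float decay) {
    short[] y = new short[x.length];
    for (int k = 0; k < x.length; k++) {
        int mixed = x[k];
        if (k >= delaySamples) {
            mixed += (int) (x[k - delaySamples] * decay); // add the delayed, attenuated sample
        }
        // Clamp instead of letting the sum wrap around
        y[k] = (short) Math.max(Short.MIN_VALUE, Math.min(mixed, Short.MAX_VALUE));
    }
    return y;
}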
