I'm coding an interactive applet with Piccolo and I need to include a Gaussian curve (aka Normal distribution chart) inside it.
I imagine any kind of Java implementation would be enough, but I can't find any. Ideally, I'd like to pass a set of values and have the chart drawn in a panel, an image object or anything that can be embedded in an applet.
Before getting my hands dirty coding it myself, does anybody know of a working piece of code to do it?
Implementations in other languages are welcome, as long as they are easily portable to Java.
Don't know if it works, but Google threw up this code to plot a Gaussian distribution.
The home page for this project is here.
I'm not familiar with Piccolo, but if it doesn't do the plotting for you, I would use JFreeChart for the actual plotting, since it's widely supported and very capable.
Edit: It looks like the Apache Commons Math library has a statistics section; specifically, a whole package on common distributions. Hopefully there are some math people out there, because I can't remember basic statistics... here's my attempt at using their library. I just slide a window here and calculate the probability between those values. What's the real way to get a PDF from this? They only have a CDF function.
public void testNormalDist() throws MathException {
    DistributionFactory f = DistributionFactory.newInstance();
    NormalDistribution n = f.createNormalDistribution(0.0d, 1.0d);
    double lastx = Double.NEGATIVE_INFINITY;
    double nextx = Double.NEGATIVE_INFINITY;
    for (int i = -100; i < 100; i++) {
        nextx = i / 100d;
        System.out.println(n.cumulativeProbability(lastx, nextx));
        lastx = nextx;
    }
}
I assume you want the probability density function for the graph. The equation is on Wikipedia and is reproduced below. Just use p(x) as your Y value and x as your X value, and you can get a pretty easy 2-D graph from that.
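In LaTeX form, with mean \mu and standard deviation \sigma, the density is:

p(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)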
Have you looked at Mathtools under Java?
OK, how about this... you give it an array of X points (normalized, of course; you can convert your X pixels to these by dividing each pixel position by the width of your image), and it returns the heights of the distribution curve (again, multiply by your normalization factor). This is for the basic case where the standard deviation is 1; pass mu = 0 for a standard normal.
public double[] normalDistBasic(double[] xarray, double mu) {
    double[] yarray = new double[xarray.length];
    double rad2pi = 2.50662827d; // sqrt(2 * pi)
    for (int off = 0; off < yarray.length; off++) {
        double x = xarray[off] - mu; // shift by the mean
        double ss = -1d * x * x / 2d;
        yarray[off] = (1d / rad2pi) * Math.exp(ss); // must be positive, or the curve flips below the axis
    }
    return yarray;
}
It should be pretty easy to implement one that takes arbitrary mean and standard deviation if one can't be found on the net.
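For instance, here is a minimal sketch of that generalization; the method and parameter names are just illustrative, and it follows the usual density formula by standardizing x and dividing the normalizing constant by sigma:

public double[] normalDist(double[] xarray, double mu, double sigma) {
    double[] yarray = new double[xarray.length];
    double norm = 1d / (sigma * Math.sqrt(2d * Math.PI)); // 1 / (sigma * sqrt(2*pi))
    for (int off = 0; off < xarray.length; off++) {
        double z = (xarray[off] - mu) / sigma; // standardize x
        yarray[off] = norm * Math.exp(-0.5d * z * z);
    }
    return yarray;
}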
I currently have a program which takes a feature vector and classification, and applies it to a known weight vector to generate a loss gradient using Logistic Regression. This is that code:
double[] grad = new double[featureSize];
// dot product w*x
double dot = 0;
for (int j = 0; j < featureSize; j++) {
    dot += weights[j] * features[j];
}
// -yi exp(-yi w·xi) / (1 + exp(-yi w·xi))
double gradMultiplier = (-type) * Math.exp((-type) * dot) / (1 + Math.exp((-type) * dot));
// -yi xi exp(-yi w·xi) / (1 + exp(-yi w·xi))
for (int j = 0; j < featureSize; j++) {
    grad[j] = features[j] * gradMultiplier;
}
return grad;
What I'm trying to do is implement something similar using a Softmax regression, but all of the info of Softmax I find online doesn't exactly follow the same vocabulary as what I know about Logit loss functions, and so I keep getting confused. How would I implement a function similar to the one above but using Softmax?
Based on the wikipedia page for Softmax, I'm under the impression that I might need multiple weight vectors, one for every possible classification. Am I wrong?
Softmax regression is a generalization of logistic regression: in logistic regression the labels are binary, while in Softmax regression they can take more than two values. Logistic regression refers to binomial logistic regression; Softmax regression refers to multinomial logistic regression.
There is an excellent page about it here. In your code, you seem to be implementing gradient descent to calculate the weights minimizing the cost function. That topic is covered by the provided link.
Based on the wikipedia page for Softmax, I'm under the impression that I might need multiple weight vectors, one for every possible classification. Am I wrong?
You are right. If you have n features and K classes, then your weights are K vectors of n elements, as indicated in the link above.
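As a rough sketch of what that looks like in code, mirroring the shape of the logistic snippet above (this assumes a cross-entropy loss, a weights[k][j] matrix holding the K class vectors, and an integer label in 0..K-1; all of these names are illustrative, not from the question):

// gradient of the negative log-likelihood for one sample (features, label)
double[][] grad = new double[K][featureSize];
double[] scores = new double[K];
double max = Double.NEGATIVE_INFINITY;
for (int k = 0; k < K; k++) {
    double dot = 0;
    for (int j = 0; j < featureSize; j++) {
        dot += weights[k][j] * features[j];
    }
    scores[k] = dot;
    max = Math.max(max, dot);
}
double sum = 0;
for (int k = 0; k < K; k++) {
    scores[k] = Math.exp(scores[k] - max); // shift by max for numerical stability
    sum += scores[k];
}
for (int k = 0; k < K; k++) {
    double p = scores[k] / sum;                  // softmax probability of class k
    double indicator = (k == label) ? 1.0 : 0.0; // 1 only for the true class
    for (int j = 0; j < featureSize; j++) {
        grad[k][j] = (p - indicator) * features[j]; // (p_k - 1{k == label}) * x_j
    }
}

Note how this reduces to the binary case: with K = 2 the two gradient rows are mirror images of each other, which is why logistic regression gets away with a single weight vector.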
Let me know if it helps.
So I'm programming a recursive program that is supposed to draw Koch's snowflake using OpenGL, and I've got the program basically working except one tiny issue. The deeper the recursion, the weirder 2 particular vertices get. Pictures at the bottom.
EDIT: I don't really care about the OpenGL aspect; I've got that part down. If you don't know OpenGL, all that glVertex does is draw a line between the two vertices specified in the two method calls. Pretend it's drawLine(v1, v2). Same difference.
I suspect that my method for finding points is to blame, but I can't find anything that looks incorrect.
I'm following the basically standard drawing method, here are the relevant code snips
(V is for vertex: V1 is the bottom-left corner, V2 is the bottom-right corner, V3 is the top corner):
double dir = Math.PI;
recurse(V2,V1,n);
dir=Math.PI/3;
recurse(V1,V3,n);
dir= (5./3.)* Math.PI ;
recurse(V3,V2,n);
Recursive method:
public void recurse(Point2D v1, Point2D v2, int n) {
    double newLength = v1.distance(v2) / 3.;
    if (n == 0) {
        gl.glVertex2d(v1.getX(), v1.getY());
        gl.glVertex2d(v2.getX(), v2.getY());
    } else {
        Point2D p1 = getPointViaRotation(v1, dir, newLength);
        recurse(v1, p1, n - 1);
        dir += (Math.PI / 3.);
        Point2D p2 = getPointViaRotation(p1, dir, newLength);
        recurse(p1, p2, n - 1);
        dir -= (Math.PI * (2. / 3.));
        Point2D p3 = getPointViaRotation(p2, dir, newLength);
        recurse(p2, p3, n - 1);
        dir += (Math.PI / 3.);
        recurse(p3, v2, n - 1);
    }
}
I really suspect my math is the problem, but this looks correct to me:
public static Point2D getPointViaRotation(Point2D p1, double rotation, double length) {
    double xLength = length * Math.cos(rotation);
    double yLength = length * Math.sin(rotation);
    return new Point2D.Double(xLength + p1.getX(), yLength + p1.getY());
}
N = 0 (All is well):
N = 1 (Perhaps a little bendy, maybe)
N = 5 (WAT)
I can't see any obvious problem code-wise. I do however have a theory about what happens.
It seems like all points in the graph are based on the locations of the points that came before them. As such, any rounding errors that occur during this process accumulate, eventually ending with the curve going haywire and being way off.
What I would do for starters is calculate the start and end points of each segment before recursing, so as to limit the impact of the rounding errors of the inner calls.
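For instance, the three inner points of a segment can be derived directly from its endpoints, so no rounding error is carried across recursion levels (a sketch; this replaces the running dir variable, and the sign of the rotation picks which side the bump points to):

// inner points of the Koch segment from v1 to v2, derived from the endpoints alone
double dx = (v2.getX() - v1.getX()) / 3d;
double dy = (v2.getY() - v1.getY()) / 3d;
Point2D p1 = new Point2D.Double(v1.getX() + dx, v1.getY() + dy);
Point2D p3 = new Point2D.Double(v1.getX() + 2d * dx, v1.getY() + 2d * dy);
// peak of the bump: rotate the one-third vector (dx, dy) by -60 degrees around p1
double cos = Math.cos(-Math.PI / 3d);
double sin = Math.sin(-Math.PI / 3d);
Point2D p2 = new Point2D.Double(
    p1.getX() + dx * cos - dy * sin,
    p1.getY() + dx * sin + dy * cos);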
One thing about Koch's snowflake is that the algorithm will run into rounding issues at some point (it is recursive, and all rounding errors add up). The trick is to keep it going as long as possible. There are three things you can do:
If you want to get more detail, the only way is to expand beyond the precision limits of double. You will need to use your own range of coordinates and transform them to screen coordinates every time you actually paint. Your own coordinates should zoom in and show the last recursion step (the last triangle) in a coordinate system of e.g. 100x100. Then calculate the three new triangles on top of that, transform into screen coordinates, and paint (see the transform sketch after this list).
In the line dir=Math.PI/3;, the division is already done in floating point, since Math.PI is a double; still, adding the . after the 3 makes the intent explicit. Where it really matters is expressions like (5./3.), where an accidental 5/3 would silently truncate to 1.
Make sure you use Point2D.Double everywhere. Your code should already do so, but I would write it explicitly everywhere.
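A minimal sketch of the transform mentioned in the first point, assuming your own coordinates live in a rectangle [wx0, wx1] x [wy0, wy1] and the screen is width x height pixels (all names here are illustrative):

// world rectangle [wx0, wx1] x [wy0, wy1] mapped onto a width x height pixel canvas
double wx0 = -1, wx1 = 2, wy0 = -1, wy1 = 2; // illustrative bounds
int width = 800, height = 800;

int screenX(double wx) {
    return (int) Math.round((wx - wx0) / (wx1 - wx0) * width);
}

int screenY(double wy) {
    // flip Y because screen coordinates grow downward
    return (int) Math.round(height - (wy - wy0) / (wy1 - wy0) * height);
}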
You have won the game when you still have a nice snowflake but get a StackOverflowError.
So, it turns out I am the dumbest man alive.
Thanks everyone for trying, I appreciate the help.
This code is meant to handle an equilateral triangle; it's very specific about that (you can tell by the angles).
I put in a triangle with the height equal to the base (not equilateral). When I fixed the input triangle, everything works great.
My question addresses both mathematical and CS issues, but since I need a performant implementation I am posting it here.
Problem:
I have an estimated bivariate normal distribution, defined as NumPy matrices, but then I will need to port the same computation to Java. (Dummy values here.)
mean = numpy.matrix([[0],[0]])
cov = numpy.matrix([[1,0],[0,1]])
When I receive as input a column vector of integer values (x, y), I want to compute the probability of that given tuple.
value = numpy.matrix([[4],[3]])
probability_of_value_given_the_distribution = ???
Now, from a mathematical point of view, this would be the integral for 3.5 < x < 4.5 and 2.5 < y < 3.5 over the probability density function of my normal.
What I want to know:
Is there a way to avoid implementing this directly, which would mean dealing with expressions defined over matrices and with double integrals? Besides the time it would take me to implement it myself, it would be computationally expensive. An approximate solution would be perfectly fine for me.
My reasoning:
For a univariate normal, one could simply use the cumulative distribution function (or even store its values for the standard one and then normalize), but unfortunately there appears to be no closed-form CDF for the multivariate case.
Another approach in the univariate case is to use the inverse of the binomial approximation (that is, approximate the normal as a binomial), but extending this to the multivariate case, I can't figure out how to account for the covariances.
I really hope someone has already implemented this, I need it soon (finishing my thesis) and I couldn't find anything.
OpenTURNS provides an efficient implementation of the CDF of a multinormal distribution (see the code).
import numpy as np
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.0],[0.0, 1.0]])
Let us create the multinormal distribution with these parameters.
import openturns as ot
multinormal = ot.Normal(mean, ot.CovarianceMatrix(cov))
Now let us compute the probability of the square [3.5, 4.5] x [2.5, 3.5]:
prob = multinormal.computeProbability(ot.Interval([3.5,2.5], [4.5,3.5]))
print(prob)
The computed probability is
1.3701244220201715e-06
If you are looking for the probability density function of a bivariate normal distribution, below are a few lines that could do the job:
import numpy as np

def multivariate_pdf(vector, mean, cov):
    # quadratic form (v - mean)^T inv(cov) (v - mean)
    quadratic_form = np.dot(np.dot(vector - mean, np.linalg.inv(cov)), np.transpose(vector - mean))
    # note the sqrt: the 2-D normalizing constant is 2*pi*sqrt(det(cov))
    return np.exp(-.5 * quadratic_form) / (2 * np.pi * np.sqrt(np.linalg.det(cov)))
mean = np.array([0,0])
cov = np.array([[1,0],[0,1]])
vector = np.array([4,3])
pdf = multivariate_pdf(vector, mean, cov)
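Since the question asks for a Java port, here is a rough equivalent of the density above for the 2x2 case, with the determinant and inverse of the covariance matrix written out by hand (a sketch, assuming the covariance matrix {{a, b}, {b, c}} is positive definite; the method name is illustrative):

// bivariate normal PDF at (x, y) with mean (mx, my) and covariance {{a, b}, {b, c}}
public static double bivariatePdf(double x, double y,
                                  double mx, double my,
                                  double a, double b, double c) {
    double det = a * c - b * b;
    double dx = x - mx;
    double dy = y - my;
    // quadratic form (v - mean)^T * inverse(cov) * (v - mean), 2x2 inverse expanded
    double q = (c * dx * dx - 2d * b * dx * dy + a * dy * dy) / det;
    return Math.exp(-0.5d * q) / (2d * Math.PI * Math.sqrt(det));
}

To approximate the probability of the unit square around an integer tuple, you can integrate this numerically, or, since the density varies slowly at that scale, simply evaluate it at the center and treat that value as the probability of the 1x1 cell.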
I've started differentiating two images by counting the number of different pixels using a simple algorithm:
private int returnCountOfDifferentPixels(String pic1, String pic2) {
    Bitmap i1 = loadBitmap(pic1);
    Bitmap i2 = loadBitmap(pic2);
    int count = 0;
    for (int y = 0; y < i1.getHeight(); ++y) {
        for (int x = 0; x < i1.getWidth(); ++x) {
            if (i1.getPixel(x, y) != i2.getPixel(x, y)) {
                count++;
            }
        }
    }
    return count;
}
However, this approach seems ineffective in its initial form, as there is always a very high number of pixels which differ, even in very similar photos.
I was thinking of a way to determine whether two pixels are really THAT different.
Bitmap.getPixel(x, y) on Android returns the pixel as a packed color int.
How can I implement a proper comparison between two colors to help with my motion detection?
You are right: because of noise and other factors, there is usually a lot of raw pixel change in a video stream. Here are some options you might want to consider:
Blurring the image first, ideally with a Gaussian filter or with a simple box filter. This just means that you take the (weighted) average over the neighboring pixels and the pixel itself. This should already reduce the sensor noise quite a bit.
Only adding the difference to count if it's larger than some threshold. This has the effect of only considering pixels that have really changed a lot. This is very easy to implement and might already solve your problem alone.
Thinking about it, try these two options first. If they don't work out, I can give you some more options.
EDIT: I just saw that you're not actually summing up differences but just counting different pixels. This is fine if you combine it with Option 2. Option 1 still works, but it might be overkill.
Also, to find out the difference between two colors, use the methods of the Color class:
int p1 = i1.getPixel(x, y);
int p2 = i2.getPixel(x, y);
int totalDiff = Color.red(p1) - Color.red(p2) + Color.green(p1) - Color.green(p2) + Color.blue(p1) - Color.blue(p2);
Now you can come up with a threshold the totalDiff must exceed to contribute to count.
Of course, you can play around with these numbers in various ways. The above code, for example, only computes changes in pixel intensity (brightness). If you also wanted to take changes in hue and saturation into account, you would have to compute totalDiff like this:
int totalDiff = Math.abs(Color.red(p1) - Color.red(p2)) + Math.abs(Color.green(p1) - Color.green(p2)) + Math.abs(Color.blue(p1) - Color.blue(p2));
Also, have a look at the other methods of Color, for example RGBToHSV(...).
I know this is essentially very similar to another answer here, but I think that by restating it in a different form it might prove useful to those seeking a solution. This approach involves having more than two images over time. If you literally only have two, it will not work as-is, but an equivalent method will.
Keep a history for all pixels, updated on each frame. For example, for each pixel:
history[x, y] = (history[x, y] * (w - 1) + get_pixel(x, y)) / w
Where w might be w = 20. The higher w is, the larger the spike for motion, but the longer motion has to be absent for the history to reset.
Then to determine if something has changed you can do this for each pixel:
changed_delta = abs(history[x, y] - get_pixel(x, y))
total_delta += changed_delta
You will find that it stabilizes most of the noise, and when motion happens you will get a large difference. You are essentially accumulating many frames and detecting the motion of the newest frame against that accumulation.
Also, for detecting positions of motion, consider breaking the image into smaller pieces and processing them individually. Then you can find objects and track them across the screen by treating a single image as a grid of separate images.
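A minimal Java sketch of the running-average idea, assuming Android Bitmap frames of equal size and using the green channel as a cheap brightness proxy (the w and threshold values are illustrative, and updateAndMeasure is a hypothetical helper, not part of any API):

// history[x][y] holds the running per-pixel average; call once per new frame
double updateAndMeasure(Bitmap frame, double[][] history, int w, int threshold) {
    double totalDelta = 0;
    for (int y = 0; y < frame.getHeight(); y++) {
        for (int x = 0; x < frame.getWidth(); x++) {
            int brightness = Color.green(frame.getPixel(x, y));
            double delta = Math.abs(history[x][y] - brightness);
            history[x][y] = (history[x][y] * (w - 1) + brightness) / w;
            if (delta > threshold) {
                totalDelta += delta; // ignore per-pixel noise below the threshold
            }
        }
    }
    return totalDelta; // compare against a per-frame threshold to flag motion
}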
Overview
So I'm trying to get a grasp on the mechanics of neural networks. I still don't totally grasp the math behind it, but I think I understand how to implement it. I currently have a neural net that can learn the AND, OR, and NOR training patterns. However, I can't seem to get it to learn the XOR pattern. My feed-forward neural network consists of 2 inputs, 3 hidden neurons, and 1 output. The weights and biases are randomly set between -0.5 and 0.5, and outputs are generated with the sigmoid activation function.
Algorithm
So far, I'm guessing I made a mistake in my training algorithm which is described below:
1. For each neuron in the output layer, provide an error value that is the desiredOutput - actualOutput --go to step 3
2. For each neuron in a hidden or input layer (working backwards), provide an error value that is the sum of all forward connection weights * the errorGradient of the neuron at the other end of the connection --go to step 3
3. For each neuron, using the error value provided, generate an error gradient that equals output * (1-output) * error --go to step 4
4. For each neuron, adjust the bias to equal current bias + LEARNING_RATE * errorGradient. Then adjust each backward connection's weight to equal current weight + LEARNING_RATE * output of neuron at the other end of the connection * this neuron's errorGradient
I'm training my neural net online, so this runs after each training sample.
Code
This is the main code that runs the neural network:
private void simulate(double maximumError) {
    int errorRepeatCount = 0;
    double prevError = 0;
    double error; // summed squares of errors
    int trialCount = 0;
    do {
        error = 0;
        // loop through each training set
        for (int index = 0; index < Parameters.INPUT_TRAINING_SET.length; index++) {
            double[] currentInput = Parameters.INPUT_TRAINING_SET[index];
            double[] expectedOutput = Parameters.OUTPUT_TRAINING_SET[index];
            double[] output = getOutput(currentInput);
            train(expectedOutput);
            // Subtracts the expected and actual outputs, averages them, then squares the result.
            error += Math.pow(getAverage(subtractArray(output, expectedOutput)), 2);
        }
    } while (error > maximumError);
}
Now the train() function:
public void train(double[] expected) {
    layers.outputLayer().calculateErrors(expected);
    for (int i = Parameters.NUM_HIDDEN_LAYERS; i >= 0; i--) {
        layers.allLayers[i].calculateErrors();
    }
}
Output layer calculateErrors() function:
public void calculateErrors(double[] expectedOutput) {
    for (int i = 0; i < numNeurons; i++) {
        Neuron neuron = neurons[i];
        double error = expectedOutput[i] - neuron.getOutput();
        neuron.train(error);
    }
}
Normal (Hidden & Input) layer calculateErrors() function:
public void calculateErrors() {
    for (int i = 0; i < neurons.length; i++) {
        Neuron neuron = neurons[i];
        double error = 0;
        for (Connection connection : neuron.forwardConnections) {
            error += connection.output.errorGradient * connection.weight;
        }
        neuron.train(error);
    }
}
Full Neuron class:
package neuralNet.layers.neurons;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import neuralNet.Parameters;
import neuralNet.layers.NeuronLayer;

public class Neuron {
    private double output, bias;
    public List<Connection> forwardConnections = new ArrayList<Connection>(); // Forward = layer closer to input -> layer closer to output
    public List<Connection> backwardConnections = new ArrayList<Connection>(); // Backward = layer closer to output -> layer closer to input
    public double errorGradient;

    public Neuron() {
        Random random = new Random();
        bias = random.nextDouble() - 0.5;
    }

    public void addConnections(NeuronLayer prevLayer) {
        // This is true for input layers. They create their connections differently. (See InputLayer class)
        if (prevLayer == null) return;
        for (Neuron neuron : prevLayer.neurons) {
            Connection.createConnection(neuron, this);
        }
    }

    public void calcOutput() {
        output = bias;
        for (Connection connection : backwardConnections) {
            connection.input.calcOutput();
            output += connection.input.getOutput() * connection.weight;
        }
        output = sigmoid(output);
    }

    private double sigmoid(double output) {
        return 1 / (1 + Math.exp(-1 * output));
    }

    public double getOutput() {
        return output;
    }

    public void train(double error) {
        this.errorGradient = output * (1 - output) * error;
        bias += Parameters.LEARNING_RATE * errorGradient;
        for (Connection connection : backwardConnections) {
            // for clarification: connection.input refers to a neuron that outputs to this neuron
            connection.weight += Parameters.LEARNING_RATE * connection.input.getOutput() * errorGradient;
        }
    }
}
Results
When I'm training for AND, OR, or NOR, the network usually converges within about 1000 epochs; however, when I train with XOR, the outputs become fixed and it never converges. So, what am I doing wrong? Any ideas?
Edit
Following the advice of others, I started over and implemented my neural network without classes...and it works. I'm still not sure where my problem lies in the above code, but it's in there somewhere.
This is surprising, because you are using a big enough network (barely) to learn XOR. Your algorithm looks right, so I don't really know what is going on. It might help to know how you generate your training data: are you just repeating the samples (1,0,1), (1,1,0), (0,1,1), (0,0,0) or something like that over and over? Perhaps the problem is that stochastic gradient descent is causing you to jump around the minimum without stabilizing. You could try some things to fix this: perhaps sample randomly from your training examples instead of repeating them (if that is what you are doing). Or, alternatively, you could modify your learning algorithm:
currently you have something equivalent to:
weight(epoch) = weight(epoch - 1) + deltaWeight(epoch)
deltaWeight(epoch) = mu * errorGradient(epoch)
where mu is the learning rate
One option is to very slowly decrease the value of mu.
An alternative would be to change your definition of deltaWeight to include a "momentum"
deltaWeight(epoch) = mu * errorGradient(epoch) + alpha * deltaWeight(epoch -1)
where alpha is the momentum parameter (between 0 and 1).
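In code, the momentum variant only requires remembering the previous step for each weight. Relative to the question's Neuron.train(), that could look roughly like this (a sketch: lastDelta would be one extra field you add per Connection, and alpha is the momentum parameter):

// inside the weight-update loop of train(), with momentum
double delta = Parameters.LEARNING_RATE * connection.input.getOutput() * errorGradient
        + alpha * connection.lastDelta; // reuse the previous step
connection.weight += delta;
connection.lastDelta = delta;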
Visually, you can think of gradient descent as trying to find the minimum point of a curved surface by placing an object on that surface, and then step by step moving that object small amounts in whichever direction is sloping down, based on where it is currently located. The problem is that you don't really do gradient descent: instead you do stochastic gradient descent, where you pick your direction by sampling from the training vectors and move in whatever direction the sample makes appear to be downhill. On average over the entire training data, stochastic gradient descent should work, but it isn't guaranteed to, because you can get into a situation where you jump back and forth, never making progress. Slowly decreasing the learning rate means you take smaller and smaller steps each time, so you cannot get stuck in an infinite cycle.
On the other hand, momentum makes the algorithm into something akin to rolling a rubber ball. As the ball rolls, it tends to go in the downward direction, but it also tends to keep going in the direction it was going before, and if it is ever on a stretch where the downward slope points the same way for a while, it will speed up. The ball will therefore jump over some local minima, and it will be more resilient against stepping back and forth over the target, because doing so means working against the force of momentum.
Having seen some code and thought about this some more, it is pretty clear that your problem is in training the early layers. The functions you have successfully learned are all linearly separable, so it would make sense that only a single layer is being properly learned. I agree with LiKao about implementation strategies in general, although your approach should work. My suggestion for how to debug this is to figure out what the progression of the weights on the connections between the input layer and the output layer looks like.
You should post the rest of the implementation of Neuron.
I faced the same problem a short time ago. Finally I found the solution: how to write code solving XOR with the MLP algorithm.
The XOR problem seems to be an easy-to-learn problem, but it isn't for the MLP, because it's not linearly separable. So even if your MLP is OK (I mean there is no bug in your code), you have to find the right parameters to be able to learn the XOR problem.
Two hidden and one output neuron is fine. The two main things you have to set are:
although you have only 4 training samples, you have to run the training for a couple of thousand epochs.
if you use sigmoid hidden layers but a linear output, the network will converge faster
Here is the detailed description and sample code: http://freeconnection.blogspot.hu/2012/09/solving-xor-with-mlp.html
Small hint: if the output of your NN seems to drift toward 0.5, then everything's OK!
The algorithm using just the learning rate and bias is just too simple to quickly learn XOR. You can either increase the number of epochs or change the algorithm.
My recommendation is to use momentum:
1000 epochs
learningRate = 0.3
momentum = 0.8
weights drawn from [0,1]
bias drawn from [-0.5, 0.5]
And some crucial pseudo code (assuming backward and forward propagation works):
for every edge:
    previous_edge_weight_change = -1 * learningRate * edge_source_neuron_value * edge_target_neuron_delta + previous_edge_weight_change * momentum
    edge_weight += previous_edge_weight_change

for every neuron:
    previous_neuron_bias_change = -1 * learningRate * neuron_delta + previous_neuron_bias_change * momentum
    bias += previous_neuron_bias_change
I would suggest you generate a grid (say from [-5, -5] to [5, 5], with a step like 0.5), train your MLP on XOR, and apply it to the grid. Plotted in color, you will see some kind of frontier.
If you do that at each iteration, you'll see the evolution of the frontier and can monitor the learning.
It's been a while since I last implemented a neural network myself, but I think your mistake is in the lines:
bias += Parameters.LEARNING_RATE * errorGradient;
and
connection.weight += Parameters.LEARNING_RATE * connection.input.getOutput() * errorGradient;
The first of these lines should not be there at all. Bias is best modeled as the input of a neuron which is fixed at 1. This will serve to make your code a lot simpler and cleaner, because you will not have to treat the bias in any special way.
The other point is that I think the sign is wrong in both of these expressions. Think about it like this:
Your gradient points in the direction of steepest ascent, so if you go in that direction, your error will get larger.
What you are doing here is adding something to the weights when the error is already positive, i.e. you are making it more positive. If it is negative, you are subtracting something, i.e. you make it more negative.
Unless I am missing something about your definition of error or the gradient calculation you should change these lines to:
bias -= Parameters.LEARNING_RATE * errorGradient;
and
connection.weight -= Parameters.LEARNING_RATE * connection.input.getOutput() * errorGradient;
I had a similar mistake in one of my early implementations, and it led to exactly the same behaviour, i.e. a network that learned in simple cases but no longer did once the training data became more complex.
LiKao's comment to simplify my implementation and get rid of the object-oriented aspects solved my problem. The flaw in the algorithm as it is described above is unknown, however I now have a working neural network that is a lot smaller.
Feel free to continue to provide insight on the problem with my previous implementation, as others may have the same problem in the future.
I'm a bit rusty on neural networks, but I think there is a problem implementing XOR with a single perceptron: basically, a neuron is able to separate two groups of solutions with a straight line, but one straight line is not sufficient for the XOR problem...
Here should be the answer!
I couldn't see anything wrong with the code, but I was having a similar problem with my network not converging on XOR, so I figured I'd post my working configuration.
3 input neurons (one of them being a fixed bias of 1.0)
3 hidden neurons
1 output neuron
Weights randomly chosen between -0.5 and 0.5.
Sigmoid activation function.
Learning rate = 0.2
Momentum = 0.4
Epochs = 50,000
Converged 10/10 times.
One of the mistakes I was making was not connecting the bias input to the output neuron; with the same configuration otherwise, it only converged 2 out of 10 times, with the other eight times failing because 1 and 1 would output 0.5.
Another mistake was not doing enough epochs. If I only did 1000, then the outputs tended to be around 0.5 for every test case. With epochs >= 8000, so 2000 per test case, it started to look like it might be working (but only if using momentum).
When doing 50000 epochs it did not matter whether momentum was used or not.
Another thing I tried was not applying the sigmoid function to the output neuron's output (which I think is what an earlier post suggested), but this wrecked the network, because the output*(1-output) part of the error equation could now be negative, meaning weights were updated in a way that caused the error to increase.