Gradient Descent Linear Regression in Java


This is a bit of a long shot, but I wonder if someone could look at this. Am I doing batch gradient descent for linear regression correctly here?
It gives the expected answers for a single independent variable plus an intercept, but not for multiple independent variables.
import cern.colt.matrix.DoubleMatrix1D;
import cern.colt.matrix.DoubleMatrix2D;
import cern.colt.matrix.linalg.Algebra;
import cern.jet.math.Functions;
/**
 * One step of batch gradient descent (using the Colt matrix library).
 * @param alpha Learning rate
 * @param thetas Current thetas (updated in place)
 * @param independent Design matrix X (M examples x N features)
 * @param dependent Observed values y (length M)
 * @return new thetas
 */
public DoubleMatrix1D descent(double alpha,
                              DoubleMatrix1D thetas,
                              DoubleMatrix2D independent,
                              DoubleMatrix1D dependent) {
    Algebra algebra = new Algebra();
    // ALPHA*(1/M) in one.
    double modifier = alpha / (double) independent.rows();
    // I think this can just skip the transpose of theta.
    // This is the result of every Xi run through the theta (hypothesis fn),
    // so each Xj feature is multiplied by its theta to get the results of the hypothesis.
    DoubleMatrix1D hypotheses = algebra.mult(independent, thetas);
    // hypothesis - Y
    // Now we have, for each Xi, the difference between the hypothesis's prediction and the actual Yi.
    hypotheses.assign(dependent, Functions.minus);
    // Transpose examples (MxN) to NxM so we can matrix-multiply by the errors (Mx1).
    DoubleMatrix2D transposed = algebra.transpose(independent);
    DoubleMatrix1D deltas = algebra.mult(transposed, hypotheses);
    // Scale the deltas by 1/m and the learning rate alpha (alpha/m).
    deltas.assign(Functions.mult(modifier));
    // Theta = Theta - Deltas
    thetas.assign(deltas, Functions.minus);
    return thetas;
}

There is nothing wrong with your implementation. Based on your comment, the problem is the collinearity you introduce when generating x2, which is problematic in regression estimation.
To test your algorithm, generate two independent columns of random numbers. Pick values for w0, w1 and w2, i.e. the coefficients for the intercept, x1 and x2 respectively, and calculate the dependent value y.
Then see whether your stochastic/batch gradient descent algorithm can recover w0, w1 and w2.
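A minimal sketch of that test, written with plain Java arrays instead of Colt so it runs without dependencies. The coefficients (w0 = 2, w1 = 3, w2 = -1), the sample size, the learning rate of 0.5, the iteration count and the random seed are arbitrary choices for illustration. y is generated without noise, so gradient descent should recover the coefficients almost exactly.

```java
import java.util.Random;

public class GradientDescentCheck {

    /** One batch gradient descent step: theta - (alpha / m) * X^T (X theta - y). */
    static double[] step(double[][] x, double[] y, double[] theta, double alpha) {
        int m = x.length, n = theta.length;
        // Residuals of the current hypothesis, computed before any theta changes.
        double[] residuals = new double[m];
        for (int i = 0; i < m; i++) {
            double prediction = 0;
            for (int j = 0; j < n; j++) prediction += x[i][j] * theta[j];
            residuals[i] = prediction - y[i];
        }
        double[] next = new double[n];
        for (int j = 0; j < n; j++) {
            double gradient = 0;
            for (int i = 0; i < m; i++) gradient += x[i][j] * residuals[i];
            next[j] = theta[j] - alpha / m * gradient;
        }
        return next;
    }

    /** Builds rows [1, x1, x2] from two independent uniform columns and fits them. */
    static double[] fitSynthetic(double w0, double w1, double w2, long seed) {
        Random random = new Random(seed);
        int m = 200;
        double[][] x = new double[m][];
        double[] y = new double[m];
        for (int i = 0; i < m; i++) {
            double x1 = random.nextDouble(), x2 = random.nextDouble(); // drawn independently
            x[i] = new double[] {1.0, x1, x2};                          // leading 1 for the intercept
            y[i] = w0 + w1 * x1 + w2 * x2;                              // noise-free target
        }
        double[] theta = new double[3];
        for (int k = 0; k < 5_000; k++) theta = step(x, y, theta, 0.5);
        return theta;
    }

    public static void main(String[] args) {
        double[] theta = fitSynthetic(2.0, 3.0, -1.0, 42L);
        System.out.printf("w0=%.4f w1=%.4f w2=%.4f%n", theta[0], theta[1], theta[2]);
    }
}
```

If x2 were instead built as a multiple of x1, the columns would be collinear and there would be no unique answer to recover, which is the situation described above.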

I think adding
// ALPHA*(1/M) in one.
double modifier = alpha / (double)independent.rows();
is a bad idea, since you're mixing the gradient function with the gradient descent algorithm. It's much better to keep the gradient descent loop in its own public method, like the following in Java:
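import org.la4j.Matrix;
import org.la4j.Vector;
public Vector gradientDescent(Matrix x, Vector y, int kmax, double alpha) {
    // One theta per column of x (including the intercept column), starting at zero.
    Vector thetas = Vector.fromArray(new double[x.columns()]);
    for (int k = 0; k < kmax; k++) {
        thetas = thetas.subtract(gradient(x, y, thetas).multiply(alpha));
    }
    return thetas;
}
// The 1/m scaling lives here, in the gradient, not in the descent loop:
// (1/m) * X^T (X * thetas - y)
private Vector gradient(Matrix x, Vector y, Vector thetas) {
    Vector errors = x.multiply(thetas).subtract(y);
    return x.transpose().multiply(errors).multiply(1.0 / x.rows());
}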

Related

Why does my 1D gravity simulation not act like a pendulum?

My gravity simulation acts more like a gravity slingshot. Once the two bodies pass over each other, they accelerate far more than they decelerate on the other side. It's not balanced. It won't oscillate around an attractor.
How do other gravity simulators get around this? For example, in http://www.testtubegames.com/gravity.html, if you create two bodies they just oscillate back and forth, never drifting further apart than their original distance, even though they pass through each other as in my example.
That's how it should be. But in my case, as soon as they get close they shoot away from each other to the edges of the imaginary galaxy, never to come back for a gazillion years.
Edit: Here is a video of the bug: https://imgur.com/PhhRhP7
Here is a minimal test case to run in Processing.
// Globals:
float a;      // object acceleration
float v;      // object velocity
int unit = 1; // 1 pixel = 1 meter
float x;
float y;
float alx;
float aly;
float g = 6.67408 * pow(10, -11) * sq(unit); // gravitational constant G
float m1 = (1 * pow(10, 15)); // attractor mass
float m2 = 1; // object mass
void setup() {
  size(200, 200);
  a = 0;
  v = 0;
  x = width/2;    // object x
  y = 0;          // object y
  alx = width/2;  // attractor x
  aly = height/2; // attractor y
}
void draw() {
  background(0);
  getAcc();
  applyAcc();
  fill(0, 255, 0);
  ellipse(x, y, 10, 10); // object
  fill(255, 0, 0);
  ellipse(alx, aly, 10, 10); // attractor
}
void applyAcc() {
  a = getAcc();
  v += a * (1/frameRate); // add acceleration to velocity
  y += v * (1/frameRate); // add velocity to y
  a = 0;
}
float getAcc() {
  float a = 0;
  float d = dist(x, y, alx, aly); // distance to attractor
  float gravity = (g * m1 * m2)/sq(d); // gravitational force
  a += gravity/m2;
  if (y > aly) {
    a *= -1;
  }
  return a;
}

Your distance doesn't include the width of the objects, so the objects effectively occupy the same space at the same time.
If this is meant to be a physical simulation, the way to "cap gravity" as suggested above is to add a normal force when the outer edges touch.

You should get into the habit of debugging your code. Which line of code is behaving differently from what you expected?
For example, if I were you I would start by printing out the value of gravity every time you calculate it:
float gravity = (g * m1 * m2)/sq(d); //gforce
println(gravity);
You'll notice that your gravity value skyrockets as your circles get closer to each other. This makes sense, because you're dividing by sq(d): as d gets smaller, your gravity increases.
You could simply cap your gravity value so it doesn't go off the charts anymore:
float gravity = (g * m1 * m2)/sq(d);
if (gravity > 100) {
  gravity = 100;
}
Alternatively, you could cap d so it never goes below a certain value; the result is the same.
In the end you'll find that this is not going to be as easy as you expected. You're going to have to tune the parameters quite a bit so your simulation works how you want.
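Capping d instead of the force can be sketched in Java like this. G and the attractor mass are the question's constants; MIN_DISTANCE is a hypothetical tuning value (here 10, the sum of the radii of the two 10-pixel circles) that you would adjust as described above.

```java
public class ClampedGravity {
    static final double G = 6.67408e-11;      // gravitational constant, as in the question
    static final double ATTRACTOR_MASS = 1e15; // m1 in the question
    static final double MIN_DISTANCE = 10.0;   // hypothetical cap: radii of both 10 px circles

    /** Acceleration of the object toward the attractor; d is clamped so it stays finite. */
    static double acceleration(double distance) {
        double d = Math.max(distance, MIN_DISTANCE);
        return G * ATTRACTOR_MASS / (d * d); // a = F / m2 = G * m1 / d^2
    }

    public static void main(String[] args) {
        for (double d : new double[] {100, 20, 10, 1, 0}) {
            System.out.printf("d=%5.1f  a=%.4f%n", d, acceleration(d));
        }
    }
}
```

Below MIN_DISTANCE the acceleration stays at its value at the cap instead of growing without bound, so a single large time step can no longer fling the object away.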

Working demo here: https://beta.observablehq.com/#shaunlebron/1d-gravity
I followed the solution posted by the author of the sim that inspired this question here:
- First off, shrinking the timestep is always helpful. My simulation runs, as a baseline, about 40 ‘steps’ per frame, and 30 frames per second.
- To deal with the exact issue you talk about, I think modeling the bodies not as pure point masses - but rather spherical masses with a certain radius will be vital. That prevents the force of gravity from diverging to infinity. So, for instance, if you drop an asteroid into a star in my simulation (with collisions turned off), the force of gravity will increase as the asteroid gets closer, up until it reaches the surface of the star, at which point the force will begin to decrease. And the moment it’s at the center of the star (or nearby), the force will be zero (or nearly zero) - instead of near-infinite.
In my demo, I just completely turned off gravity when two objects are close enough together. It seems to work well enough.
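The quoted spherical-mass idea can be sketched in Java, combined with the question's update order (velocity first, then position) and a small fixed timestep. It assumes the attractor is a uniform-density sphere, so by the shell theorem the pull inside it falls linearly to zero at the center. MU, RADIUS, the step size and the starting offset are made-up values in arbitrary units, not taken from either simulation.

```java
public class SphereGravity1D {
    // Hypothetical values in arbitrary simulation units, not taken from the question.
    static final double MU = 1000.0;   // G * M of the attractor
    static final double RADIUS = 10.0; // radius of the attractor, treated as a uniform sphere

    /** Acceleration of an object at signed offset y from the attractor's center. */
    static double acceleration(double y) {
        double d = Math.abs(y);
        double magnitude = d >= RADIUS
                ? MU / (d * d)                         // outside: ordinary inverse-square law
                : MU * d / (RADIUS * RADIUS * RADIUS); // inside: falls linearly to zero at the center
        return -Math.signum(y) * magnitude;            // always pulls back toward the center
    }

    /** Semi-implicit Euler (velocity first, then position), starting at rest; returns {minY, maxY}. */
    static double[] simulate(double startY, double dt, int steps) {
        double y = startY, v = 0.0;
        double minY = y, maxY = y;
        for (int i = 0; i < steps; i++) {
            v += acceleration(y) * dt;
            y += v * dt;
            minY = Math.min(minY, y);
            maxY = Math.max(maxY, y);
        }
        return new double[] {minY, maxY};
    }

    public static void main(String[] args) {
        // 300,000 steps of 0.001 time units: roughly two full oscillations.
        double[] range = simulate(100.0, 0.001, 300_000);
        System.out.printf("y stayed within [%.3f, %.3f]%n", range[0], range[1]);
    }
}
```

Because the force is now finite everywhere, the object should swing through the attractor to roughly -100 and back, never getting noticeably further than its starting distance.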

3D Vector linear interpolation

How can I lerp between two 3d vectors?
I use this method for 2d vectors:
public Vector2d lerp(Vector2d other, double speed, double error) {
    if (equals(other) || getDistanceSquared(other) <= error * error)
        return other;
    double dx = other.getX() - this.x, dy = other.getY() - this.y;
    double direction = Math.atan2(dy, dx);
    double x = this.x + (speed * Math.cos(direction));
    double y = this.y + (speed * Math.sin(direction));
    return new Vector2d(x, y);
}
Note: this is not exactly "linear interpolation"; this method will interpolate at a constant rate, which is what I want.
I want to do exactly what this does but with an added z component for the third dimension. How can I do this?

The easiest way would be to transform your two vectors such that they lie in the (u, v) plane; then apply your method above; then transform back to the original coordinate space.
This requires you to construct a rotation matrix:
Take the cross product of your two vectors to get their mutual normal vector; call this cross_1.
Let the vector this define the direction of the u axis.
Take the cross product of this and cross_1 to get a vector cross_2, which is the direction of your v axis.
Normalize each of these three vectors; call them this_norm, cross_2_norm and cross_1_norm.
These three vectors can be written as a 3x3 orthonormal matrix (each of the vectors is a 3-element column vector):
R = [ this_norm cross_2_norm cross_1_norm ]
Now you can multiply your 3d vectors this and other by R' (the transpose of R), and you will get vectors which have the form
[ u ]
[ v ]
[ 0 ]
i.e. a 3-dimensional column vector with zero as the third element (up to rounding error). This works because the rows of R' are the three unit axes, so each component of the result is a dot product with one axis.
So you can discard the third element and keep 2-element column vectors, which you can store in Vector2d, and apply your method above to do the interpolation.
That gives you a Vector2d which interpolates in the (u, v) plane. You can transform that back to (x, y, z) space by attaching a zero third element to it and pre-multiplying by R (which is the inverse of R', since R is orthonormal).
Of course, you need to handle degenerate cases, like zero and (anti-)parallel vectors. In these cases, one or both of the cross products are zero, meaning you can't normalize them; simply pick arbitrary directions instead.
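A minimal Java sketch of this plane-projection approach, using double[3] arrays instead of a vector class. It builds v as cross_1 × this (rather than this × cross_1) so that (u, v, normal) is right-handed; either order works as long as the same basis is used in both directions. It assumes non-degenerate input (non-zero, non-parallel vectors) and does not implement the fallback for those cases.

```java
public class PlaneLerp {
    static double[] cross(double[] a, double[] b) {
        return new double[] {
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]
        };
    }

    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    static double[] normalize(double[] a) {
        double n = Math.sqrt(dot(a, a));
        return new double[] {a[0] / n, a[1] / n, a[2] / n};
    }

    /** Steps from `from` toward `to` by `speed`, doing the 2D method in the (u, v) plane. */
    static double[] step(double[] from, double[] to, double speed) {
        double[] w = normalize(cross(from, to)); // cross_1: normal of the plane holding both vectors
        double[] u = normalize(from);            // u axis along `from`
        double[] v = cross(w, u);                // cross_2: already unit length since w is perpendicular to u
        // Multiplying by R' (rows u, v, w); the w component is ~0 for both vectors, so it is dropped.
        double fu = dot(from, u), fv = dot(from, v);
        double tu = dot(to, u), tv = dot(to, v);
        // The question's 2D method, applied to the (u, v) coordinates.
        double direction = Math.atan2(tv - fv, tu - fu);
        double nu = fu + speed * Math.cos(direction);
        double nv = fv + speed * Math.sin(direction);
        // Multiplying (nu, nv, 0) by R maps the point back to (x, y, z).
        return new double[] {
            nu * u[0] + nv * v[0],
            nu * u[1] + nv * v[1],
            nu * u[2] + nv * v[2]
        };
    }

    public static void main(String[] args) {
        double[] p = step(new double[] {1, 2, 3}, new double[] {4, 6, 3}, 1.0);
        System.out.printf("(%.6f, %.6f, %.6f)%n", p[0], p[1], p[2]);
    }
}
```

For the example in main, the offset from (1, 2, 3) to (4, 6, 3) is (3, 4, 0) with length 5, so one step of length 1 should land at (1.6, 2.8, 3.0).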

If I understand your code correctly, when you compute the dx and dy offsets, then compute an angle from them, and finally take the sin/cos pair, you're basically normalizing the (dx, dy) vector, so you could write it like this:
double dx = other.getX() - this.x, dy = other.getY() - this.y;
double length = Math.sqrt(dx * dx + dy * dy); // normalize (dx, dy) without any trig
double x = this.x + (speed * dx / length);
double y = this.y + (speed * dy / length);
Now it should be straightforward to add a Z component.
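With the normalization written out, adding z is mechanical. Below is a minimal sketch; the Vector3d class here is a hypothetical stand-in for the question's own vector class, with public final fields instead of getters. Like the original 2D method, it returns `other` once within `error` and otherwise steps exactly `speed` units, so it can overshoot when the remaining distance is between `error` and `speed`.

```java
public class Vector3d {
    final double x, y, z;

    Vector3d(double x, double y, double z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    double distanceSquared(Vector3d o) {
        double dx = o.x - x, dy = o.y - y, dz = o.z - z;
        return dx * dx + dy * dy + dz * dz;
    }

    /** Moves a fixed distance `speed` toward `other`, snapping to it once within `error`. */
    Vector3d lerp(Vector3d other, double speed, double error) {
        double distSq = distanceSquared(other);
        if (distSq <= error * error) {
            return other; // also covers other == this, so no division by zero below
        }
        double length = Math.sqrt(distSq);
        // Unit direction from this to other replaces the atan2/cos/sin of the 2D version.
        return new Vector3d(
                x + speed * (other.x - x) / length,
                y + speed * (other.y - y) / length,
                z + speed * (other.z - z) / length);
    }
}
```

For example, stepping from (0, 0, 0) toward (3, 0, 4) with speed 1 should give (0.6, 0, 0.8).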

Car Steering - OpenGL

I'm creating a racing car game and am having trouble figuring out how to get steering to work. I have a basic 2D race course built in a 3D environment. The program only uses x and y, with z being 0.
My race course consists of a road that is 29 units wide in the x-axis and two long tracks that are 120 units long in the y direction. At 120 units in the y-axis there is a 180-degree turn. You can think of the course as looking similar to a NASCAR-style oval.
I'm trying to set my car's steering so that it can turn realistically when I reach the 180 degree turns. I'm using two variables that separately control the x / y positions, as well as two variables for the x / y velocities. My code at the moment is as follows:
public void steering() {
    double degreesPerFrame = 180 / (2*30); //180 degrees over 30 fps in 2s
    velocityX = velocityX * -1 * Math.cos(degreesPerFrame);
    velocityY = velocityY * -1 * Math.sin(degreesPerFrame);
    double yChange = Math.sin(degreesPerFrame) * velocityY;
    double xChange = Math.cos(degreesPerFrame) * velocityX;
    x += xChange; //change x position
    y += yChange; //change y position
}
I'm not completely sure how I can get my steering to turn properly. I'm stuck at the moment and not sure what I would need to change in my function to get steering working properly.

I think this would be easier if you don't use angles at all in your calculations. Simply operate with position, velocity, and acceleration vectors.
First, a quick reminder from physics 101. With a small time step dt, current position p, and current velocity vector v, and an acceleration vector a, the new position p' and velocity vector v' are:
v' = v + a * dt
p' = p + v' * dt
The acceleration a depends on the driver input. For example:
When moving ahead at a constant speed, it is 0.
When accelerating, it is limited by engine power and tire grip in longitudinal direction.
When turning, it is limited by tire grip in lateral direction.
When braking, it is limited by tire grip (mostly, the brakes are typically strong enough to not be a limit).
For a relatively simple model, you can assume that grip in longitudinal and lateral direction are the same (which is not entirely true in reality), which limits the total acceleration to a vector within a circle, which is commonly called the friction circle.
That was mainly just background. Back to the more specific case here, which is turning at a steady speed. The result of steering the car can be modeled by a lateral force, and corresponding acceleration. So in this case, we can apply an acceleration vector that is orthogonal to the current velocity vector.
With the components of the velocity vector:
v = (vx, vy)
a vector that points orthogonally to the left (you mentioned NASCAR, so there's only left turns...) of this is:
(-vy, vx)
Since we want to control the amount of lateral acceleration, which is the length of the acceleration vector, we normalize this vector, and multiply it by the maximum acceleration:
a = aMax * normalize(-vy, vx)
If you use real units for your calculations, you can apply a realistic number for the maximum lateral acceleration aMax. Otherwise, just tweak it to give you the desired maneuverability for the car in your artificial units.
That's really all you need. Recapping the steps in code form:
// Realistic value for a sports car on street tires when using
// metric units. 10 m/s^2 is slightly over 1 g.
static final float MAX_ACCELERATION = 10.0f;
float deltaTime = ...; // Time since last update.
float accelerationX = -velocityY;
float accelerationY = velocityX;
float accelerationScale = MAX_ACCELERATION /
        (float) Math.sqrt(accelerationX * accelerationX + accelerationY * accelerationY);
accelerationX *= accelerationScale;
accelerationY *= accelerationScale;
velocityX += accelerationX * deltaTime;
velocityY += accelerationY * deltaTime;
x += velocityX * deltaTime;
y += velocityY * deltaTime;
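As a sanity check of these steps, here is a self-contained Java sketch that wraps them in a small class and runs a half turn. The starting speed of 20 m/s, the 1 ms time step and the class name are assumptions for illustration. With lateral acceleration a at speed v the car should follow a circle of radius v²/a (40 m here), so after π·r/v seconds it should have turned 180 degrees and moved 2r = 80 m to the left.

```java
public class SteadyTurn {
    // Assumed metric units; 10 m/s^2 of lateral grip is roughly 1 g.
    static final double MAX_ACCELERATION = 10.0;

    double x, y, velocityX, velocityY;

    SteadyTurn(double x, double y, double velocityX, double velocityY) {
        this.x = x;
        this.y = y;
        this.velocityX = velocityX;
        this.velocityY = velocityY;
    }

    /** One update while turning left at the grip limit: acceleration is perpendicular to velocity. */
    void turnLeft(double deltaTime) {
        double speed = Math.hypot(velocityX, velocityY);
        if (speed == 0) {
            return; // a stationary car has no direction to turn from
        }
        // (-vy, vx) points 90 degrees to the left of the velocity; scale it to MAX_ACCELERATION.
        double accelerationX = -velocityY * MAX_ACCELERATION / speed;
        double accelerationY = velocityX * MAX_ACCELERATION / speed;
        velocityX += accelerationX * deltaTime;
        velocityY += accelerationY * deltaTime;
        x += velocityX * deltaTime;
        y += velocityY * deltaTime;
    }

    public static void main(String[] args) {
        SteadyTurn car = new SteadyTurn(0, 0, 0, 20);               // 20 m/s heading in +y
        double dt = 0.001;
        double radius = 20.0 * 20.0 / MAX_ACCELERATION;             // v^2 / a = 40 m
        int steps = (int) Math.round(Math.PI * radius / 20.0 / dt); // time for half a circle
        for (int i = 0; i < steps; i++) {
            car.turnLeft(dt);
        }
        System.out.printf("position (%.2f, %.2f), velocity (%.2f, %.2f)%n",
                car.x, car.y, car.velocityX, car.velocityY);
    }
}
```

The speed creeps up very slightly with this explicit update (each step adds a tiny perpendicular component), which is negligible at small time steps; you could renormalize the velocity to its previous length if it matters.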

Vectorizing a gradient descent algorithm

I am coding gradient descent in MATLAB.
For two features, I get for the update step:
temp0 = theta(1,1) - (alpha/m)*sum((X*theta-y).*X(:,1));
temp1 = theta(2,1) - (alpha/m)*sum((X*theta-y).*X(:,2));
theta(1,1) = temp0;
theta(2,1) = temp1;
However, I want to vectorize this code and to be able to apply it to any number of features.
For the vectorization, it looks like what I am trying to do is a matrix multiplication:
theta = theta - (alpha/m) * (X' * (X*theta-y));
This seems right, but when I tried it, I realized that it doesn't work for gradient descent because the parameters are not updated simultaneously.
So how can I vectorize this code and make sure the parameters are updated at the same time?

For the vectorized version, try the following (two steps, to make the simultaneous update explicit):
gradient = (alpha/m) * X' * (X*theta -y)
theta = theta - gradient

Your vectorization is correct. I tried both versions of your code, and they gave me the same theta. Just remember not to use your updated theta in your second implementation.
This also works, but is less concise than your second implementation:
Error = X * theta - y;
for i = 1:2
S(i) = sum(Error.*X(:,i));
end
theta = theta - alpha * (1/m) * S'

In order to update them simultaneously, you need to keep the new values of theta(1..n) in a temporary vector, and only after the operation update the values in the original theta vector.
This is the code that I use for this purpose:
% Temp update
tempChange = zeros(length(theta), 1);
tempChange = theta - (alpha/m) * (X' * (X*theta-y));
% Actual update
theta = tempChange;
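To see concretely what "simultaneous" means, here is a small Java sketch (plain arrays; the data, alpha and starting theta are arbitrary) that performs one update both ways. `simultaneous` computes every gradient component from the old theta, which is what the vectorized `theta - (alpha/m) * X' * (X*theta - y)` does; `sequential` overwrites theta(1) before computing the theta(2) component, which is the behaviour the temp variables guard against.

```java
import java.util.Arrays;

public class SimultaneousUpdate {

    /** Component j of (1/m) * X^T (X theta - y), evaluated at the given theta. */
    static double gradient(double[][] x, double[] y, double[] theta, int j) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            double prediction = 0;
            for (int k = 0; k < theta.length; k++) prediction += x[i][k] * theta[k];
            sum += (prediction - y[i]) * x[i][j];
        }
        return sum / x.length;
    }

    /** Correct: every component uses the old theta (like temp0/temp1, or the vectorized form). */
    static double[] simultaneous(double[][] x, double[] y, double[] theta, double alpha) {
        double[] next = new double[theta.length];
        for (int j = 0; j < theta.length; j++) {
            next[j] = theta[j] - alpha * gradient(x, y, theta, j);
        }
        return next;
    }

    /** Incorrect: theta is overwritten in place, so later components see already-updated values. */
    static double[] sequential(double[][] x, double[] y, double[] theta, double alpha) {
        double[] t = theta.clone();
        for (int j = 0; j < t.length; j++) {
            t[j] = t[j] - alpha * gradient(x, y, t, j);
        }
        return t;
    }

    public static void main(String[] args) {
        double[][] x = {{1, 1}, {1, 2}};
        double[] y = {1, 2};
        double[] start = {0, 0};
        System.out.println(Arrays.toString(simultaneous(x, y, start, 0.1)));
        System.out.println(Arrays.toString(sequential(x, y, start, 0.1)));
    }
}
```

Working it by hand, the simultaneous update gives approximately [0.15, 0.25], while the sequential one gives approximately [0.15, 0.2275], because its second component is computed from residuals that already include the new theta(1).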

theta = theta - (alpha/m) * (X') * ((X*theta)-y)

I am very new to this topic, but here is my opinion:
if you compute X*theta beforehand, then the vectorized operation that adjusts theta does not need a temp variable.
In other words:
if you computed X*theta while updating the theta vector, theta(1) would be updated before theta(2) and hence change X*theta.
But if we compute X*theta as y_pred first and then do the vectorized operation on theta, it will be fine.
So my suggestion (without using temp) is:
y_pred = X*theta %theta is [1;1] and X is mX2 matrix
theta = theta - (alpha/m) * (X' * (y_pred-y));
Please correct me if I am wrong.

Here is the vectorized form of gradient descent; it works for me in Octave.
Remember that X is a matrix with ones in the first column (since theta_0 * 1 is theta_0). Each column of X is a feature (n of them, plus the column of ones) and each row is a training example (m of them), so X is an m × (n+1) matrix.
The y column vector could be, for example, house prices.
It's good to have a cost function to check whether you have found a minimum.
Choose a value for alpha, maybe alpha = 0.001, and try changing it each time you run the code. num_iters is the number of times you want the loop to run.
function theta = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y); % number of training examples
  for iter = 1:num_iters
    theta = theta - (alpha/m) * (X') * ((X*theta)-y);
  end
end
See the full explanation here: https://www.coursera.org/learn/machine-learning/resources/QQx8l
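A cost function to go with that loop, sketched in Java with plain arrays. It assumes the usual squared-error cost J(theta) = (1/2m) Σ (x_i · theta − y_i)², whose gradient is exactly the (1/m) X'(X*theta − y) term in the update above. The sample data in main is made up.

```java
public class LinearRegressionCost {

    /** J(theta) = (1 / 2m) * sum_i (x_i . theta - y_i)^2. */
    static double cost(double[][] x, double[] y, double[] theta) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            double prediction = 0;
            for (int j = 0; j < theta.length; j++) prediction += x[i][j] * theta[j];
            double error = prediction - y[i];
            sum += error * error;
        }
        return sum / (2.0 * x.length);
    }

    public static void main(String[] args) {
        double[][] x = {{1, 1}, {1, 2}}; // first column of ones for theta_0
        double[] y = {1, 2};
        System.out.println(cost(x, y, new double[] {0, 0})); // poor fit
        System.out.println(cost(x, y, new double[] {0, 1})); // y = x fits exactly
    }
}
```

Recording J after every iteration gives a quick diagnostic: it should decrease steadily, and if it grows, alpha is too large.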

In Python, a vectorized gradient descent implementation for linear regression with MSE loss may look like the following:
import numpy as np
def grad_step(X, y, θ, α):
    m = X.shape[0]
    return θ - (α / m) * 2 * X.T @ (X @ θ - y)
def compute_loss(X, y, θ):
    return np.mean((X @ θ - y) ** 2)
# run gradient descent
θ = np.zeros(X.shape[1]).reshape(-1, 1)
for _ in range(100):
    θ = grad_step(X, y, θ, α=0.01)
This works because, by the rules of matrix differentiation, the gradient of the MSE cost L(θ) = (1/m)‖Xθ − y‖² is ∇L(θ) = (2/m) Xᵀ(Xθ − y), which is exactly the term subtracted in grad_step.
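One way to convince yourself of the (2/m) Xᵀ(Xθ − y) gradient formula is a finite-difference check. Here is a minimal Java sketch of that check for the MSE loss L(θ) = (1/m)‖Xθ − y‖²; the data, θ and step size h are arbitrary. Because the loss is quadratic, central differences should agree with the analytic gradient up to rounding error.

```java
public class GradientCheck {

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int k = 0; k < a.length; k++) s += a[k] * b[k];
        return s;
    }

    /** Mean squared error L(theta) = (1/m) * sum_i (x_i . theta - y_i)^2. */
    static double loss(double[][] x, double[] y, double[] theta) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            double error = dot(x[i], theta) - y[i];
            sum += error * error;
        }
        return sum / x.length;
    }

    /** Analytic gradient (2/m) * X^T (X theta - y). */
    static double[] gradient(double[][] x, double[] y, double[] theta) {
        double[] g = new double[theta.length];
        for (int i = 0; i < x.length; i++) {
            double error = dot(x[i], theta) - y[i];
            for (int j = 0; j < theta.length; j++) {
                g[j] += 2.0 * error * x[i][j] / x.length;
            }
        }
        return g;
    }

    /** Central-difference estimate of dL/dtheta_j. */
    static double numericGradient(double[][] x, double[] y, double[] theta, int j, double h) {
        double[] plus = theta.clone();
        double[] minus = theta.clone();
        plus[j] += h;
        minus[j] -= h;
        return (loss(x, y, plus) - loss(x, y, minus)) / (2 * h);
    }

    public static void main(String[] args) {
        double[][] x = {{1, 2}, {1, -1}, {1, 0.5}};
        double[] y = {3, 0, 1};
        double[] theta = {0.3, -0.7};
        double[] g = gradient(x, y, theta);
        for (int j = 0; j < theta.length; j++) {
            System.out.printf("j=%d analytic=%.8f numeric=%.8f%n",
                    j, g[j], numericGradient(x, y, theta, j, 1e-5));
        }
    }
}
```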

Minimising cumulative floating point arithmetic error

I have a 2D convex polygon in 3D space and a function to measure the area of the polygon.
public double area() {
    if (vertices.size() >= 3) {
        double area = 0;
        Vector3 origin = vertices.get(0);
        Vector3 prev = vertices.get(1).clone();
        prev.sub(origin);
        for (int i = 2; i < vertices.size(); i++) {
            Vector3 current = vertices.get(i).clone();
            current.sub(origin);
            Vector3 cross = prev.cross(current);
            area += cross.magnitude();
            prev = current;
        }
        area /= 2;
        return area;
    } else {
        return 0;
    }
}
To test that this method works at all orientations of the polygon, I had my program rotate it a little bit each iteration and recalculate the area, like so:
Face f = poly.getFaces().get(0);
for (int i = 0; i < f.size(); i++) {
    Vector3 v = f.getVertex(i);
    v.rotate(0.1f, 0.2f, 0.3f);
}
if (blah % 1000 == 0)
    System.out.println(blah + ":\t" + f.area());
My method seems correct when testing with a 20x20 square. However, the rotate method (a method in the Vector3 class) seems to introduce some error into the position of each vertex in the polygon, which affects the area calculation. Here is the Vector3.rotate() method:
public void rotate(double xAngle, double yAngle, double zAngle) {
    double oldY = y;
    double oldZ = z;
    y = oldY * Math.cos(xAngle) - oldZ * Math.sin(xAngle);
    z = oldY * Math.sin(xAngle) + oldZ * Math.cos(xAngle);
    oldZ = z;
    double oldX = x;
    z = oldZ * Math.cos(yAngle) - oldX * Math.sin(yAngle);
    x = oldZ * Math.sin(yAngle) + oldX * Math.cos(yAngle);
    oldX = x;
    oldY = y;
    x = oldX * Math.cos(zAngle) - oldY * Math.sin(zAngle);
    y = oldX * Math.sin(zAngle) + oldY * Math.cos(zAngle);
}
Here is the output for my program in the format "iteration: area":
0: 400.0
1000: 399.9999999999981
2000: 399.99999999999744
3000: 399.9999999999959
4000: 399.9999999999924
5000: 399.9999999999912
6000: 399.99999999999187
7000: 399.9999999999892
8000: 399.9999999999868
9000: 399.99999999998664
10000: 399.99999999998386
11000: 399.99999999998283
12000: 399.99999999998215
13000: 399.9999999999805
14000: 399.99999999998016
15000: 399.99999999997897
16000: 399.9999999999782
17000: 399.99999999997715
18000: 399.99999999997726
19000: 399.9999999999769
20000: 399.99999999997584
Since this is intended to eventually be for a physics engine I would like to know how I can minimise the cumulative error since the Vector3.rotate() method will be used on a very regular basis.
Thanks!
A couple of odd notes:
The error is proportional to the amount rotated, i.e. a bigger rotation per iteration gives a bigger error per iteration.
There is more error when passing doubles to the rotate function than when passing it floats.

You'll always have some cumulative error with repeated floating point trig operations — that's just how they work. To deal with it, you basically have two options:
Just ignore it. Note that, in your example, after 20,000 iterations(!) the area is still accurate down to 13 decimal places. That's not bad, considering that doubles can only store about 16 decimal places to begin with.
Indeed, if you plot your output, the area of your square seems to be going down more or less linearly.
This makes sense, assuming that the effective determinant of your approximate rotation matrix is about 1 − 3.417825 × 10⁻¹⁸, which is well within normal double-precision floating point error range of one. If that's the case, the area of your square would continue a very slow exponential decay towards zero, such that you'd need about 7.3 × 10¹⁴ iterations (not two billion, 2 × 10⁹, as I first wrote) to get the area down to 399. Assuming 100 iterations per second, that's about 230 thousand years (not seven and a half months).
Edit: When I first calculated how long it would take for the area to reach 399, it seems I made a mistake and somehow managed to overestimate the decay rate by a factor of about 400,000(!). I've corrected the mistake above.
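The corrected estimate can be reproduced with a few lines of Java. This sketch takes the answer's assumption that the area shrinks by a factor of (1 − δ) per iteration with δ = 3.417825 × 10⁻¹⁸, and solves 400·(1 − δ)ⁿ = 399 for n. It also illustrates the precision issue itself: 1 − δ rounds to exactly 1.0 in double precision, so Math.log(1 - delta) would return 0, whereas Math.log1p(-delta) stays accurate for tiny arguments.

```java
public class DecayEstimate {

    /** Iterations n such that start * (1 - delta)^n = target. */
    static double iterationsToReach(double start, double target, double delta) {
        // log1p(-delta) is accurate for tiny delta; log(1 - delta) would be log(1.0) = 0.
        return Math.log(target / start) / Math.log1p(-delta);
    }

    public static void main(String[] args) {
        double n = iterationsToReach(400.0, 399.0, 3.417825e-18);
        double years = n / 100.0 / (365.25 * 24 * 3600); // at 100 iterations per second
        System.out.printf("%.2e iterations, about %.0f years%n", n, years);
    }
}
```

This should print roughly 7.32e+14 iterations and about 232 thousand years, in line with the corrected figures.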
If you still feel you don't want any cumulative error, the answer is simple: don't iterate floating point rotations. Instead, have your object store its current orientation in a member variable, and use that information to always rotate the object from its original orientation to its current one.
This is simple in 2D, since you just have to store an angle. In 3D, I'd suggest storing either a quaternion or a matrix, and occasionally rescaling it so that its norm / determinant stays approximately one (and, if you're using a matrix to represent the orientation of a rigid body, that it remains approximately orthogonal).
Of course, this approach won't eliminate cumulative error in the orientation of the object, but the rescaling does ensure that the volume, area and/or shape of the object won't be affected.
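For the simple 2D case, storing the orientation and always rotating from the original vertices could look like the Java sketch below. The class name and the use of the shoelace formula for the area are choices made for this illustration (the question's own area() uses cross products in 3D); the 3D quaternion or matrix version with periodic renormalization is not shown.

```java
public class OrientedPolygon2D {
    private final double[][] original; // vertices in the original orientation, never modified
    private double angle;              // accumulated rotation in radians

    OrientedPolygon2D(double[][] vertices) {
        original = new double[vertices.length][];
        for (int i = 0; i < vertices.length; i++) {
            original[i] = vertices[i].clone();
        }
    }

    void rotate(double deltaAngle) {
        angle += deltaAngle; // only the angle accumulates rounding error, not the vertex positions
    }

    /** Vertices rotated from the original orientation by the current angle. */
    double[][] currentVertices() {
        double c = Math.cos(angle), s = Math.sin(angle);
        double[][] out = new double[original.length][];
        for (int i = 0; i < original.length; i++) {
            double px = original[i][0], py = original[i][1];
            out[i] = new double[] {c * px - s * py, s * px + c * py};
        }
        return out;
    }

    /** Shoelace formula for a simple polygon. */
    double area() {
        double[][] v = currentVertices();
        double sum = 0;
        for (int i = 0; i < v.length; i++) {
            double[] p = v[i], q = v[(i + 1) % v.length];
            sum += p[0] * q[1] - q[0] * p[1];
        }
        return Math.abs(sum) / 2;
    }

    public static void main(String[] args) {
        OrientedPolygon2D square = new OrientedPolygon2D(
                new double[][] {{-10, -10}, {10, -10}, {10, 10}, {-10, 10}});
        for (int i = 0; i < 20_000; i++) {
            square.rotate(0.1);
        }
        System.out.println(square.area());
    }
}
```

Since each area computation applies a single rotation to the untouched original vertices, the error no longer builds up with the number of iterations; only the stored angle drifts, which changes the orientation slightly but not the shape.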

You say there is cumulative error, but I don't believe there is (note how your output doesn't always go down), and the error is just due to rounding and loss of precision in floating point.
I did work on a 2D physics engine in university (in Java) and found double to be more precise (of course it is; see Oracle's documentation on primitive data type sizes).
In short, you will never get rid of this behaviour; you just have to accept the limitations of precision.
EDIT:
Now that I look at your area() function, there is possibly some cumulative error due to
area += cross.magnitude();
but I have to say that whole function looks a bit odd. Why does it need to know the previous vertices to calculate the current area?
