Vectorizing a gradient descent algorithm


I am coding gradient descent in MATLAB.
For two features, the update step is:
temp0 = theta(1,1) - (alpha/m)*sum((X*theta-y).*X(:,1));
temp1 = theta(2,1) - (alpha/m)*sum((X*theta-y).*X(:,2));
theta(1,1) = temp0;
theta(2,1) = temp1;
However, I want to vectorize this code so that it applies to any number of features.
For the vectorization, it looks like what I am trying to do is a matrix multiplication:
theta = theta - (alpha/m) * (X' * (X*theta-y));
This looks right, but when I tried it, I concluded that it doesn't work for gradient descent because the parameters are not updated simultaneously.
How can I vectorize this code and make sure the parameters are updated at the same time?

For the vectorized version, try the following (two steps, to make the simultaneous update explicit):
gradient = (alpha/m) * X' * (X*theta -y)
theta = theta - gradient

Your vectorization is correct. I tried both of your implementations and got the same theta from each. Just remember not to feed the already updated theta into your second implementation.
This also works, but it is less compact than your second implementation:
Error = X * theta - y;
for i = 1:2
S(i) = sum(Error.*X(:,i));
end
theta = theta - alpha * (1/m) * S'
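The claim that the per-feature sums and the matrix form give the same theta can be checked with a small sketch. It is written in plain Python rather than MATLAB, and the data and the function names (matvec, step_per_feature, step_vectorized) are made up for illustration:

```python
def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def step_per_feature(X, y, theta, alpha):
    """One update, computing each feature's sum from the same residual."""
    m = len(y)
    error = [p - t for p, t in zip(matvec(X, theta), y)]  # X*theta - y
    new_theta = []
    for j in range(len(theta)):
        s = sum(e * row[j] for e, row in zip(error, X))   # sum(Error .* X(:,j))
        new_theta.append(theta[j] - alpha / m * s)
    return new_theta


def step_vectorized(X, y, theta, alpha):
    """One update written as theta - (alpha/m) * X' * (X*theta - y)."""
    m = len(y)
    error = [p - t for p, t in zip(matvec(X, theta), y)]
    X_transposed = [list(col) for col in zip(*X)]
    gradient = matvec(X_transposed, error)
    return [t - alpha / m * g for t, g in zip(theta, gradient)]


# Made-up data: first column of ones for the intercept, y = 2 * x.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 4.0, 6.0]
theta = [0.0, 0.0]

looped = step_per_feature(X, y, theta, 0.1)
vectorized = step_vectorized(X, y, theta, 0.1)
print(looped, vectorized)
```

Both functions read only the old theta and build a new list, so the update is simultaneous in either form.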

To update the parameters simultaneously, keep the new values of theta(1..n) in a temporary vector and, once the computation is done, copy them into the original theta vector.
This is the code I use for this purpose:
% Temp update
tempChange = zeros(length(theta), 1);
tempChange = theta - (alpha/m) * (X' * (X*theta-y));
% Actual update
theta = tempChange;

theta = theta - (alpha/m) * (X') * ((X*theta)-y)

I am very new to this topic, but my opinion is this:
If you compute X*theta beforehand, the vectorized update of theta does not need a temporary variable.
In other words, if you recompute X*theta while updating theta element by element, theta(1) is updated before theta(2), which changes X*theta.
But if you compute X*theta once as y_pred and then apply the vectorized operation to theta, it will be fine.
So my suggestion is (without using a temp):
y_pred = X*theta; % theta is [1;1] and X is an m-by-2 matrix
theta = theta - (alpha/m) * (X' * (y_pred-y));
Please correct me if I am wrong.
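To make the difference concrete, here is a small sketch in plain Python (not MATLAB) with made-up data. simultaneous_step computes the residual once from the old theta, while sequential_step recomputes it after each element has been overwritten; both function names are hypothetical:

```python
def predict(X, theta):
    """Return X*theta for a matrix given as a list of rows."""
    return [sum(x * t for x, t in zip(row, theta)) for row in X]


def simultaneous_step(X, y, theta, alpha):
    """All parameters are updated from the residual of the OLD theta."""
    m = len(y)
    error = [p - t for p, t in zip(predict(X, theta), y)]
    return [theta[j] - alpha / m * sum(e * row[j] for e, row in zip(error, X))
            for j in range(len(theta))]


def sequential_step(X, y, theta, alpha):
    """Parameters are overwritten one by one; later ones see earlier updates."""
    m = len(y)
    theta = list(theta)
    for j in range(len(theta)):
        # Recomputed here, so it already includes the new theta[0..j-1].
        error = [p - t for p, t in zip(predict(X, theta), y)]
        theta[j] -= alpha / m * sum(e * row[j] for e, row in zip(error, X))
    return theta


X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 4.0, 6.0]
start = [0.0, 0.0]

simultaneous = simultaneous_step(X, y, start, 0.1)
sequential = sequential_step(X, y, start, 0.1)
print(simultaneous, sequential)
```

The first parameter agrees in both versions, but the second differs, because in the sequential version its residual was computed with the already updated first parameter.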

Here is the vectorized form of gradient descent; it works for me in Octave.
Remember that X is a matrix with ones in the first column (since theta_0 * 1 is theta_0). Each column of X holds one feature (n features), and each row is one training example (m examples), so X is an m x (n+1) matrix.
The y column vector could be, for example, the house prices.
It's good to have a cost function to check whether you are approaching a minimum.
Choose a value for alpha, maybe alpha = 0.001, and try changing it each time you run the code. num_iters is the number of iterations to run.
function theta = gradientDescent(X, y, theta, alpha, num_iters)
    m = length(y); % number of training examples
    for iter = 1:num_iters
        % simultaneous update of all parameters
        theta = theta - (alpha/m) * (X') * ((X*theta)-y);
    end
end
See the full explanation here: https://www.coursera.org/learn/machine-learning/resources/QQx8l
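As a rough illustration of using a cost function to watch convergence, here is a sketch in plain Python with made-up data. The cost J = 1/(2m) * sum((X*theta - y).^2) is an assumption (the usual squared-error cost for this update rule), and the names cost and gradient_descent are hypothetical:

```python
def cost(X, y, theta):
    """Squared-error cost J = 1/(2m) * sum((X*theta - y)^2)."""
    m = len(y)
    total = 0.0
    for row, target in zip(X, y):
        prediction = sum(x * t for x, t in zip(row, theta))
        total += (prediction - target) ** 2
    return total / (2 * m)


def gradient_descent(X, y, theta, alpha, num_iters):
    """Run the simultaneous update and record the cost after every step."""
    m = len(y)
    history = []
    for _ in range(num_iters):
        error = [sum(x * t for x, t in zip(row, theta)) - target
                 for row, target in zip(X, y)]
        theta = [theta[j] - alpha / m * sum(e * row[j] for e, row in zip(error, X))
                 for j in range(len(theta))]
        history.append(cost(X, y, theta))
    return theta, history


# Made-up data generated from y = 0 + 2 * x, so theta should approach [0, 2].
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 4.0, 6.0]
theta, history = gradient_descent(X, y, [0.0, 0.0], alpha=0.1, num_iters=2000)
print(theta, history[0], history[-1])
```

If the recorded cost goes up instead of down, alpha is probably too large for the data.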

In Python, a vectorized gradient descent implementation for linear regression with MSE loss may look like the following:
import numpy as np
def grad_step(X, y, θ, α):
    m = X.shape[0]
    # gradient of the MSE loss: (2/m) * X^T (Xθ - y)
    return θ - (α / m) * 2 * X.T @ (X @ θ - y)
def compute_loss(X, y, θ):
    # mean of the squared residuals
    return np.mean((X @ θ - y) ** 2)
# run gradient descent (X is m x n, y is m x 1)
θ = np.zeros(X.shape[1]).reshape(-1,1)
for _ in range(100):
    θ = grad_step(X, y, θ, α = 0.01)
This follows from the matrix differentiation rules: for the cost $J(\theta) = \frac{1}{m}(X\theta - y)^T (X\theta - y)$, the gradient is $\nabla_\theta J(\theta) = \frac{2}{m} X^T (X\theta - y)$.
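One way to sanity-check the gradient (2/m) * X^T (Xθ - y) of the MSE loss is to compare it with central finite differences. The sketch below uses plain Python lists instead of NumPy; the data and the function names are made up for illustration:

```python
def mse(X, y, theta):
    """Mean squared error (1/m) * sum((X*theta - y)^2)."""
    m = len(y)
    return sum((sum(x * t for x, t in zip(row, theta)) - target) ** 2
               for row, target in zip(X, y)) / m


def analytic_gradient(X, y, theta):
    """Gradient from the closed form (2/m) * X^T (X*theta - y)."""
    m = len(y)
    error = [sum(x * t for x, t in zip(row, theta)) - target
             for row, target in zip(X, y)]
    return [2.0 / m * sum(e * row[j] for e, row in zip(error, X))
            for j in range(len(theta))]


def numeric_gradient(X, y, theta, h=1e-6):
    """Central differences: (J(theta + h*e_j) - J(theta - h*e_j)) / (2h)."""
    gradient = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += h
        down[j] -= h
        gradient.append((mse(X, y, up) - mse(X, y, down)) / (2 * h))
    return gradient


X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 4.0, 6.0]
theta = [0.5, -1.0]

exact = analytic_gradient(X, y, theta)
approx = numeric_gradient(X, y, theta)
print(exact, approx)
```

The two results should agree to several decimal places; a large gap usually points to a sign error or a missing factor in the analytic form.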

Related

How to measure distance using ARCore?

Is it possible to calculate the distance between two HitResults?
Or how can we calculate real-world distance (e.g. in meters) using ARCore?

In Java ARCore world units are meters (I just realized we might not document this... aaaand looks like nope. Oops, bug filed). By subtracting the translation component of two Poses you can get the distance between them. Your code would look something like this:
On first hit as hitResult:
startAnchor = session.addAnchor(hitResult.getHitPose());
On second hit as hitResult:
Pose startPose = startAnchor.getPose();
Pose endPose = hitResult.getHitPose();
// Clean up the anchor
session.removeAnchors(Collections.singleton(startAnchor));
startAnchor = null;
// Compute the difference vector between the two hit locations.
float dx = startPose.tx() - endPose.tx();
float dy = startPose.ty() - endPose.ty();
float dz = startPose.tz() - endPose.tz();
// Compute the straight-line distance.
float distanceMeters = (float) Math.sqrt(dx*dx + dy*dy + dz*dz);
Assuming that these hit results don't happen on the same frame, creating an Anchor is important because the virtual world can be reshaped every time you call Session.update(). By holding that location with an anchor instead of just a Pose, its Pose will update to track the physical feature across those reshapings.

You can extract the two HitResult poses using getHitPose() and then compare their translation component (getTranslation()).
The translation is defined as
...the position vector from the destination (usually
world) coordinate frame to the local coordinate frame, expressed in
destination (world) coordinates.
As for the physical unit of this, I could not find any remark. With a calibrated camera this should be mathematically possible, but I don't know whether they actually provide an API for it.
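The comparison of the two translation components is just a Euclidean distance. Here is a language-neutral sketch in Python, not ARCore code: the translations are plain (x, y, z) tuples, the function names are hypothetical, and the result is in meters only if the world units are meters, as another answer here states.

```python
import math


def translation_distance(t0, t1):
    """Straight-line distance between two (x, y, z) translation vectors."""
    dx = t0[0] - t1[0]
    dy = t0[1] - t1[1]
    dz = t0[2] - t1[2]
    return math.sqrt(dx * dx + dy * dy + dz * dz)


def to_centimeters(meters):
    """Convert a distance in meters to centimeters."""
    return meters * 100.0


# Made-up translations of two hit poses.
first_hit = (0.0, 0.0, 0.0)
second_hit = (0.3, 0.4, 0.0)

distance = translation_distance(first_hit, second_hit)
print(distance, to_centimeters(distance))
```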

The answer is: Yes, you definitely can calculate the distance between two HitResults. The working units of ARCore, as well as ARKit, are meters. Sometimes it's more useful to use centimetres. Here are a few ways to do it in Java with the good old Pythagorean theorem.
import com.google.ar.core.HitResult;
MotionEvent tap = queuedSingleTaps.poll();
if (tap != null && camera.getTrackingState() == TrackingState.TRACKING) {
for (HitResult hit : frame.hitTest(tap)) {
// some logic...
}
}
// Here's the principle how you can calculate the distance
// between two anchors in 3D space using Java:
private double getDistanceMeters(Pose pose0, Pose pose1) {
float distanceX = pose0.tx() - pose1.tx();
float distanceY = pose0.ty() - pose1.ty();
float distanceZ = pose0.tz() - pose1.tz();
return Math.sqrt(distanceX * distanceX +
distanceY * distanceY +
distanceZ * distanceZ);
}
// Convert Meters into Centimetres
double distanceCm = ((int)(getDistanceMeters(pose0, pose1) * 1000))/10.0f;
// pose0 is the location of first Anchor
// pose1 is the location of second Anchor
Alternatively, you can use the following math:
Pose pose0 = firstAnchor.getPose(); // first pose
Pose pose1 = secondAnchor.getPose(); // second pose
double distanceM = Math.sqrt(Math.pow((pose0.tx() - pose1.tx()), 2) +
Math.pow((pose0.ty() - pose1.ty()), 2) +
Math.pow((pose0.tz() - pose1.tz()), 2));
double distanceCm = ((int)(distanceM * 1000))/10.0f;

Drawing a circle in a turtle program

I am currently working on a Processing (as in the language) sketch, which is driven by Turtle logic (see https://en.wikipedia.org/wiki/Turtle_graphics). This means that I draw a line from the current coordinate to a supplied coordinate. This supplied coordinate then becomes the new current coordinate. I want to approximate a circle and have written a simple piece of code using trigonometry. The code looks as follows:
void drawCircle(int radius){
// The circle center is radius amount to the left of the current xpos
int steps = 16;
double step = TWO_PI /steps;
for(double theta = step; theta <= TWO_PI; theta += step){
float deltaX = cos((float)theta) - cos((float)(theta - step));
float deltaY = sin((float)theta) - sin((float)(theta - step));
moveXY(deltaX*radius, deltaY*radius);
}
}
The program logic is simple. It uses the variable theta to loop through the angles of a full circle, in radians. The number of steps determines how large each theta chunk is. For each theta it calculates the x,y values of the corresponding point on the circle, then subtracts the x,y values of the previous cycle (hence the theta-step) to get the amount it has to move from the current position to reach the desired x,y position. Finally it passes those delta values to a moveXY function, which draws a line from the current point by the supplied amounts and makes the end point the new current position.
The program seems to work quite well with a small number of steps. However, when the step count is increased, the circles become more and more like a Fibonacci spiral. My guess is that this is due to imprecision in the float numbers and the sine and cosine calculations, and that this adds up with each iteration.
Have I interpreted something wrong? I am looking to port this code to JavaScript eventually, so I am looking for a solution in the design. Using BigDecimal might not work, especially since it does not contain its own cosine and sine functions. I have included a few images to detail the problem. Any help is much appreciated!
[Images omitted: circles drawn with step counts 16, 32, 64 and 128.]

Float and sine/cosine should be sufficiently precise. The question is: How precise is your position on the plane? If this position is measured in pixels, then each of your floating point values is rounded to integer after each step. The loss of precision then adds up.
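A quick way to see the effect is to simulate the walk twice, once with a float position and once with the position snapped to whole pixels after every move. This is only a sketch in Python (not Processing), under the assumption that the turtle position really is rounded like this; max_deviation is a made-up helper:

```python
import math


def max_deviation(steps, radius, snap_to_pixels):
    """Walk the circle with relative moves and return the largest distance
    between the turtle and the ideal point it should have reached."""
    step = 2 * math.pi / steps
    x, y = 0.0, 0.0                       # position relative to the start
    worst = 0.0
    for i in range(1, steps + 1):
        theta = i * step
        dx = radius * (math.cos(theta) - math.cos(theta - step))
        dy = radius * (math.sin(theta) - math.sin(theta - step))
        x, y = x + dx, y + dy
        if snap_to_pixels:
            # Position stored in whole pixels: the fraction is lost each step.
            x, y = float(round(x)), float(round(y))
        ideal_x = radius * (math.cos(theta) - 1.0)
        ideal_y = radius * math.sin(theta)
        worst = max(worst, math.hypot(x - ideal_x, y - ideal_y))
    return worst


float_error = max_deviation(128, 20, snap_to_pixels=False)
pixel_error = max_deviation(128, 20, snap_to_pixels=True)
print(float_error, pixel_error)
```

With a float position the deviation stays negligible, while with a rounded position it grows to more than a pixel, because every move shorter than half a pixel is rounded away entirely.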

At each iteration of the loop, you are calculating the delta without regard to what the current coordinate is. So effectively you are "dead-reckoning", which is always going to be inaccurate, since the errors at each step build up.
Since you know that you want a circle, an alternative approach would be, at each iteration, to first determine the actual point on the circle you want to get to, and then calculate the delta to get there. So something like the following (but I must admit I haven't tested it!):
void drawCircle(int radius){
// The circle center is radius amount to the left of the current xpos
int steps = 16;
double step = TWO_PI /steps;
float previousX = 0;
float previousY = radius;
for(double theta = step; theta <= TWO_PI; theta += step){
float thisX = radius * sin((float)theta);
float thisY = radius * cos((float)theta);
moveXY(thisX - previousX, thisY - previousY);
previousX = thisX;
previousY = thisY;
}
}
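The idea only removes the accumulated error if the delta is taken from where the turtle actually is, not from where it was supposed to be. Here is a sketch of that variant in Python (not Processing), assuming the turtle's real position is kept in whole pixels and can be read back; walk_with_absolute_targets is a hypothetical name:

```python
import math


def walk_with_absolute_targets(steps, radius):
    """Aim each move at the ideal point on the circle, starting from the
    turtle's actual (rounded) position. Returns the worst deviation and
    the final position relative to the start."""
    step = 2 * math.pi / steps
    x, y = 0, 0                           # actual position in whole pixels
    worst = 0.0
    for i in range(1, steps + 1):
        theta = i * step
        target_x = radius * (math.cos(theta) - 1.0)
        target_y = radius * math.sin(theta)
        # Delta from the real position, so earlier rounding is corrected.
        dx, dy = target_x - x, target_y - y
        x, y = round(x + dx), round(y + dy)
        worst = max(worst, math.hypot(x - target_x, y - target_y))
    return worst, (x, y)


worst, final_position = walk_with_absolute_targets(128, 20)
print(worst, final_position)
```

Each step can still be off by up to half a pixel per axis, but the error no longer carries over into the next step, and the path ends where it started.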

Car Steering - OpenGL

I'm creating a racing car game and am having trouble figuring out how to get steering to work. I have a basic 2D race course that is built in a 3D environment. The program only uses x and y, with z being 0.
My race course consists of a road that is 29 units wide in the x-axis and two long tracks that are 120 units long in the y direction. At 120 units in the y-axis there is a 180 degree turn. You can think of the course as looking similar to a NASCAR-style race course.
I'm trying to set my car's steering so that it can turn realistically when I reach the 180 degree turns. I'm using two variables that separately control the x / y positions, as well as two variables for the x / y velocities. My code at the moment is as follows:
public void steering(){
double degreesPerFrame = 180 / (2*30); //180 degrees over 30 fps in 2s
velocityX = velocityX * -1 * Math.cos(degreesPerFrame);
velocityY = velocityY * -1 * Math.sin(degreesPerFrame);
double yChange = Math.sin(degreesPerFrame) * velocityY;
double xChange = Math.cos(degreesPerFrame) * velocityX;
x += xChange; //change x position
y += yChange; //change y position
}
I'm not completely sure how I can get my steering to turn properly. I'm stuck at the moment and not sure what I would need to change in my function to get steering working properly.

I think this would be easier if you don't use angles at all for your calculations. Simply operate with position, velocity, and acceleration vectors.
First, a quick reminder from physics 101. With a small time step dt, current position p, and current velocity vector v, and an acceleration vector a, the new position p' and velocity vector v' are:
v' = v + a * dt
p' = p + v' * dt
The acceleration a depends on the driver input. For example:
When moving ahead at a constant speed, it is 0.
When accelerating, it is limited by engine power and tire grip in longitudinal direction.
When turning, it is limited by tire grip in lateral direction.
When braking, it is limited by tire grip (mostly, the brakes are typically strong enough to not be a limit).
For a relatively simple model, you can assume that grip in longitudinal and lateral direction are the same (which is not entirely true in reality), which limits the total acceleration to a vector within a circle, which is commonly called the friction circle.
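As a sketch of what the friction circle means in code: whatever acceleration the driver input asks for, its length is capped at the grip limit while its direction is kept. This is written in Python for brevity, and clamp_to_friction_circle is a made-up helper, not part of any library:

```python
import math


def clamp_to_friction_circle(ax, ay, a_max):
    """Limit the requested acceleration (ax, ay) to length a_max,
    keeping its direction."""
    magnitude = math.hypot(ax, ay)
    if magnitude <= a_max:
        return ax, ay
    scale = a_max / magnitude
    return ax * scale, ay * scale


# Within the grip limit: returned unchanged.
inside = clamp_to_friction_circle(3.0, 4.0, 10.0)
# Braking and turning at once beyond the limit: scaled back onto the circle.
outside = clamp_to_friction_circle(30.0, 40.0, 10.0)
print(inside, outside)
```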
That was mainly just background. Back to the more specific case here, which is turning at a steady speed. The result of steering the car can be modeled by a lateral force, and corresponding acceleration. So in this case, we can apply an acceleration vector that is orthogonal to the current velocity vector.
With the components of the velocity vector:
v = (vx, vy)
a vector that points orthogonally to the left (you mentioned NASCAR, so there's only left turns...) of this is:
(-vy, vx)
Since we want to control the amount of lateral acceleration, which is the length of the acceleration vector, we normalize this vector, and multiply it by the maximum acceleration:
a = aMax * normalize(-vy, vx)
If you use real units for your calculations, you can apply a realistic number for the maximum lateral acceleration aMax. Otherwise, just tweak it to give you the desired maneuverability for the car in your artificial units.
That's really all you need. Recapping the steps in code form:
// Realistic value for sports car on street tires when using
// metric units. 10 m/s^2 is slightly over 1 g.
static const float MAX_ACCELERATION = 10.0f;
float deltaTime = ...; // Time since last update.
float accelerationX = -velocityY;
float accelerationY = velocityX;
float accelerationScale = MAX_ACCELERATION /
sqrt(accelerationX * accelerationX + accelerationY * accelerationY);
accelerationX *= accelerationScale;
accelerationY *= accelerationScale;
velocityX += accelerationX * deltaTime;
velocityY += accelerationY * deltaTime;
x += velocityX * deltaTime;
y += velocityY * deltaTime;
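The same update can be tried out as a small simulation. This is a sketch in Python rather than the game's own language; steer_left and the 30 updates per second are assumptions for illustration:

```python
import math

MAX_ACCELERATION = 10.0   # m/s^2, same value as in the snippet above


def steer_left(x, y, vx, vy, dt, a_max=MAX_ACCELERATION):
    """One update step: lateral acceleration to the left of the velocity,
    then the velocity update, then the position update."""
    speed = math.hypot(vx, vy)
    if speed == 0.0:
        return x, y, vx, vy               # no direction to turn from
    ax = -vy / speed * a_max
    ay = vx / speed * a_max
    vx += ax * dt
    vy += ay * dt
    x += vx * dt
    y += vy * dt
    return x, y, vx, vy


# Car driving in +y at 20 m/s, updated 30 times per second for 2 seconds.
state = (0.0, 0.0, 0.0, 20.0)
first = steer_left(*state, dt=1.0 / 30.0)
for _ in range(60):
    state = steer_left(*state, dt=1.0 / 30.0)
final_speed = math.hypot(state[2], state[3])
print(first, state, final_speed)
```

After the first step the x velocity is negative, i.e. the car has started turning left. The speed creeps up slightly over time because each step adds a small velocity component at a right angle to the current one; renormalizing the velocity to the desired speed after each update would remove that.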

Gradient Descent Linear Regression in Java

This is a bit of a long shot, but I wonder if someone could look at this. Am I doing batch gradient descent for linear regression correctly here?
It gives the expected answers for a single independent variable and an intercept, but not for multiple independent variables.
/**
 * (using Colt Matrix library)
 * @param alpha Learning Rate
 * @param thetas Current Thetas
 * @param independent
 * @param dependent
 * @return new Thetas
 */
public DoubleMatrix1D descent(double alpha,
DoubleMatrix1D thetas,
DoubleMatrix2D independent,
DoubleMatrix1D dependent ) {
Algebra algebra = new Algebra();
// ALPHA*(1/M) in one.
double modifier = alpha / (double)independent.rows();
//I think this can just skip the transpose of theta.
//This is the result of every Xi run through the theta (hypothesis fn)
//So each Xj feature is multiplied by its Theta, to get the results of the hypothesis
DoubleMatrix1D hypothesies = algebra.mult( independent, thetas );
//hypothesis - Y
//Now we have, for each Xi, the difference between the value predicted by the hypothesis and the actual Yi
hypothesies.assign(dependent, Functions.minus);
//Transpose Examples(MxN) to NxM so we can matrix multiply by hypothesis Mx1
DoubleMatrix2D transposed = algebra.transpose(independent);
DoubleMatrix1D deltas = algebra.mult(transposed, hypothesies );
// Scale the deltas by 1/m and learning rate alpha. (alpha/m)
deltas.assign(Functions.mult(modifier));
//Theta = Theta - Deltas
thetas.assign( deltas, Functions.minus );
return( thetas );
}

There is nothing wrong with your implementation. Based on your comment, the problem is the collinearity that you introduce when generating x2, which is problematic for regression estimation.
To test your algorithm, you can generate two independent columns of random numbers. Pick values for w0, w1 and w2, i.e. the coefficients for the intercept, x1 and x2 respectively, and calculate the dependent value y.
Then see whether your stochastic/batch gradient descent algorithm can recover the w0, w1 and w2 values.
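The test described above could look roughly like this. It is a sketch in plain Python rather than Java/Colt, and the coefficient values, the seed, the learning rate and the iteration count are arbitrary choices for illustration:

```python
import random


def predict(row, w):
    """Linear hypothesis: dot product of one data row with the weights."""
    return sum(x * wi for x, wi in zip(row, w))


def batch_gradient_descent(X, y, alpha, iterations):
    """Plain batch gradient descent for least-squares linear regression."""
    m, n = len(y), len(X[0])
    w = [0.0] * n
    for _ in range(iterations):
        errors = [predict(row, w) - target for row, target in zip(X, y)]
        gradient = [sum(e * row[j] for e, row in zip(errors, X)) / m
                    for j in range(n)]
        w = [wj - alpha * gj for wj, gj in zip(w, gradient)]
    return w


rng = random.Random(0)                    # fixed seed for repeatability
true_w = [1.5, -2.0, 0.5]                 # w0 (intercept), w1, w2

# Two independent random columns plus a column of ones for the intercept.
X = [[1.0, rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(100)]
y = [predict(row, true_w) for row in X]   # noise-free dependent values

recovered = batch_gradient_descent(X, y, alpha=0.5, iterations=2000)
print(recovered)
```

Because x1 and x2 are generated independently, there is no collinearity, and the recovered weights should match the chosen ones closely. If the second column were instead derived from the first, the weights would no longer be uniquely determined.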

I think adding
// ALPHA*(1/M) in one.
double modifier = alpha / (double)independent.rows();
is a bad idea, since you're mixing the gradient function with the gradient descent algorithm. It's much better to have the gradient descent algorithm in its own public method, like the following in Java:
import org.la4j.Matrix;
import org.la4j.Vector;
public Vector gradientDescent(Matrix x, Matrix y, int kmax, double alpha)
{
int k=1;
Vector thetas = Vector.fromArray(new double[] { 0.0, 0.0});
while (k<kmax)
{
thetas = thetas.subtract(gradient(x, y, thetas).multiply(alpha));
k++;
}
return thetas;
}

Minimising cumulative floating point arithmetic error

I have a 2D convex polygon in 3D space and a function to measure the area of the polygon.
public double area() {
if (vertices.size() >= 3) {
double area = 0;
Vector3 origin = vertices.get(0);
Vector3 prev = vertices.get(1).clone();
prev.sub(origin);
for (int i = 2; i < vertices.size(); i++) {
Vector3 current = vertices.get(i).clone();
current.sub(origin);
Vector3 cross = prev.cross(current);
area += cross.magnitude();
prev = current;
}
area /= 2;
return area;
} else {
return 0;
}
}
To test that this method works at all orientations of the polygon I had my program rotate it a little bit each iteration and calculate the area. Like so...
Face f = poly.getFaces().get(0);
for (int i = 0; i < f.size(); i++) {
Vector3 v = f.getVertex(i);
v.rotate(0.1f, 0.2f, 0.3f);
}
if (blah % 1000 == 0)
System.out.println(blah + ":\t" + f.area());
My method seems correct when testing with a 20x20 square. However the rotate method (a method in the Vector3 class) seems to introduce some error into the position of each vertex in the polygon, which affects the area calculation. Here is the Vector3.rotate() method
public void rotate(double xAngle, double yAngle, double zAngle) {
double oldY = y;
double oldZ = z;
y = oldY * Math.cos(xAngle) - oldZ * Math.sin(xAngle);
z = oldY * Math.sin(xAngle) + oldZ * Math.cos(xAngle);
oldZ = z;
double oldX = x;
z = oldZ * Math.cos(yAngle) - oldX * Math.sin(yAngle);
x = oldZ * Math.sin(yAngle) + oldX * Math.cos(yAngle);
oldX = x;
oldY = y;
x = oldX * Math.cos(zAngle) - oldY * Math.sin(zAngle);
y = oldX * Math.sin(zAngle) + oldY * Math.cos(zAngle);
}
Here is the output for my program in the format "iteration: area":
0: 400.0
1000: 399.9999999999981
2000: 399.99999999999744
3000: 399.9999999999959
4000: 399.9999999999924
5000: 399.9999999999912
6000: 399.99999999999187
7000: 399.9999999999892
8000: 399.9999999999868
9000: 399.99999999998664
10000: 399.99999999998386
11000: 399.99999999998283
12000: 399.99999999998215
13000: 399.9999999999805
14000: 399.99999999998016
15000: 399.99999999997897
16000: 399.9999999999782
17000: 399.99999999997715
18000: 399.99999999997726
19000: 399.9999999999769
20000: 399.99999999997584
Since this is intended to eventually be for a physics engine I would like to know how I can minimise the cumulative error since the Vector3.rotate() method will be used on a very regular basis.
Thanks!
A couple of odd notes:
The error is proportional to the amount rotated, i.e. a bigger rotation per iteration gives a bigger error per iteration.
There is more error when passing doubles to the rotate function than when passing it floats.

You'll always have some cumulative error with repeated floating point trig operations — that's just how they work. To deal with it, you basically have two options:
Just ignore it. Note that, in your example, after 20,000 iterations(!) the area is still accurate down to 13 decimal places. That's not bad, considering that doubles can only store about 16 decimal places to begin with.
Indeed, plotting your data shows the area of your square going down more or less linearly.
This makes sense, assuming that the effective determinant of your approximate rotation matrix is about 1 − 3.417825 × 10^-18, which is well within the normal double-precision floating point error range of one. If that's the case, the area of your square would continue a very slow exponential decay towards zero, such that you'd need about 7.3 × 10^14 iterations to get the area down to 399. Assuming 100 iterations per second, that's about 230 thousand years.
Edit: When I first calculated how long it would take for the area to reach 399, it seems I made a mistake and somehow managed to overestimate the decay rate by a factor of about 400,000(!). I've corrected the mistake above.
If you still feel you don't want any cumulative error, the answer is simple: don't iterate floating point rotations. Instead, have your object store its current orientation in a member variable, and use that information to always rotate the object from its original orientation to its current one.
This is simple in 2D, since you just have to store an angle. In 3D, I'd suggest storing either a quaternion or a matrix, and occasionally rescaling it so that its norm / determinant stays approximately one (and, if you're using a matrix to represent the orientation of a rigid body, that it remains approximately orthogonal).
Of course, this approach won't eliminate cumulative error in the orientation of the object, but the rescaling does ensure that the volume, area and/or shape of the object won't be affected.
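For the 2D case, the second option might be sketched like this in Python. The Body class and its method names are made up for illustration; the point is only that the stored original coordinates are never overwritten:

```python
import math


def rotate(point, angle):
    """Rotate a 2D point around the origin by angle (radians)."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c)


class Body:
    """Keeps the original vertex and an orientation angle; the current
    vertex is always derived from the original in a single rotation."""

    def __init__(self, original):
        self.original = original
        self.angle = 0.0

    def turn(self, delta):
        # Only the angle accumulates; it is kept within one full turn.
        self.angle = (self.angle + delta) % (2 * math.pi)

    def current(self):
        return rotate(self.original, self.angle)


body = Body((20.0, 0.0))
for _ in range(20000):
    body.turn(0.1)

length = math.hypot(*body.current())
print(body.current(), length)
```

The angle itself still picks up rounding error, so the orientation can drift slightly, but the vertex is only ever one rotation away from its original coordinates, so its distance from the origin does not decay.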

You say there is cumulative error, but I don't believe there is (note how your output doesn't always go down); the rest of the error is just due to rounding and loss of precision in a float.
I worked on a 2D physics engine at university (in Java) and found double to be more precise (of course it is; see Oracle's data type sizes).
In short, you will never get rid of this behaviour; you just have to accept the limitations of precision.
EDIT:
Now that I look at your .area function, there is possibly some cumulative error due to
+= cross.magnitude
but I have to say that whole function looks a bit odd. Why does it need to know the previous vertices to calculate the current area?
