What do the theta values of gradient descent mean? - java

I have all the components in place, I'm just not quite sure how to interpret the result. This is my output:
Theta-->: 0.09604203456288299, 1.1864676227195392
How do I interpret that? What does it mean?
I essentially just modified the example from this description. But I'm not sure if it's really applicable to my problem. I'm trying to perform binary classification on a set of documents. The documents are rendered as bag-of-words style feature vectors of the form:
Example:
Document 1 = ["I", "am", "awesome"]
Document 2 = ["I", "am", "great", "great"]
Dictionary is:
["I", "am", "awesome", "great"]
So the documents as a vector would look like:
Document 1 = [1, 1, 1, 0]
Document 2 = [1, 1, 0, 2]
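For concreteness, here is a minimal sketch of that vectorization (the class and method names are just illustrative, not code from my project):

import java.util.Arrays;
import java.util.List;

class BagOfWords {
    // Count how often each dictionary word occurs in a tokenized document.
    static double[] vectorize(List<String> document, List<String> dictionary) {
        double[] vector = new double[dictionary.size()];
        for (String token : document) {
            int index = dictionary.indexOf(token);
            if (index >= 0) {
                vector[index]++;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("I", "am", "awesome", "great");
        System.out.println(Arrays.toString(
                vectorize(Arrays.asList("I", "am", "great", "great"), dictionary)));
        // prints [1.0, 1.0, 0.0, 2.0]
    }
}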
This is my gradient descent code:
public static double[] gradientDescent(final double[] theta_in, final double alpha, final int num_iters, double[][] data)
{
    final double m = data.length;
    double[] theta = theta_in;
    double theta0 = 0;
    double theta1 = 0;
    for (int i = 0; i < num_iters; i++)
    {
        final double sum0 = gradientDescentSumScalar0(theta, alpha, data);
        final double sum1 = gradientDescentSumScalar1(theta, alpha, data);
        theta0 = theta[0] - ((alpha / m) * sum0);
        theta1 = theta[1] - ((alpha / m) * sum1);
        theta = new double[] { theta0, theta1 };
    }
    return theta;
}
//data is the feature vector
//this theta is weight
protected static double[] matrixMultipleHthetaByX(final double[] theta, double[][] data)
{
    final double[] vector = new double[data.length];
    int i = 0;
    for (final double[] d : data)
    {
        vector[i] = (1.0 * theta[0]) + (d[0] * theta[1]);
        i++;
    } // End of the for //
    return vector;
}
protected static double gradientDescentSumScalar0(final double[] theta, final double alpha, double[][] data)
{
    double sum = 0;
    int i = 0;
    final double[] hthetaByXArr = matrixMultipleHthetaByX(theta, data);
    for (final double[] d : data)
    {
        final double X = 1.0;
        final double y = d[1];
        final double hthetaByX = hthetaByXArr[i];
        sum = sum + ((hthetaByX - y) * X);
        i++;
    } // End of the for //
    return sum;
}
protected static double gradientDescentSumScalar1(final double[] theta, final double alpha, double[][] data)
{
    double sum = 0;
    int i = 0;
    final double[] hthetaByXArr = matrixMultipleHthetaByX(theta, data);
    for (final double[] d : data)
    {
        final double X = d[0];
        final double y = d[1];
        final double hthetaByX = hthetaByXArr[i];
        sum = sum + ((hthetaByX - y) * X);
        i++;
    } // End of the for //
    return sum;
}
public static double[] batchGradientDescent(double[] weights, double[][] data)
{
    /*
     * From tex:
     * \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m ( h_\theta(x^{(i)}) - y^{(i)} ) x_j^{(i)}
     */
    final double[] theta_in = weights;
    double[] theta = gradientDescent(theta_in, alpha, iterations, data);
    lastTheta = theta;
    System.out.println("Theta-->: " + theta[0] + ", " + theta[1]);
    return theta;
}
I call it like this:
final int globoDictSize = globoDict.size(); // number of features
double[] weights = new double[globoDictSize + 1];
for (int i = 0; i < weights.length; i++)
{
    //weights[i] = Math.floor(Math.random() * 10000) / 10000;
    //weights[i] = randomNumber(0,1);
    weights[i] = 0.0;
}
int inputSize = trainingPerceptronInput.size();
double[] outputs = new double[inputSize];
final double[][] a = Prcptrn_InitOutpt.initializeOutput(trainingPerceptronInput, globoDictSize, outputs, LABEL);
for (int p = 0; p < inputSize; p++)
{
    Gradient_Descent.batchGradientDescent(weights, a);
}
How can I verify that this code is doing what I want? Shouldn't it be outputting a predicted label or something? I've heard I can also apply an error function to it, such as hinge loss; that would come after the call to batch gradient descent as a separate component, wouldn't it?

Your code is complicated (I used to implement batch gradient descent in Octave, not in OO programming languages). But as far as I can see in your code (and it is common to use this notation), theta is the parameter vector. After the gradient descent algorithm converges, it returns the optimal theta vector. After that you can calculate the output for a new example with the formula:
theta_transposed * X,
where theta_transposed is the transposed theta vector and X is the vector of input features.
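In code that is just a dot product; a minimal sketch (assuming the feature vector carries a bias input x[0] = 1.0, matching the hypothesis in the question's code):

// h_theta(x) = theta^T * x: dot product of parameters and features,
// with x[0] = 1.0 acting as the bias/intercept input.
static double predict(double[] theta, double[] x) {
    double result = 0.0;
    for (int j = 0; j < theta.length; j++) {
        result += theta[j] * x[j];
    }
    return result;
}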
On a side note, the example you referred to is a regression task (it is about linear regression). The task you describe, however, is a classification problem: instead of predicting some value (some number - weight, length, something else) you need to assign a label to an input set. That can be done with lots of different algorithms, but definitely not with the linear regression described in the article you posted.
I also need to mention that it is not at all clear what kind of classification you are trying to perform. In your example you have a bag-of-words description (matrices of word counts). But where are the classification labels? Is it multi-output classification? Or just multi-class? Or binary?
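If it is binary, the usual starting point is logistic regression: the same gradient descent loop, but with the linear output squashed through a sigmoid so it can be thresholded into a label. A minimal, illustrative sketch (not code from the question):

// Sigmoid squashes theta^T * x into (0, 1); threshold at 0.5 for a label.
static double sigmoid(double z) {
    return 1.0 / (1.0 + Math.exp(-z));
}

static int classify(double[] theta, double[] x) {
    double z = 0.0;
    for (int j = 0; j < theta.length; j++) {
        z += theta[j] * x[j]; // theta^T * x, with x[0] = 1.0 as bias
    }
    return sigmoid(z) >= 0.5 ? 1 : 0;
}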
I really suggest you take a course on ML, maybe on Coursera. This one is good:
https://www.coursera.org/course/ml
It also covers a full implementation of gradient descent.

Related

Calculate R-Square for PolynomialCurveFitter in Apache commons-math3

Apache commons-math3 (version 3.6.1) classes like OLSMultipleLinearRegression and SimpleRegression provide a method that calculates R-squared (calculateRSquared() and getRSquare() respectively), but I am not able to find any such method for PolynomialCurveFitter.
Right now I am doing it myself as shown below. Is there any such method in commons-math which does this?
private PolynomialFunction getPolynomialFitter(List<List<Double>> pointlist) {
    final PolynomialCurveFitter fitter = PolynomialCurveFitter.create(2);
    final WeightedObservedPoints obs = new WeightedObservedPoints();
    for (List<Double> point : pointlist) {
        obs.add(point.get(0), point.get(1));
    }
    double[] fit = fitter.fit(obs.toList());
    System.out.printf("\nCoefficient %f, %f, %f", fit[0], fit[1], fit[2]);
    final PolynomialFunction fitted = new PolynomialFunction(fit);
    return fitted;
}
private double getRSquare(PolynomialFunction fitter, List<List<Double>> pointList) {
    final double[] coefficients = fitter.getCoefficients();
    double[] predictedValues = new double[pointList.size()];
    double residualSumOfSquares = 0;
    final DescriptiveStatistics descriptiveStatistics = new DescriptiveStatistics();
    for (int i = 0; i < pointList.size(); i++) {
        predictedValues[i] = predict(coefficients, pointList.get(i).get(0));
        double actualVal = pointList.get(i).get(1);
        double t = Math.pow((predictedValues[i] - actualVal), 2);
        residualSumOfSquares += t;
        descriptiveStatistics.addValue(actualVal);
    }
    final double avgActualValues = descriptiveStatistics.getMean();
    double totalSumOfSquares = 0;
    for (int i = 0; i < pointList.size(); i++) {
        totalSumOfSquares += Math.pow((predictedValues[i] - avgActualValues), 2);
    }
    return 1.0 - (residualSumOfSquares / totalSumOfSquares);
}

final PolynomialFunction polynomial = getPolynomialFitter(trainData);
System.out.printf("\nPolynomialCurveFitter R-Square %f", getRSquare(polynomial, trainData));
This has been answered on the apache-commons mailing list. Cross-posting the answer:
OLSMultipleLinearRegression, SimpleRegression provide a method that returns calculateRSquared(), getRSquare(). But I am not able to find any such method for PolynomialCurveFitter? Right now I am doing it myself like below. Is there any such method in commons-math which does this?
"PolynomialCurveFitter" is one of the syntactic sugar/wrapper
around the least-squares optimizers.
No state is maintained in the (immutable) instance.
private PolynomialFunction getPolynomialFitter(List<List<Double>> pointlist) {
    final PolynomialCurveFitter fitter = PolynomialCurveFitter.create(2);
    final WeightedObservedPoints obs = new WeightedObservedPoints();
    for (List<Double> point : pointlist) {
        obs.add(point.get(0), point.get(1));
    }
    double[] fit = fitter.fit(obs.toList());
    System.out.printf("\nCoefficient %f, %f, %f", fit[0], fit[1], fit[2]);
    final PolynomialFunction fitted = new PolynomialFunction(fit);
    return fitted;
}
This is indeed one of the intended use-cases.
private double getRSquare(PolynomialFunction fitter, List<List<Double>> pointList) {
    final double[] coefficients = fitter.getCoefficients();
    double[] predictedValues = new double[pointList.size()];
    double residualSumOfSquares = 0;
    final DescriptiveStatistics descriptiveStatistics = new DescriptiveStatistics();
    for (int i = 0; i < pointList.size(); i++) {
        predictedValues[i] = predict(coefficients, pointList.get(i).get(0));
        double actualVal = pointList.get(i).get(1);
        double t = Math.pow((predictedValues[i] - actualVal), 2);
        residualSumOfSquares += t;
        descriptiveStatistics.addValue(actualVal);
    }
    final double avgActualValues = descriptiveStatistics.getMean();
    double totalSumOfSquares = 0;
    for (int i = 0; i < pointList.size(); i++) {
        totalSumOfSquares += Math.pow((predictedValues[i] - avgActualValues), 2);
    }
    return 1.0 - (residualSumOfSquares / totalSumOfSquares);
}
The "predict" method is not shown here, but note that the argument
which you called "fitter" in the above, is actually a polynomial
function:
http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math4/analysis/polynomials/PolynomialFunction.html
Hence:
predictedValues[i] = fitter.value(pointList.get(i).get(0));
But otherwise, yes, the caller is responsible for choosing his own assessment of the quality of the model.
You could directly use the least-squares suite of classes; then the "Evaluation" object would allow you to retrieve various measures of the fit:
http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math4/fitting/leastsquares/LeastSquaresProblem.Evaluation.html
However, they might still not be what you are looking for...
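For illustration, here is a hedged sketch of that direct route with commons-math3 3.6.1 (the observation arrays xs and ys are made up). The Optimum returned by the optimizer implements LeastSquaresProblem.Evaluation, so measures such as the RMS of the residuals can be read from it:

import org.apache.commons.math3.fitting.leastsquares.LeastSquaresBuilder;
import org.apache.commons.math3.fitting.leastsquares.LeastSquaresOptimizer;
import org.apache.commons.math3.fitting.leastsquares.LeastSquaresProblem;
import org.apache.commons.math3.fitting.leastsquares.LevenbergMarquardtOptimizer;

public class QuadraticFitEvaluation {
    public static void main(String[] args) {
        // Hypothetical observations of y = c0 + c1*x + c2*x^2 plus noise.
        final double[] xs = {1, 2, 3, 4, 5};
        final double[] ys = {1.1, 3.9, 9.2, 15.8, 25.3};

        LeastSquaresProblem problem = new LeastSquaresBuilder()
                .start(new double[] {0, 0, 0})    // initial guess for c0, c1, c2
                .model(
                    point -> {                    // model values at each x
                        double[] values = new double[xs.length];
                        for (int i = 0; i < xs.length; i++) {
                            values[i] = point[0] + point[1] * xs[i] + point[2] * xs[i] * xs[i];
                        }
                        return values;
                    },
                    point -> {                    // Jacobian rows: [1, x, x^2]
                        double[][] jacobian = new double[xs.length][3];
                        for (int i = 0; i < xs.length; i++) {
                            jacobian[i][0] = 1;
                            jacobian[i][1] = xs[i];
                            jacobian[i][2] = xs[i] * xs[i];
                        }
                        return jacobian;
                    })
                .target(ys)
                .maxEvaluations(1000)
                .maxIterations(1000)
                .build();

        LeastSquaresOptimizer.Optimum optimum = new LevenbergMarquardtOptimizer().optimize(problem);
        System.out.println("RMS: " + optimum.getRMS());   // root-mean-square of residuals
        System.out.println("Cost: " + optimum.getCost()); // half the sum of squared residuals
    }
}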

use multi-threading to generate a matrix in java to utilize all CPU cores of a supercomputer

I am working on a problem that deals with a large amount of data and computation. For that purpose we have a supercomputer with a processing power of 270 TFLOPS. Our data is in matrix form, so we decided to divide the generation of the matrix into several parts using threads. The problem is that we have only been using arguments to divide the task, but the run() method of a thread does not take arguments.
static int start = 0, end;
int row = 10000;
int col = 10000, count = 0, m = 2203, n = 401;
double p = Math.PI, t = Math.sqrt(3), pi = (p / (col + 1));
double mul, a, b, c, d, e, f, xx, yy;
int[][] matrix = new int[row][col];

private void sin1()
{
    for (int i = start; i < row; i++)
    {
        for (int j = 0; j < col; j++)
        {
            xx = ((i + 1) * pi);
            yy = ((j + 1) * pi);
            a = Math.cos(((2 * m) - n) * ((2 * (xx)) / 3));
            b = Math.sin(((2 * (n * (yy))) / t));
            c = Math.cos(((((2 * n) - m) * (2 * (xx))) / 3));
            d = Math.sin(((2 * m) * (yy) / t));
            e = Math.cos((((m + n) * (2 * (xx))) / 3));
            f = Math.sin((((m - n) * (2 * (yy))) / t));
            mul = (a * b) - (c * d) + (e * f);
            if (mul < 0)
            {
                matrix[i][j] = 0;
            }
            else
            {
                matrix[i][j] = 1;
            }
            System.out.print(matrix[i][j]);
        }
        System.out.println();
    }
}
At first we are testing it for 10 million values.
The code makes it clear to me that you lack any form of Java programming knowledge. This is a bad thing if you want to write code for a supercomputer. Java luckily has a good set of tools to solve all kinds of problems, but you need to know which tools to use for which situation.
In your case you can, for example, use parallel streams to spread the generation across cores like this:
import java.util.stream.IntStream;

static final int start = 0;
static int end;
static final int row = 10000;
static final int col = 10000, count = 0, m = 2203, n = 401;
static final double t = Math.sqrt(3);
static final double pi = (Math.PI / (col + 1));
final int[][] matrix = new int[row][col];

public int generateMatrixEntry(final int i, final int j) {
    final double xx = ((i + 1) * pi);
    final double yy = ((j + 1) * pi);
    final double a = Math.cos(((2 * m) - n) * ((2 * (xx)) / 3));
    final double b = Math.sin(((2 * (n * (yy))) / t));
    final double c = Math.cos(((((2 * n) - m) * (2 * (xx))) / 3));
    final double d = Math.sin(((2 * m) * (yy) / t));
    final double e = Math.cos((((m + n) * (2 * (xx))) / 3));
    final double f = Math.sin((((m - n) * (2 * (yy))) / t));
    final double mul = (a * b) - (c * d) + (e * f);
    return (mul < 0) ? 0 : 1;
}

private void sin1() {
    IntStream.range(start, row).parallel().forEach((i) -> {
        for (int j = 0; j < col; j++) {
            matrix[i][j] = generateMatrixEntry(i, j);
        }
    });
}
This is however just one possible solution that might or might not fit your hardware. You absolutely need someone with deeper Java knowledge to select the right tools from the set for you if the above does not solve your issues.
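If you later need explicit control over the thread count (for example, to match the cores you are allocated on the cluster), here is a sketch of the same row-partitioning with an ExecutorService; it assumes the fields and the generateMatrixEntry method from the code above:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

private void sin1WithPool(final int nThreads) throws InterruptedException {
    final ExecutorService pool = Executors.newFixedThreadPool(nThreads);
    for (int i = start; i < row; i++) {
        final int r = i; // effectively final copy for the lambda
        pool.submit(() -> {
            for (int j = 0; j < col; j++) {
                matrix[r][j] = generateMatrixEntry(r, j);
            }
        });
    }
    pool.shutdown();                          // accept no new tasks
    pool.awaitTermination(1, TimeUnit.DAYS);  // wait for all rows to finish
}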

Gradient descent in Java

I've recently started the AI class on Coursera and I have a question related to my implementation of the gradient descent algorithm.
Here's my current implementation (I actually just "translated" the mathematical expressions into Java code):
public class GradientDescent {
    private static final double TOLERANCE = 1E-11;

    private double theta0;
    private double theta1;

    public double getTheta0() {
        return theta0;
    }

    public double getTheta1() {
        return theta1;
    }

    public GradientDescent(double theta0, double theta1) {
        this.theta0 = theta0;
        this.theta1 = theta1;
    }

    public double getHypothesisResult(double x) {
        return theta0 + theta1 * x;
    }

    private double getResult(double[][] trainingData, boolean enableFactor) {
        double result = 0;
        for (int i = 0; i < trainingData.length; i++) {
            result = (getHypothesisResult(trainingData[i][0]) - trainingData[i][1]);
            if (enableFactor) result = result * trainingData[i][0];
        }
        return result;
    }

    public void train(double learningRate, double[][] trainingData) {
        int iteration = 0;
        double delta0, delta1;
        do {
            iteration++;
            System.out.println("SUBS: " + (learningRate * ((double) 1 / trainingData.length)) * getResult(trainingData, false));
            double temp0 = theta0 - learningRate * (((double) 1 / trainingData.length) * getResult(trainingData, false));
            double temp1 = theta1 - learningRate * (((double) 1 / trainingData.length) * getResult(trainingData, true));
            delta0 = theta0 - temp0;
            delta1 = theta1 - temp1;
            theta0 = temp0;
            theta1 = temp1;
        } while ((Math.abs(delta0) + Math.abs(delta1)) > TOLERANCE);
        System.out.println(iteration);
    }
}
The code works quite well, but only if I choose a very small alpha (here called learningRate). If it's higher than 0.00001, it diverges.
Do you have any suggestions on how to optimize the implementation, or an explanation for the "Alpha-Issue" and a possible solution for it?
Update:
Here's the main method including some sample inputs:
private static final double[][] TDATA = {{200, 20000}, {300, 41000}, {900, 141000}, {800, 41000}, {400, 51000}, {500, 61500}};

public static void main(String[] args) {
    GradientDescent gd = new GradientDescent(0, 0);
    gd.train(0.00001, TDATA);
    System.out.println("THETA0: " + gd.getTheta0() + " - THETA1: " + gd.getTheta1());
    System.out.println("PREDICTION: " + gd.getHypothesisResult(300));
}
The mathematical expression of gradient descent is as follows:
\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m ( h_\theta(x^{(i)}) - y^{(i)} ) x_j^{(i)}
To solve this issue, it's necessary to normalize the data with this formula: (Xi - mu) / s,
where Xi is the current training set value, mu is the average of the values in the current column, and s is the maximum value minus the minimum value of the current column. This formula gets the training data approximately into a range between -1 and 1, which allows you to choose higher learning rates and lets gradient descent converge faster.
But it's afterwards necessary to denormalize the predicted result.
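A minimal sketch of that normalization for the single-feature training data above (the helper name is mine; it scales column 0 in place and returns mu and s so new inputs can be scaled the same way before prediction):

// (x - mu) / s with s = max - min, applied to feature column 0.
static double[] normalizeFeature(double[][] trainingData) {
    double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY, sum = 0;
    for (double[] row : trainingData) {
        min = Math.min(min, row[0]);
        max = Math.max(max, row[0]);
        sum += row[0];
    }
    double mu = sum / trainingData.length;
    double s = max - min;
    for (double[] row : trainingData) {
        row[0] = (row[0] - mu) / s;
    }
    return new double[] {mu, s}; // needed to scale future inputs identically
}

A prediction for a new input x would then be gd.getHypothesisResult((x - mu) / s); if the targets were normalized as well, the predicted value must be scaled back correspondingly.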
private double getResult(double[][] trainingData, boolean enableFactor) {
    double result = 0;
    for (int i = 0; i < trainingData.length; i++) {
        result = (getHypothesisResult(trainingData[i][0]) - trainingData[i][1]);
        if (enableFactor) result = result * trainingData[i][0];
    }
    return result;
}
In this function the result variable is overwritten on each iteration, so the old value is lost. When the update is computed, only the last item of the array contributes; the rest don't matter.
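The minimal fix is to accumulate the per-example terms instead of overwriting them:

private double getResult(double[][] trainingData, boolean enableFactor) {
    double result = 0;
    for (int i = 0; i < trainingData.length; i++) {
        double term = getHypothesisResult(trainingData[i][0]) - trainingData[i][1];
        if (enableFactor) term = term * trainingData[i][0];
        result += term; // sum over all training examples, not just the last
    }
    return result;
}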
You should use java.math.BigDecimal for your arithmetic operations;
double has rounding-off issues while performing arithmetic.

Normalization issue in Java - Android Studio

I am gathering 10 acceleration values from an on-board accelerometer on a mobile device. I am then attempting to normalize these values to the range [-1, 1]. I am unable to figure out why this isn't working correctly.
Here is the normalization code:
class NormUtil {
    private double dataHigh;
    private double dataLow;
    private double normalizedHigh;
    private double normalizedLow;

    public NormUtil(double dataHigh, double dataLow) {
        this(dataHigh, dataLow, 1, -1);
    }

    public NormUtil(double dataHigh, double dataLow, double normalizedHigh, double normalizedLow) {
        this.dataHigh = dataHigh;
        this.dataLow = dataLow;
        this.normalizedHigh = normalizedHigh;
        this.normalizedLow = normalizedLow;
    }

    public double normalize(double e) {
        return ((e - dataLow)
                / (dataHigh - dataLow))
                * (normalizedHigh - normalizedLow) + normalizedLow;
    }
}
On a button press, the highest/lowest acceleration values are found in this code:
A = enrolAcc.get(0);
B = enrolAcc.get(0);
for (Float i : enrolAcc) {
    if (i < A) A = i;
    if (i > B) B = i;
}
Once the highest/lowest values are found a NormUtil instance is created and this instance is used to normalize the array of acceleration values and then add the normalized values to a new array:
NormUtil norm = new NormUtil(B, A, 1, -1);
for (int j = 0; j < enrolAcc.size(); j++) {
    double start = enrolAcc.get(j);
    double x = norm.normalize(start);
    nAcc[j] = x;
}
This nAcc array is then put into a string array and then a single string to display in a text view. The issue is that the text view always shows the original, non-normalized acceleration values. Here is the code I use for that:
String normD[] = new String[10];
for (int i = 0; i < 10; i++) {
    normD[i] = String.valueOf(nAcc[i]);
}
StringBuilder strBuilder2 = new StringBuilder();
for (int i = 0; i < normD.length; i++) {
    strBuilder2.append(normD[i] + ",");
}
normData = strBuilder.toString();
textNorm.setText("Normalised: " + normData);
So my question is: where am I going wrong in adding the normalized values to the normalized array, and is this normalization method correct for what I am trying to achieve? Thanks in advance.

converting c++ DTW code to java

I want to translate some code from C++ to Java. The original code implements the fast DTW algorithm. The piece of code I couldn't figure out involves the I attribute; I'm not sure what it does, hence I can't convert it.
The error in Java is in the expressions l_buff+I and u_buff+I, because the plus operator is not supported between an int (I) and a double[] (l_buff, u_buff).
I have included all the statements that involve I:
int I;
for (i = 0; i < ep; i++)
{
    /// A bunch of data has been read and pick one of them at a time to use
    d = buffer[i];
    /// Calculate sum and sum square
    ex += d;
    ex2 += d * d;
    /// t is a circular array for keeping current data
    t[i % m] = d;
    /// Double the size for avoiding using modulo "%" operator
    t[(i % m) + m] = d;
    /// Start the task when there are more than m-1 points in the current chunk
    if (i >= m - 1)
    {
        mean = ex / m;
        std = ex2 / m;
        std = Math.sqrt(std - mean * mean);
        /// compute the start location of the data in the current circular array, t
        j = (i + 1) % m;
        /// the start location of the data in the current chunk
        I = i - (m - 1);
        lb_k2 = lb_keogh_data_cumulative(order, tz, qo, cb2, l_buff + I, u_buff + I, m, mean, std, bsf);
and the lb_keogh_data_cumulative method implementation is:
public static double lb_keogh_data_cumulative(int[] order, double[] tz, double[] qo, double[] cb, double[] l, double[] u, int len, double mean, double std, double best_so_far)
{
    double lb = 0;
    double uu, ll, d;
    for (int i = 0; i < len && lb < best_so_far; i++)
    {
        uu = (u[order[i]] - mean) / std;
        ll = (l[order[i]] - mean) / std;
        d = 0;
        if (qo[i] > uu)
            d = dist(qo[i], uu);
        else
        {
            if (qo[i] < ll)
                d = dist(qo[i], ll);
        }
        lb += d;
        cb[order[i]] = d;
    }
    return lb;
}
Here is the paper the code relies on: SIGKDD TRILLION
l_buff+I and u_buff+I mean that you shift the beginning of the arrays by I elements: the lb_keogh_data_cumulative parameters l and u won't see the first I elements of the given arrays.
So you could write something like:
lb_k2 = lb_keogh_data_cumulative(order, tz, qo, cb2, Arrays.copyOfRange(l_buff, I, l_buff.length), Arrays.copyOfRange(u_buff, I, u_buff.length), m, mean, std, bsf);
The arrays are not modified by the called method so you can pass a copy.
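Since this call sits inside a loop, copying the arrays on every iteration allocates a lot. An alternative sketch (my own variation, not part of the original C++) is to add an explicit offset parameter and index through it:

public static double lb_keogh_data_cumulative(int[] order, double[] tz, double[] qo,
        double[] cb, double[] l, double[] u, int offset, int len,
        double mean, double std, double best_so_far)
{
    double lb = 0;
    for (int i = 0; i < len && lb < best_so_far; i++)
    {
        // l[offset + ...] / u[offset + ...] replaces the C++ pointer shifts l_buff+I / u_buff+I
        double uu = (u[offset + order[i]] - mean) / std;
        double ll = (l[offset + order[i]] - mean) / std;
        double d = 0;
        if (qo[i] > uu)
            d = dist(qo[i], uu);
        else if (qo[i] < ll)
            d = dist(qo[i], ll);
        lb += d;
        cb[order[i]] = d;
    }
    return lb;
}

The call site then becomes:
lb_k2 = lb_keogh_data_cumulative(order, tz, qo, cb2, l_buff, u_buff, I, m, mean, std, bsf);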
