incremental conjugate gradient algorithm - java

I wrote this piece of code, which should implement what is described here:
Conjugate Gradient from Wikipedia
but after some iterations the variable denomAlpha goes to zero, so I get a NaN for alpha. What is wrong with my algorithm?
import Jama.Matrix;

public class ConjugateGrad {
    private static final int MAX_IT = 20;
    private static final int MAX_SIZE = 50;

    public static void main(String[] args) {
        Matrix A = Matrix.random(MAX_SIZE, MAX_SIZE);
        Matrix b = Matrix.random(MAX_SIZE, 1);
        double[][] d = new double[MAX_SIZE][1];
        for (int ii = 0; ii < MAX_SIZE; ii++) {
            d[ii][0] = 0;
        }
        Matrix x = Matrix.constructWithCopy(d);
        Matrix r = b.minus(A.times(x));
        Matrix p = r;
        Matrix rTrasp_r = r.transpose().times(p);
        for (int i = 0; i < MAX_IT; i++) {
            Matrix denomAlpha = p.transpose().times(A.times(p));
            double numeratorAlpha = rTrasp_r.getArray()[0][0];
            double Alpha = numeratorAlpha / denomAlpha.getArray()[0][0];
            x = x.plus(p.times(Alpha));
            r = r.minus(A.times(p));
            Matrix rNew = r.transpose().times(r);
            if (Math.sqrt(rNew.getArray()[0][0]) < 1.0e-6) {
                break;
            }
            double Beta = rNew.getArray()[0][0] / rTrasp_r.getArray()[0][0];
            p = r.plus(p.times(Beta));
            rTrasp_r = rNew;
        }
    }
}
It seems that with those parameters:
double[][] matrixA = {{4,1},{1,3}};
Matrix A = Matrix.constructWithCopy(matrixA);
double[][] vectorb = {{1},{2}};
Matrix b = Matrix.constructWithCopy(vectorb);
double[][] d = {{2},{1}};
Matrix x = Matrix.constructWithCopy(d);
at the first step of the algorithm the values look right, but at the second step they don't.
First step:
r: -8.0, -3.0
Alpha: 0.22054380664652568
Beta: 12.67123287671233
x: 0.2356495468277946, 0.33836858006042303,
Second step:
Alpha: 0.0337280177221555
Beta: 159.11259655226627
x: -2.2726985108925097, -0.47156587291133856,
OK, I have found one error; the residual update was missing the Alpha factor:
r = r.minus(A.times(p).times(Alpha));
Now it works:
r: -8.0, -3.0,
Alpha: 0.22054380664652568
rNew: 0.6403099643121183,
Beta: 0.008771369374138607
p: -0.3511377223647101, 0.7229306048685207,
x: 0.2356495468277946, 0.33836858006042303,

Sorry for the hack answer, but... using the numerical example parameters from the Wikipedia article and printing the matrices to the terminal at each step is what exposed the discrepancy.
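For reference, a minimal sketch of the corrected inner loop (assuming Jama's Matrix API; note also that CG is only guaranteed to converge for a symmetric positive-definite A, which Matrix.random(MAX_SIZE, MAX_SIZE) does not produce):

// Corrected CG inner loop (sketch): the residual update must scale A*p by alpha.
// Computing A*p once per iteration also saves a matrix-vector product.
for (int i = 0; i < MAX_IT; i++) {
    Matrix Ap = A.times(p);
    double alpha = rTrasp_r.get(0, 0) / p.transpose().times(Ap).get(0, 0);
    x = x.plus(p.times(alpha));
    r = r.minus(Ap.times(alpha)); // was: r = r.minus(A.times(p));
    Matrix rNew = r.transpose().times(r);
    if (Math.sqrt(rNew.get(0, 0)) < 1.0e-6) {
        break;
    }
    double beta = rNew.get(0, 0) / rTrasp_r.get(0, 0);
    p = r.plus(p.times(beta));
    rTrasp_r = rNew;
}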

Related

Get X-Coordinate of Elliptical Curve using Bouncy Castle

I tried to calculate the Tr(x) operation for the x coordinate of an elliptic curve over F2m (m = 163). For that, I used "Bouncy Castle" with the corresponding types. The trace for my elliptic curve is equal to either 0 or 1, and my code is the following:
public int CalculateTrace_Test(byte[] array)
{
    int m = 163;
    BigInteger two = new BigInteger("2", 10);
    BigInteger x = new BigInteger(array);
    BigInteger xi = x;
    BigInteger temp = x;
    for (int i = 1; i < m; i++)
    {
        var next = xi.ModPow(two.Pow(i), fx);
        temp = temp.Xor(next);
    }
    return temp.IntValue;
}
Here fx is an integer formed from the irreducible polynomial f(x) = x^163 + x^7 + x^6 + x^3 + 1.
My problem is that it doesn't work: as a result I get everything but 1 or 0. Could anyone please tell me what is wrong in my implementation of the trace?
It doesn't look like you are properly doing field arithmetic in GF(2^m). The classes that support correct field arithmetic are in the package org.bouncycastle.math.ec. Take a look at ECFieldElement.F2m and ECCurve.F2m. Also, for your specific case, which corresponds to the SECT163 reduction polynomial, the class SecT163FieldElement may be particularly useful.
Here is some code copied directly from the class org.bouncycastle.math.ec.tools.TraceOptimizer. The code assumes that the finite field has characteristic 2.
private static int calculateTrace(ECFieldElement fe) {
    int m = fe.getFieldSize();
    ECFieldElement tr = fe;
    for (int i = 1; i < m; ++i) {
        fe = fe.square();
        tr = tr.add(fe);
    }
    BigInteger b = tr.toBigInteger();
    if (b.bitLength() > 1) {
        throw new IllegalStateException();
    }
    return b.intValue();
}
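For completeness, a hedged usage sketch of that helper (the ECCurve.F2m constructor below is the older, now-deprecated one, and the curve coefficients a = b = 1 are placeholders, since only the field arithmetic matters for the trace; the pentanomial exponents 3, 6, 7 match f(x) = x^163 + x^7 + x^6 + x^3 + 1):

import java.math.BigInteger;
import org.bouncycastle.math.ec.ECCurve;
import org.bouncycastle.math.ec.ECFieldElement;

public class TraceDemo {
    public static void main(String[] args) {
        // Field GF(2^163), reduction polynomial x^163 + x^7 + x^6 + x^3 + 1.
        ECCurve.F2m curve = new ECCurve.F2m(163, 3, 6, 7, BigInteger.ONE, BigInteger.ONE);
        // Wrap the x coordinate as a proper field element, then take the trace.
        ECFieldElement x = curve.fromBigInteger(new BigInteger("1234567890abcdef", 16));
        System.out.println("Tr(x) = " + calculateTrace(x)); // always 0 or 1
    }

    // Same helper as above.
    private static int calculateTrace(ECFieldElement fe) {
        int m = fe.getFieldSize();
        ECFieldElement tr = fe;
        for (int i = 1; i < m; ++i) {
            fe = fe.square();   // Frobenius map: x -> x^2
            tr = tr.add(fe);    // Tr(x) = x + x^2 + x^4 + ... + x^(2^(m-1))
        }
        BigInteger b = tr.toBigInteger();
        if (b.bitLength() > 1) {
            throw new IllegalStateException();
        }
        return b.intValue();
    }
}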

java OpenCV find k nearest neighbor c++ to java conversion

I am trying to find the k nearest neighbors with the KNN classifier in OpenCV.
I found this C++ code:
class atsKNN {
public:
    void knn(cv::Mat& trainingData, cv::Mat& trainingClasses, cv::Mat& testData, cv::Mat& testClasses, int K)
    {
        cv::KNearest knn(trainingData, trainingClasses, cv::Mat(), false, K);
        cv::Mat predicted(testClasses.rows, 1, CV_32F);
        for (int i = 0; i < testData.rows; i++) {
            const cv::Mat sample = testData.row(i);
            predicted.at<float>(i, 0) = knn.find_nearest(sample, K);
        }
        float percentage = evaluate(predicted, testClasses) * 100;
        cout << "K Nearest Neighbor Evaluated Accuracy = " << percentage << "%" << endl;
        prediction = predicted;
    }
    void showplot(cv::Mat testData)
    {
        plot_binary(testData, prediction, "Predictions Backpropagation");
    }
private:
    cv::Mat prediction;
};
The comments mention it works really well, but I am having problems converting it to Java, and there is no documentation for the Java bindings. I tried using a C++-to-Java converter, but the resulting code does not work.
Here is the code it produced:
public class atsKNN
{
    public final void knn(cv.Mat trainingData, cv.Mat trainingClasses, cv.Mat testData, cv.Mat testClasses, int K)
    {
        cv.KNearest knn = new cv.KNearest(trainingData, trainingClasses, cv.Mat(), false, K);
        cv.Mat predicted = new cv.Mat(testClasses.rows, 1, CV_32F);
        for (int i = 0; i < testData.rows; i++)
        {
            final cv.Mat sample = testData.row(i);
            predicted.<Float>at(i, 0) = knn.find_nearest(sample, K);
        }
        float percentage = evaluate(predicted, testClasses) * 100;
        System.out.print("K Nearest Neighbor Evaluated Accuracy = ");
        System.out.print(percentage);
        System.out.print("%");
        System.out.print("\n");
        prediction = predicted;
    }
    public final void showplot(cv.Mat testData)
    {
        plot_binary(testData, prediction, "Predictions Backpropagation");
    }
    private cv.Mat prediction = new cv.Mat();
}
Edit:
The line predicted.at(i,0) = knn.find_nearest(sample, K); is most definitely wrong: there is no function at in the Java Mat object. There is also no evaluate function. Another thing: where does the prediction Mat belong? In Java you cannot just put it at the end of the class.
Thanks =)
The following code is for finding the digits (OpenCV's digits.png sample); here's some code to try:
import org.opencv.core.*;
import org.opencv.imgproc.*;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.ml.*;
import org.opencv.utils.*;
import java.util.*;

class SimpleSample {
    static { System.loadLibrary(Core.NATIVE_LIBRARY_NAME); }

    public static void main(String[] args) {
        // samples/data/digits.png, have a look at it.
        Mat digits = Imgcodecs.imread("digits.png", 0);
        // setup train/test data:
        Mat trainData = new Mat(),
            testData = new Mat();
        List<Integer> trainLabs = new ArrayList<Integer>(),
            testLabs = new ArrayList<Integer>();
        // 10 digits, 5 rows each:
        for (int r = 0; r < 50; r++) {
            // 100 digits per row:
            for (int c = 0; c < 100; c++) {
                // crop out 1 digit:
                Mat num = digits.submat(new Rect(c * 20, r * 20, 20, 20));
                // we need float data for knn:
                num.convertTo(num, CvType.CV_32F);
                // 50/50 train/test split:
                if (c % 2 == 0) {
                    // for opencv ml, each feature has to be a single row:
                    trainData.push_back(num.reshape(1, 1));
                    // add a label for that feature (the digit number):
                    trainLabs.add(r / 5);
                } else {
                    testData.push_back(num.reshape(1, 1));
                    testLabs.add(r / 5);
                }
            }
        }
        // make a Mat of the train labels, and train knn:
        KNearest knn = KNearest.create();
        knn.train(trainData, Ml.ROW_SAMPLE, Converters.vector_int_to_Mat(trainLabs));
        // now test predictions:
        for (int i = 0; i < testData.rows(); i++)
        {
            Mat one_feature = testData.row(i);
            int testLabel = testLabs.get(i);
            Mat res = new Mat();
            float p = knn.findNearest(one_feature, 1, res);
            System.out.println(testLabel + " " + p + " " + res.dump());
        }
        //// hmm, the 'real world' test case probably looks more like this:
        //// make sure you follow the very same preprocessing steps used in the train phase:
        // Mat one_digit = Imgcodecs.imread("one_digit.png", 0);
        // Mat feature = new Mat();
        // one_digit.convertTo(feature, CvType.CV_32F);
        // Imgproc.resize(feature, feature, new Size(20, 20));
        // float predicted = knn.findNearest(feature.reshape(1, 1), 1, new Mat());
    }
}
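On the Mat.at issue from the question's edit: the Java bindings expose no at() accessor; element access goes through put() and get(). A short sketch, reusing names from the sample above and assuming a single-channel CV_32F Mat:

// Java's Mat has no at<T>(); use put()/get() instead.
Mat predicted = new Mat(testData.rows(), 1, CvType.CV_32F);
float p = knn.findNearest(one_feature, 1, new Mat());
predicted.put(i, 0, p);                 // replaces predicted.at<float>(i,0) = p
double value = predicted.get(i, 0)[0];  // get() returns a double[] per element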

What does theta values of gradient descent mean?

I have all the components; I'm just not quite sure how to interpret the result. This is my output:
Theta-->: 0.09604203456288299, 1.1864676227195392
How do I interpret that? What does it mean?
I essentially just modified the example from this description, but I'm not sure if it's really applicable to my problem. I'm trying to perform binary classification on a set of documents. The documents are rendered as bag-of-words style feature vectors of the form:
Example:
Document 1 = ["I", "am", "awesome"]
Document 2 = ["I", "am", "great", "great"]
Dictionary is:
["I", "am", "awesome", "great"]
So the documents as a vector would look like:
Document 1 = [1, 1, 1, 0]
Document 2 = [1, 1, 0, 2]
This is my gradient descent code:
public static double[] gradientDescent(final double[] theta_in, final double alpha, final int num_iters, double[][] data)
{
    final double m = data.length;
    double[] theta = theta_in;
    double theta0 = 0;
    double theta1 = 0;
    for (int i = 0; i < num_iters; i++)
    {
        final double sum0 = gradientDescentSumScalar0(theta, alpha, data);
        final double sum1 = gradientDescentSumScalar1(theta, alpha, data);
        theta0 = theta[0] - ((alpha / m) * sum0);
        theta1 = theta[1] - ((alpha / m) * sum1);
        theta = new double[] { theta0, theta1 };
    }
    return theta;
}

// data is the feature vector
// this theta is the weight vector
protected static double[] matrixMultipleHthetaByX(final double[] theta, double[][] data)
{
    final double[] vector = new double[data.length];
    int i = 0;
    for (final double[] d : data)
    {
        vector[i] = (1.0 * theta[0]) + (d[0] * theta[1]);
        i++;
    }
    return vector;
}

protected static double gradientDescentSumScalar0(final double[] theta, final double alpha, double[][] data)
{
    double sum = 0;
    int i = 0;
    final double[] hthetaByXArr = matrixMultipleHthetaByX(theta, data);
    for (final double[] d : data)
    {
        final double X = 1.0;
        final double y = d[1];
        final double hthetaByX = hthetaByXArr[i];
        sum = sum + ((hthetaByX - y) * X);
        i++;
    }
    return sum;
}

protected static double gradientDescentSumScalar1(final double[] theta, final double alpha, double[][] data)
{
    double sum = 0;
    int i = 0;
    final double[] hthetaByXArr = matrixMultipleHthetaByX(theta, data);
    for (final double[] d : data)
    {
        final double X = d[0];
        final double y = d[1];
        final double hthetaByX = hthetaByXArr[i];
        sum = sum + ((hthetaByX - y) * X);
        i++;
    }
    return sum;
}

public static double[] batchGradientDescent(double[] weights, double[][] data)
{
    /*
     * From tex:
     * \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}
     */
    // alpha, iterations, and lastTheta are fields defined elsewhere in the class
    final double[] theta_in = weights;
    double[] theta = gradientDescent(theta_in, alpha, iterations, data);
    lastTheta = theta;
    System.out.println("Theta-->: " + theta[0] + ", " + theta[1]);
    return theta;
}
I call it like this:
final int globoDictSize = globoDict.size(); // number of features
double[] weights = new double[globoDictSize + 1];
for (int i = 0; i < weights.length; i++)
{
    //weights[i] = Math.floor(Math.random() * 10000) / 10000;
    //weights[i] = randomNumber(0,1);
    weights[i] = 0.0;
}
int inputSize = trainingPerceptronInput.size();
double[] outputs = new double[inputSize];
final double[][] a = Prcptrn_InitOutpt.initializeOutput(trainingPerceptronInput, globoDictSize, outputs, LABEL);
for (int p = 0; p < inputSize; p++)
{
    Gradient_Descent.batchGradientDescent(weights, a);
}
How can I verify that this code is doing what I want? Shouldn't it be outputting a predicted label or something? I've also heard I can apply an error function to it, such as hinge loss; that would come after the call to batch gradient descent as a separate component, wouldn't it?
Your code is complicated (I used to implement batch gradient descent in Octave, not in OO programming languages). But as far as I can see in your code (and it is common to use this notation), Theta is the parameter vector. After the gradient descent algorithm converges, it returns the optimal Theta vector. After that you can calculate the output for a new example with the formula:
theta_transposed * X,
where theta_transposed is the transpose of the theta vector and X is the vector of input features.
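In Java that prediction step might look like the following sketch (a hypothetical helper, not part of the question's code; it assumes the convention that x[0] is the constant 1 intercept feature, matching the 1.0 * theta[0] term in matrixMultipleHthetaByX):

// Predict the output for one new example: h(x) = theta^T * x.
static double predict(double[] theta, double[] x) {
    double h = 0.0;
    for (int j = 0; j < theta.length; j++) {
        h += theta[j] * x[j]; // dot product of parameters and features
    }
    return h;
}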
On a side note, the example you have referred to is a regression task (it is about linear regression), while the task you describe is a classification problem: instead of predicting some value (some number - weight, length, something else) you need to assign a label to an input set. That can be done with lots of different algorithms, but definitely not with the linear regression described in the article you posted.
I also need to mention that it is not at all clear what kind of classification you are trying to perform. In your example you have a bag-of-words description (matrices of word counts), but where are the classification labels? Is it multi-output classification? Or just multi-class? Or binary?
I really suggest you take a course on ML, maybe on Coursera. This one is good:
https://www.coursera.org/course/ml
It also covers a full implementation of gradient descent.

Binary Int Array to ASCII Equivalent in Java

I searched around here for a while and nothing seems to help me with my assignment. I'm trying to manually convert an int array that holds binary code: do the decimal conversion and then cast it as a char to get the ASCII equivalent. I have something started, but when I print it out, I get -591207182 as the message, which obviously isn't correct. My program is down below. I'm fairly novice at writing and understanding Java, so the most efficient and easy-to-understand route would be much appreciated.
class DecodeMessage
{
    public void getBinary(Picture secretImage)
    {
        Pixel pixelObject = null;
        Color pixelColor = null;
        int[] binaryInt = new int[secretImage.getWidth()];
        int x = 0;
        int redValue = 0;
        while (redValue < 2)
        {
            Pixel pixelTarget = new Pixel(secretImage, x, 0);
            pixelColor = pixelTarget.getColor();
            redValue = pixelColor.getRed();
            binaryInt[x] = redValue;
            x++;
        }
    }

    public void decodeBinary(int[] binary)
    {
        int binaryLen = binary.length;
        long totVal = 0;
        int newVal = 0;
        int bitVal = 0;
        long preVal = 0;
        long base = 2;
        for (int x = binaryLen - 1; x >= 0; x--)
        {
            bitVal = binary[x];
            preVal = bitVal * base;
            totVal += preVal;
            base = base * 2;
        }
        System.out.println(totVal);
    }
}

public class DecodeMessageTester
{
    public static void main(String[] args)
    {
        Picture pictureObj = new Picture("SecretMessage.bmp");
        pictureObj.explore();
        DecodeMessage decode = new DecodeMessage();
        decode.getBinary(pictureObj);
        int[] bitArray = {0,1,1,0,0,0,1,0,0,1,1,0,1,0,0,1,0,1,1,0,1,1,1,0,0,1,1,0,0,0,0,1,0,1,1,1,0,0,1,0,0,1,1,1,1,0,0,1};
        decode.decodeBinary(bitArray);
    }
}
Your problem is that you're trying to squeeze all 48 bits into one int. However, as http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html states, an int in Java can only hold 32 bits, so your numbers overflow. Try changing base, preVal and totVal to long, which can hold 64 bits. (Note also that base should start at 1, not 2; otherwise every bit is weighted one position too high.)
Of course, if you need more than 64 bits (or 63, actually, because the last bit is the sign bit), you will not be able to use a primitive number type to hold the value.
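Since the goal is the ASCII equivalent, here is a hedged sketch (not the poster's class) that sidesteps the overflow entirely by decoding each 8-bit group into one char; for the bit array in the question it prints "binary":

public class BitsToAscii {
    public static void main(String[] args) {
        int[] bits = {0,1,1,0,0,0,1,0, 0,1,1,0,1,0,0,1, 0,1,1,0,1,1,1,0,
                      0,1,1,0,0,0,0,1, 0,1,1,1,0,0,1,0, 0,1,1,1,1,0,0,1};
        StringBuilder message = new StringBuilder();
        // Decode one byte (8 bits, most significant bit first) at a time.
        for (int start = 0; start + 8 <= bits.length; start += 8) {
            int value = 0;
            for (int i = 0; i < 8; i++) {
                value = (value << 1) | bits[start + i];
            }
            message.append((char) value); // each byte is one ASCII character
        }
        System.out.println(message); // prints "binary"
    }
}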

How to correctly export Weight and Bias value of Backpropagation neural network into another programming language (Java)

I created a backpropagation neural network using Matlab. I implemented an XOR gate in Matlab, then took its weights and biases to build the same network in Java. The network consists of 2 input neurons, two hidden layers of 2 neurons each, and 1 output neuron. After training the network, I got the following weights and biases:
clear;
clc;
i = [0 0 1 1; 0 1 0 1];
o = [0 1 1 0];
net = newff(i,o,{2,2},{'tansig','logsig','purelin'});
net.IW{1,1} = [
-5.5187 -5.4490;
3.7332 2.7697
];
net.LW{2,1} = [
-2.8093 -3.0692;
-1.6685 6.7527
];
net.LW{3,2} = [
-4.9318 -0.9651
];
net.b{1,1} = [
2.1369;
2.6529
];
net.b{2,1} = [
-0.2274;
-4.9512
];
net.b{3,1} = [
1.4848
];
input = net.IW{1,1};
layer = net.LW{2,1};
output = net.LW{3,2};
biasinput = net.b{1,1};
biaslayer = net.b{2,1};
biasoutput= net.b{3,1};
a = sim(net,i);
a;
I simulated it using 1 and 1 as input and got the following result:
>> f = [1;1]
f =
1
1
>> sim(net,f)
ans =
-0.1639
Then I tried to write simple Java code to compute this neural network's output. My code:
public class Xor {
    // Value of neurons
    static double[] neuroninput = new double[2];
    static double[] neuronhidden1 = new double[2];
    static double[] neuronhidden2 = new double[2];
    static double[] neuronoutput = new double[2];
    // Weight variable init
    // For first hidden layer
    static double[] weighthidden11 = new double[2];
    static double[] weighthidden12 = new double[2];
    // For second hidden layer
    static double[] weighthidden21 = new double[2];
    static double[] weighthidden22 = new double[2];
    // For output layer
    static double[] weightoutput = new double[2];
    // End of weight variable init
    // Bias values
    static double[] biashidden1 = new double[2];
    static double[] biashidden2 = new double[2];
    static double[] biasoutput = new double[1];

    public static void main(String[] args) {
        neuroninput[0] = 1;
        neuroninput[1] = 1;
        weighthidden11[0] = -5.5187;
        weighthidden11[1] = -5.4490;
        weighthidden12[0] = 3.7332;
        weighthidden12[1] = 2.7697;
        weighthidden21[0] = -2.8093;
        weighthidden21[1] = -3.0692;
        weighthidden22[0] = -1.6685;
        weighthidden22[1] = 6.7527;
        weightoutput[0] = -4.9318;
        weightoutput[1] = -0.9651;
        biashidden1[0] = 2.1369;
        biashidden1[1] = 2.6529;
        biashidden2[0] = -0.2274;
        biashidden2[1] = -4.9512;
        biasoutput[0] = 1.4848;
        // Computing each neuron (feed forward)
        neuronhidden1[0] = sigma(neuroninput, weighthidden11, biashidden1[0]);
        neuronhidden1[0] = tansig(neuronhidden1[0]);
        neuronhidden1[1] = sigma(neuroninput, weighthidden12, biashidden1[1]);
        neuronhidden1[1] = tansig(neuronhidden1[1]);
        neuronhidden2[0] = sigma(neuronhidden1, weighthidden21, biashidden2[0]);
        neuronhidden2[0] = logsig(neuronhidden2[0]);
        neuronhidden2[1] = sigma(neuronhidden1, weighthidden22, biashidden2[1]);
        neuronhidden2[1] = logsig(neuronhidden2[1]);
        neuronoutput[0] = sigma(neuronhidden2, weightoutput, biasoutput[0]);
        neuronoutput[0] = purelin(neuronoutput[0]);
        System.out.println(neuronoutput[0]);
    }

    // Hyperbolic tangent sigmoid transfer function (Matlab's tansig)
    static double tansig(double x) {
        return (Math.exp(x) - Math.exp(-x)) / (Math.exp(x) + Math.exp(-x));
    }

    // Log-sigmoid transfer function (Matlab's logsig)
    static double logsig(double x) {
        return 1 / (1 + Math.exp(-x));
    }

    // Linear transfer function (Matlab's purelin)
    static double purelin(double x) {
        return x;
    }

    // Weighted sum of inputs plus bias
    static double sigma(double[] val, double[] weight, double bias) {
        double value = 0;
        for (int i = 0; i < val.length; i++) {
            value += val[i] * weight[i];
        }
        value += bias;
        return value;
    }
}
But it gives the following result:
-1.3278721528152158
My question: is there an error in how I exported the weight and bias values from Matlab to Java? Or did I make a mistake in my Java program?
Thank you very much.
I think the problem is the normalization:
http://www.mathworks.com/matlabcentral/answers/14590
If you work with 0/1 inputs, you have to use the f(x) = 2*x - 1 normalization function, which transforms the values into the [-1; 1] interval, and then g(x) = (x + 1)/2 to transform the output back to [0; 1]. Pseudocode:
g( java_net( f(x), f(y) ) ) = matlab_net(x, y)
I tried this with another network and it worked for me.
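Applied to the question's Java program, the fix would look like this sketch (hypothetical helper names):

// Map {0,1} inputs into [-1, 1] before the feed-forward pass ...
static double normalizeIn(double x) { return 2.0 * x - 1.0; }
// ... and map the raw network output back into [0, 1] afterwards.
static double normalizeOut(double x) { return (x + 1.0) / 2.0; }

// Usage with the question's Xor class:
// neuroninput[0] = normalizeIn(1);
// neuroninput[1] = normalizeIn(1);
// ... feed forward as before ...
// double result = normalizeOut(neuronoutput[0]);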
Your problem is most likely related to your Java version of the Matlab sim() command. It is a complex Matlab command with many settings affecting the architecture of the network to be simulated. To make debugging easier, try to implement the sim() command yourself in Matlab. Possibly reduce the number of layers until you have a match in Matlab between the sim() builtin and your own sim version. When that is working, convert to Java.
EDIT:
The reason for re-implementing the sim() function in Matlab is that if you can't implement it there, you won't be able to properly implement it in Java either. Feed-forward networks are quite easy to implement using Matlab vector notation.
