Implementation of Temporal Difference Learning in Java

Below is my implementation of temporal difference learning. The agent that uses the TD algorithm has played more than 750,000 games against an agent that uses the minimax procedure, but the problem is that the TD agent does not learn. What is wrong with this implementation?
updateToNextState is called when the agent chooses its next move.
public void updateToNextState(int[] currentState, double[] nextStateOutput) {
    double[] outputOfNext = nextStateOutput;
    double[] outputOfCurrent = getOutput(currentState);
    double[] error = getDifferenceOfOutputs(outputOfNext, outputOfCurrent);
    lastHandledState = currentState;
    for (int j = 0; j < layers[HIDDEN].neurons.length; j++) {
        for (int k = 0; k < layers[OUTPUT].neurons.length; k++) {
            double toBeUpdatedValueForJToK = BETA * error[k]
                    * eligibilityTraces.getEjk(j, k);
            layers[HIDDEN].neurons[j].updateWeightToNeuron(
                    layers[OUTPUT].neurons[k].getNeuronId(),
                    toBeUpdatedValueForJToK);
            for (int i = 0; i < layers[INPUT].neurons.length; i++) {
                double toBeUpdatedValueForIToJ = ALPHA * error[k]
                        * eligibilityTraces.getEijk(i, j, k);
                layers[INPUT].neurons[i].updateWeightToNeuron(
                        layers[HIDDEN].neurons[j].getNeuronId(),
                        toBeUpdatedValueForIToJ);
            }
        }
    }
    updateEligibilityTraces(currentState);
}
private void updateEligibilityTraces(int[] currentState) {
    // to ensure that the values in neurons are originated from current
    // state
    feedForward(currentState);
    for (int j = 0; j < layers[HIDDEN].neurons.length; j++) {
        for (int k = 0; k < layers[OUTPUT].neurons.length; k++) {
            double toBeUpdatedValueForJK = gradient(layers[OUTPUT].neurons[k])
                    * layers[HIDDEN].neurons[j].output;
            eligibilityTraces.updateEjk(j, k, toBeUpdatedValueForJK);
            for (int i = 0; i < layers[INPUT].neurons.length; i++) {
                double toBeUpdatedValueForIJK = gradient(layers[OUTPUT].neurons[k])
                        * gradient(layers[HIDDEN].neurons[j])
                        * layers[INPUT].neurons[i].output
                        * layers[HIDDEN].neurons[j]
                                .getWeightToNeuron(layers[OUTPUT].neurons[k]
                                        .getNeuronId());
                eligibilityTraces.updateEijk(i, j, k,
                        toBeUpdatedValueForIJK);
            }
        }
    }
}
private double gradient(Neuron neuron) {
    return neuron.output * (1 - neuron.output);
}
public void updateToNextWhenOpponentEndsGame(double[] outputOfEndState) {
    updateToNextState(lastHandledState, outputOfEndState);
}
private double[] getDifferenceOfOutputs(double[] outputNext,
        double[] outputCurrent) {
    double[] differencesVector = new double[outputNext.length];
    for (int i = 0; i < outputNext.length; i++) {
        double difference = outputNext[i] - outputCurrent[i];
        differencesVector[i] = difference;
    }
    return differencesVector;
}
I used this link as a guideline. I have tried different values for ALPHA and BETA and different numbers of hidden neurons. The eligibility traces are initialized to 0.

The main problem is most likely that your neural network function approximator is not tuned, and from what you said I assume that "it does not learn" means that the algorithm does not converge.
This can happen when TD and neural networks are used together. It happened to me as well, and I spent a long time looking into it. The lesson I learned is as follows:
According to Richard Sutton: do not use neural networks as function approximators together with TD methods unless you know exactly how to tune your neural network. Otherwise, this will cause lots of problems.
To learn more, look for Sutton's talk on YouTube.
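
One way to check the TD part on its own is to swap the network for a linear value function, which leaves far fewer things to tune. Here is a minimal sketch of a single TD(lambda) step with accumulating eligibility traces. The class name LinearTd, its parameters, and the feature encoding are assumptions for illustration and are not taken from the question's code.

```java
import java.util.Arrays;

/** Hypothetical helper: TD(lambda) with a linear value function v(s) = w . features(s). */
public class LinearTd {
    private final double[] weights;
    private final double[] traces;
    private final double alpha;   // learning rate
    private final double gamma;   // discount factor
    private final double lambda;  // trace decay

    public LinearTd(int featureCount, double alpha, double gamma, double lambda) {
        this.weights = new double[featureCount];
        this.traces = new double[featureCount];
        this.alpha = alpha;
        this.gamma = gamma;
        this.lambda = lambda;
    }

    /** Current value estimate for a state given as a feature vector. */
    public double value(double[] features) {
        double v = 0.0;
        for (int i = 0; i < weights.length; i++) {
            v += weights[i] * features[i];
        }
        return v;
    }

    /** One TD(lambda) step for the transition features -> nextFeatures. */
    public void update(double[] features, double reward,
                       double[] nextFeatures, boolean terminal) {
        // A terminal state has value 0 by definition.
        double nextValue = terminal ? 0.0 : value(nextFeatures);
        double delta = reward + gamma * nextValue - value(features);
        for (int i = 0; i < weights.length; i++) {
            // For a linear function the gradient of v(s) is the feature vector itself.
            traces[i] = gamma * lambda * traces[i] + features[i];
            weights[i] += alpha * delta * traces[i];
        }
    }

    /** Call at the start of every game so old traces do not leak into a new episode. */
    public void resetTraces() {
        Arrays.fill(traces, 0.0);
    }
}
```

Note that in this sketch the trace is refreshed with the current state's gradient before the weights change, and the traces are cleared between games.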

Related

"Encoding" algorithm giving different result in separate method

So I set out to make a little "encoding" program that uses a simple algorithm, and so far it all works. I came up with the algorithm and then found its inverse to "decode" the given String.
On the command line you run "java Driver lock message password". The program takes the character values and runs them through Z_n = (X_n + Y_n) * 2, giving you an "encoded" string that can then be passed back as "java Driver unlock code password". It takes these values and runs them through X_n = Z_n / 2 - Y_n.
The round trip works inside lock: I put the same steps at the end of lock that the unlocking process uses, and they print the original message. Yet when I run the unlock command on its own, the output is incorrect.
Here's a little example of how I believe it works:
lock Oliver Chipper
unlock ĤƨƤǌƪƮä Chipper
x = message, y = password, z = code
x & y = z
z & y = x
I have a feeling that it has to do with the command line not taking in the symbols that are used as output, but a thorough explanation of what I've done would be great. Thank you!
public class Driver {
private static int[] x; //Message or code
private static int[] y; //Password
public static void main(String[] args) {
int SIZE = args[1].length() + args[2].length();
if (args[0].equals("lock")) {
lock(args, SIZE);
} else if (args[0].equals("unlock")) {
unlock(args, SIZE);
}
}
private static void lock(String[] args, int size) {
x = new int[size];
for (int i = 0; i < args[1].length(); i++) {
x[i] = args[1].charAt(i); //Message to ints
}
y = new int[size];
for (int i = 0; i < args[2].length(); i++) {
y[i] = args[2].charAt(i); //Password to ints
}
//code
int[] z = new int[size];
for (int i = 0; i < size; i++) {
z[i] = ((x[i] + y[i]) * 2); //Locking algorithm
System.out.print((char)z[i]);
}
System.out.println("\n");
for (int i = 0; i < size; i++) {
System.out.print((char)((z[i] / 2) - y[i])); //Unlocking algorithm
}
}
private static void unlock(String[] args, int size) {
x = new int[size];
for (int i = 0; i < args[1].length(); i++) {
x[i] = args[1].charAt(i); //Code to ints
}
y = new int[size];
for (int i = 0; i < args[2].length(); i++) {
y[i] = args[2].charAt(i); //Password to ints
}
for (int i = 0; i < size; i++) {
System.out.print((char)((x[i] / 2) - y[i])); //Unlocking algorithm
}
}}
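
If the command line is indeed mangling the non-ASCII characters, one way to sidestep it is to print the code as plain decimal numbers instead of characters. The following is only a sketch under that assumption: the class name SafeDriver and the comma-separated format are hypothetical, and the arithmetic is the same (x + y) * 2 and z / 2 - y as in the program above, with missing characters treated as 0.

```java
/** Hypothetical variant of Driver that keeps the encoded text pure ASCII. */
public class SafeDriver {

    /** Encodes the message as comma-separated integers, (x + y) * 2 per position. */
    public static String lock(String message, String password) {
        int size = message.length() + password.length();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < size; i++) {
            int x = i < message.length() ? message.charAt(i) : 0;   // pad with 0
            int y = i < password.length() ? password.charAt(i) : 0; // pad with 0
            if (i > 0) {
                out.append(',');
            }
            out.append((x + y) * 2);
        }
        return out.toString();
    }

    /** Decodes comma-separated integers with z / 2 - y and drops the zero padding. */
    public static String unlock(String code, String password) {
        String[] parts = code.split(",");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            int z = Integer.parseInt(parts[i]);
            int y = i < password.length() ? password.charAt(i) : 0;
            int x = z / 2 - y;
            if (x != 0) {
                out.append((char) x);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        if (args.length == 3 && args[0].equals("lock")) {
            System.out.println(lock(args[1], args[2]));
        } else if (args.length == 3 && args[0].equals("unlock")) {
            System.out.println(unlock(args[1], args[2]));
        }
    }
}
```

With this format, "lock Oliver Chipper" produces only digits and commas, so the shell and console encoding no longer matter when the code is passed back to "unlock".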

Neural Networks DataSet learning

For a while now, I have been writing my own neural network for recognizing digits. It works perfectly fine for one given input and one expected output: it keeps approaching the target values until the total error is around 0.00001 or so. But obviously I need my network to learn more than one pattern. I've written my own class DataSet, which stores inputs and desired outputs. My question now is: how do I get my program to learn every single pattern from my set? For now I am doing it like this: learning every pattern one by one and hoping that the total error gets better. But in my net with 784 (= 28*28) input neurons, 15 hidden neurons and 10 output neurons, and only 3 patterns, the total error stays around 0.4. It doesn't match the targets at all, so I want to ask what I can do.
My code below:
public void CalculateSignalErrors(Matrix1d in, Matrix1d exp) {
int i, j, k, OutputLayer;
double Sum;
this.calculate(in, false);
for (i = 0; i < this.OUTPUT_SIZE; i++) {
signalErrors[this.NETWORK_SIZE - 1].set(i,
(this.outputs[this.NETWORK_SIZE - 1].get(i) - exp.get(i))
* this.derivatives[this.NETWORK_SIZE - 1].get(i));
}
for(i = this.NETWORK_SIZE - 2; i > 0; i--){
for(j = 0; j < outputs[i].X; j ++){
Sum = 0;
for(k = 0; k < outputs[i+1].X; k++){
Sum = Sum + weights[i+1].get(k, j) *
signalErrors[i+1].get(k);
}
signalErrors[i].set(j,derivatives[i].get(j) * Sum);
}
}
}
public void backpropagateError(double eta) {
int i,j,k;
for(i = this.NETWORK_SIZE-1; i > 0; i--){
for(j = 0; j < outputs[i].X; j++){
for(k = 0; k < outputs[i-1].X; k++){
this.weights[i].set(j, k,this.weights[i].get(j, k) + (-eta * this.signalErrors[i].get(j) * this.outputs[i-1].get(k)));
}
this.biases[i].set(j, this.biases[i].get(j) - eta * this.signalErrors[i].get(j));
}
}
}
public void train(Matrix1d in, Matrix1d exp, double eta){
this.CalculateSignalErrors(in, exp);
this.backpropagateError(eta);
}
And my training for data sets:
public void train(TrainSet set, double epochs, double eta, boolean printIt){
for(int e = 0; e < epochs; e ++){
TrainSetIterator it = set.iterator();
while(it.hasNext()){
Matrix1d[] v = it.next();
this.train(v[0], v[1], eta);
}
if(printIt){
//System.out.format("%-9s %-7s %-15s%n", "Epoch:", e , outputError(set));
System.out.println(outputError(set));
}
}
}
My error calculations:
public double outputError(Matrix1d input, Matrix1d expected) {
Matrix1d out = this.calculate(input, false);
expected = expected.clone();
out.sub(expected);
return (out.length() * out.length() * 0.5);
}
public double outputError(TrainSet set){
TrainSetIterator it = set.iterator();
double e = 0;
while(it.hasNext()){
Matrix1d[] o = it.next();
e += outputError(o[0], o[1]);
}
return (e / (double)(set.size()));
}
It's also important to know that while I feed my data forward, I write the derivatives directly into the neurons (in case you wonder what derivatives[x].get(y) means: x = layer, y = neuron).

Java multi-thread matrix multiplication

I'm trying to get a multi-threaded matrix multiplication to work in Java. It is given an (m x n) matrix, an (n x k) matrix and 't' threads to perform the operation on.
My program works when the matrices are square and t == n. When running with t < n, the other threads do not pick up the additional operations, and it returns a partially completed matrix. When the matrices are not square, the additional threads throw array index out of bounds errors and do not run. I would really appreciate any advice. Here are the relevant code snippets.
Starting the threads. multipliers is an array of MatrixMultiplier, a class defined later.
Multiply multiply = new Multiply(cols_mat, rows_mat2);
for (int i = 0; i < threads; i++) {
multipliers[i] = new MatrixMultiplier(multiply);
}
for (int i = 0; i < threads; i++) {
my_threads[i] = new Thread(multipliers[i]);
}
for (int i = 0; i < threads; i++) {
my_threads[i].start();
}
for (int i = 0; i < threads; i++) {
my_threads[i].join();
}
Multiply class which defines the matrix multiplication
class Multiply extends MatrixMultiplication {
private int i;
private int j;
private int chance;
public Multiply(int i, int j) {
this.i = i;
this.j = j;
chance = 0;
}
public synchronized void multiplyMatrix() {
int sum = 0;
int a = 0;
for (a = 0; a < i; a++) {
sum = 0;
for (int b = 0; b < j; b++) {
sum = sum + mat[chance][b] * mat2[b][a];
}
result[chance][a] = sum;
}
if (chance >= i)
return;
chance++;
}
}
And the matrix multiplier
class MatrixMultiplier implements Runnable {
    private final Multiply mul;
    public MatrixMultiplier(Multiply mul) {
        this.mul = mul;
    }
    @Override
    public void run() {
        mul.multiplyMatrix();
    }
}
Where I personally think the issue lies is with if (chance >= i) return; but I have not found a way to incorporate a thread's column responsibilities with the program still working. Again, any advice pointing me in the right direction would be greatly appreciated.

There are several issues with your code.
The t threads assume that only t multiplications are required to produce your result matrix. This is not the case when m != k, t != m, or t != k. The threads should be worker threads that simply process the requests they are given. I would consider giving each MatrixMultiplier access to the m×n, n×k, and m×k matrices and to a container of row/column entries.
class MatrixMultiplier {
    private double a[][], b[][], results[][];
    private Queue<..> entries;
    ....
}
The run method will then use the entries container to calculate the sum for a given <row,column> entry of the resulting m×k matrix. The run method could become:
public void run() {
    // Keep taking entries until the shared container is empty.
    for (RowColumnEntry entry = entries.poll(); entry != null; entry = entries.poll()) {
        int row = entry.row;
        int col = entry.col;
        double sum = 0.0;
        for (int i = 0; i < a[row].length; i++) {
            sum += a[row][i] * b[i][col];
        }
        results[row][col] = sum;
    }
}
There are three things to note here that differ from what you have:
you are not using a synchronized block
each entry calculates the answer for a unique row/column of the result matrix
the Multiply class is no longer required
You can then create t threads that process the entries in the entries container and exit when the container is empty.
Note that the entries container should be one of the concurrent Queue containers available in the java.util.concurrent package.
The remaining task is to create the row/column entries container. Here is some code that you could use:
Queue<..> entries = new Concurrent...<..>();
int rowSize = a.length;
int colSize = b[0].length;
for (int row = 0; row < rowSize; row++) {
    for (int col = 0; col < colSize; col++) {
        entries.add(new RowColumnEntry(row, col));
    }
}
Note that a and b are the m×n and n×k matrices.
Hope this helps.
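
Putting the pieces above together, a complete version might look like the following sketch. ConcurrentLinkedQueue, the class name QueueMatrixMultiply, and the int[] {row, col} entry encoding are assumptions made for illustration; the snippets above leave the queue type and the entry class open.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical end-to-end version of the work-queue approach. */
public class QueueMatrixMultiply {

    /** Multiplies a (m x n) by b (n x k) using the given number of worker threads. */
    public static double[][] multiply(double[][] a, double[][] b, int threads) {
        int rows = a.length;
        int cols = b[0].length;
        double[][] result = new double[rows][cols];

        // One queue entry per cell of the result matrix.
        Queue<int[]> entries = new ConcurrentLinkedQueue<>();
        for (int row = 0; row < rows; row++) {
            for (int col = 0; col < cols; col++) {
                entries.add(new int[] {row, col});
            }
        }

        // Every worker keeps polling until the queue is empty, so the number
        // of threads does not have to match any matrix dimension.
        Runnable worker = () -> {
            for (int[] e = entries.poll(); e != null; e = entries.poll()) {
                int row = e[0];
                int col = e[1];
                double sum = 0.0;
                for (int i = 0; i < b.length; i++) {
                    sum += a[row][i] * b[i][col];
                }
                result[row][col] = sum; // each cell is written by exactly one worker
            }
        };

        Thread[] pool = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            pool[t] = new Thread(worker);
            pool[t].start();
        }
        for (Thread thread : pool) {
            try {
                thread.join(); // join also makes the workers' writes visible here
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", ex);
            }
        }
        return result;
    }
}
```

For example, multiplying a 2×3 matrix by a 3×2 matrix with 4 threads works even though the thread count matches neither dimension.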

Different shortest path for each test

I'm writing a program that computes the average delay in a given network. I use the JUNG library. As input, my program reads how many packets per second vertex x wants to send to vertex y. My graph is unweighted, and I assume that packets are sent along the shortest path.
I use JUNG methods to get the shortest path:
public class NetworkGraph {
protected final Graph graph;
protected Vertex[] vertices;
protected Random random;
protected double sumOfFlowStrengthMatrix;
protected final int[][] flowStrengthMatrix;
NetworkGraph(Input input, Graph graph) {
random = new Random();
this.graph = graph;
loadVertices(input);
loadEdges(input);
loadSumOfFlowStrengthMatrix(input);
flowStrengthMatrix = input.getFlowStrengthMatrix();
}
private void loadVertices(Input input) {
vertices = new Vertex[input.getNumberOfVertices()];
for (int i = 0; i < input.getNumberOfVertices(); i++) {
vertices[i] = new Vertex(i + 1);
graph.addVertex(vertices[i]);
}
}
private void loadEdges(Input input) {
for (int i = 0; i < input.getNumberOfVertices(); i++) {
for (int j = 0; j < input.getNumberOfVertices(); j++) {
if (input.getProbabilityOfDivulsionArray()[i][j] != 0) {
if (graph.findEdge(vertices[i], vertices[j]) == null) {
graph.addEdge(new Edge(input.getCapacityArray()[i][j], input.getProbabilityOfDivulsionArray()[i][j]), vertices[i], vertices[j]);
}
}
}
}
}
private void loadSumOfFlowStrengthMatrix(Input input) {
double sum = 0;
for (int i = 0; i < input.getNumberOfVertices(); i++) {
for (int j = 0; j < input.getNumberOfVertices(); j++) {
sum += input.getFlowStrengthMatrix()[i][j];
}
}
this.sumOfFlowStrengthMatrix = sum;
}
public double countAveragePacketDelayInNetwork() throws EdgeException {
double out = 0;
ArrayList<Edge> edges = new ArrayList<>(graph.getEdges());
recountFlows();
for (Edge e : edges) {
out += e.getAveragePacketDelay();
}
return round((out / sumOfFlowStrengthMatrix), 4);
}
protected void recountFlows() {
for (int i = 0; i < vertices.length; i++) {
for (int j = 0; j < vertices.length; j++) {
DijkstraShortestPath<Vertex, Edge> algorithm = new DijkstraShortestPath<>(graph);
List<Edge> edges = algorithm.getPath(vertices[i], vertices[j]);
for (Edge edge : edges) {
edge.addToFlowStrength(flowStrengthMatrix[i][j]);
}
}
}
}
}
I ran my program several times with the same sample graph. Unfortunately, I got different results: each run gives a different average delay, which is really annoying.
It is probably caused by the Dijkstra algorithm: I noticed that it returns different results for the same input. I know that there can be many shortest paths from x to y, but why does the Dijkstra algorithm return different paths when the input and the way the graph is created are exactly the same every time?
Is there any way to make this algorithm always return the same shortest path for a given x and y?

As Tony_craft said, we lack enough information about how the graph is built, so a definitive answer is not possible.
However, there are two basic reasons why you might be getting different paths each time:
(1) The graph is not the same each time.
(2) The edges are being iterated over in a different order and you're getting a different edge of the same weight. Order of iteration over a Set (of outgoing edges) is not guaranteed to be consistent.
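
For reason (2), one thing worth trying is to give the vertex and edge classes a stable equals() and hashCode(). This is a sketch under two assumptions that the question does not confirm: that the graph implementation keeps vertices and edges in hash-based collections, and that Vertex and Edge currently inherit the default Object.hashCode(), which can differ from run to run. The class name StableVertex is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical vertex whose hash code depends only on its id. */
public class StableVertex {
    private final int id;

    public StableVertex(int id) {
        this.id = id;
    }

    public int getId() {
        return id;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof StableVertex && ((StableVertex) other).id == id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id); // same value in every run, unlike the identity hash
    }

    /** Returns the ids in the order a freshly built HashSet iterates them. */
    public static List<Integer> iterationOrder(int count) {
        Set<StableVertex> set = new HashSet<>();
        for (int i = 1; i <= count; i++) {
            set.add(new StableVertex(i));
        }
        List<Integer> order = new ArrayList<>();
        for (StableVertex v : set) {
            order.add(v.getId());
        }
        return order;
    }
}
```

An edge class would need its own unique id (for example a running counter) rather than a hash of capacity and probability, since two distinct edges may share those values. HashSet still does not promise any particular order, so this only removes one source of variation; a LinkedHashSet or a sorted collection is the safer choice wherever the code controls the collection type.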

Hidden Markov Model, Clarification on a Previous Implementation

I'm experimenting with hidden Markov models. I have no prior experience working with them, so I decided to check out a few example implementations.
Looking at the implementation below, I was a bit confused about why the Baum-Welch algorithm (found in the train method) takes a parameter called steps. I understand providing a training set, but not providing steps. Does anyone have an explanation for this? I don't understand it from the documentation.
Here's a link to the original code http://cs.nyu.edu/courses/spring04/G22.2591-001/BW%20demo/HMM.java since the code isn't being presented very nicely in my post.
import java.text.*;
/** This class implements a Hidden Markov Model, as well as
the Baum-Welch Algorithm for training HMMs.
@author Holger Wunsch (wunsch@sfs.nphil.uni-tuebingen.de)
*/
public class HMM {
/** number of states */
public int numStates;
/** size of output vocabulary */
public int sigmaSize;
/** initial state probabilities */
public double pi[];
/** transition probabilities */
public double a[][];
/** emission probabilities */
public double b[][];
/** initializes an HMM.
@param numStates number of states
@param sigmaSize size of output vocabulary
*/
public HMM(int numStates, int sigmaSize) {
this.numStates = numStates;
this.sigmaSize = sigmaSize;
pi = new double[numStates];
a = new double[numStates][numStates];
b = new double[numStates][sigmaSize];
}
/** implementation of the Baum-Welch Algorithm for HMMs.
@param o the training set
@param steps the number of steps
*/
public void train(int[] o, int steps) {
int T = o.length;
double[][] fwd;
double[][] bwd;
double pi1[] = new double[numStates];
double a1[][] = new double[numStates][numStates];
double b1[][] = new double[numStates][sigmaSize];
for (int s = 0; s < steps; s++) {
/* calculation of Forward- und Backward Variables from the
current model */
fwd = forwardProc(o);
bwd = backwardProc(o);
/* re-estimation of initial state probabilities */
for (int i = 0; i < numStates; i++)
pi1[i] = gamma(i, 0, o, fwd, bwd);
/* re-estimation of transition probabilities */
for (int i = 0; i < numStates; i++) {
for (int j = 0; j < numStates; j++) {
double num = 0;
double denom = 0;
for (int t = 0; t <= T - 1; t++) {
num += p(t, i, j, o, fwd, bwd);
denom += gamma(i, t, o, fwd, bwd);
}
a1[i][j] = divide(num, denom);
}
}
/* re-estimation of emission probabilities */
for (int i = 0; i < numStates; i++) {
for (int k = 0; k < sigmaSize; k++) {
double num = 0;
double denom = 0;
for (int t = 0; t <= T - 1; t++) {
double g = gamma(i, t, o, fwd, bwd);
num += g * (k == o[t] ? 1 : 0);
denom += g;
}
b1[i][k] = divide(num, denom);
}
}
pi = pi1;
a = a1;
b = b1;
}
}
/** calculation of Forward-Variables f(i,t) for state i at time
t for output sequence O with the current HMM parameters
@param o the output sequence O
@return an array f(i,t) over states and times, containing
the Forward-variables.
*/
public double[][] forwardProc(int[] o) {
int T = o.length;
double[][] fwd = new double[numStates][T];
/* initialization (time 0) */
for (int i = 0; i < numStates; i++)
fwd[i][0] = pi[i] * b[i][o[0]];
/* induction */
for (int t = 0; t <= T-2; t++) {
for (int j = 0; j < numStates; j++) {
fwd[j][t+1] = 0;
for (int i = 0; i < numStates; i++)
fwd[j][t+1] += (fwd[i][t] * a[i][j]);
fwd[j][t+1] *= b[j][o[t+1]];
}
}
return fwd;
}
The other two questions I have are in regards to the Forward method, which implements the Forward part of the Forward-Backward algorithm. From reading up on HMMs, I gather that, after training my model, I should use something like this method to predict future observations. So is the param O (representing output sequences) just the sequence of observations up until this point?
With some experimentation with this method, I'm returned what the documentation says are Forward-Variables, which just look like a bunch of probabilities. How are these translated into future observations?
I'm delving into regions of programming that are quite difficult for me, so I really appreciate your help in helping me understand this stuff!
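
On the last question, one standard way to turn forward variables into a prediction is to take the last column of fwd, push it one step through the transition matrix, and then through the emission matrix: P(next = k | o) = sum_j (sum_i fwd[i][T-1] * a[i][j]) * b[j][k] / sum_i fwd[i][T-1]. The sketch below assumes arrays shaped like the a and b fields above and the fwd[state][time] layout returned by forwardProc; the class and method names are hypothetical.

```java
/** Hypothetical helper: distribution of the next observation from forward variables. */
public class NextObservation {

    /**
     * @param fwd forward variables, indexed fwd[state][time]
     * @param a   transition probabilities a[i][j]
     * @param b   emission probabilities b[state][symbol]
     * @return    probability of each symbol being observed next
     */
    public static double[] predict(double[][] fwd, double[][] a, double[][] b) {
        int numStates = a.length;
        int sigmaSize = b[0].length;
        int last = fwd[0].length - 1;

        // Probability of the whole observed sequence; used to normalize.
        // If this is 0 the model cannot produce the sequence and the result is NaN.
        double total = 0.0;
        for (int i = 0; i < numStates; i++) {
            total += fwd[i][last];
        }

        double[] dist = new double[sigmaSize];
        for (int j = 0; j < numStates; j++) {
            // Unnormalized probability of being in state j at the next time step.
            double stateProb = 0.0;
            for (int i = 0; i < numStates; i++) {
                stateProb += fwd[i][last] * a[i][j];
            }
            for (int k = 0; k < sigmaSize; k++) {
                dist[k] += stateProb * b[j][k] / total;
            }
        }
        return dist;
    }
}
```

The symbol with the largest entry in the returned array is the most likely next observation under the trained model.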
