Improving minimax algorithm for Gomoku AI with transposition table? - java

I'm building an AI for Gomoku (16x16) with minimax and alpha-beta pruning, but it's very slow. So far, I have tried pre-sorting the order of moves, and instead of deep copying the board, adding and later retracting the moves. I also use an ArrayList of relevant moves (those within a radius of 2 of already placed pieces) to reduce the search space. Yet the AI still struggles even at a search depth of 3.
Edit: I have found out about something called a transposition table, but I don't know where to start. Any help would be great!
private double minimax(Board node, String player, int depth, double lowerBound, double upperBound) {
    if (depth == 3) {
        return node.evaluate();
    }
    if (player.equals(humanPiece)) { // min node
        // sort setup
        ArrayList<int[]> relevantMoves = node.relevantMoves();
        HashMap<int[], Double> moveValueTable = new HashMap<>();
        for (int[] move : relevantMoves) {
            node.addMove(move[0], move[1], player);
            double val = node.evaluate();
            moveValueTable.put(move, val);
            node.retractMove(move[0], move[1]);
        }
        // insertion sort from small to big (alpha-beta optimization)
        insertionSort(relevantMoves, moveValueTable);
        double result = Double.POSITIVE_INFINITY;
        // minimax
        for (int[] move : relevantMoves) { // y first, x second
            node.addMove(move[0], move[1], player);
            double score = minimax(node, node.getEnemy(player), depth + 1, lowerBound, upperBound);
            node.retractMove(move[0], move[1]);
            if (score < upperBound) {
                upperBound = score;
            }
            if (score < result) result = score;
            if (lowerBound > upperBound) {
                break;
            }
        }
        return result;
    } else { // max node
        // sort setup
        ArrayList<int[]> relevantMoves = node.relevantMoves();
        HashMap<int[], Double> moveValueTable = new HashMap<>();
        for (int[] move : relevantMoves) {
            node.addMove(move[0], move[1], player);
            double val = node.evaluate();
            moveValueTable.put(move, val);
            node.retractMove(move[0], move[1]);
        }
        // insertion sort from big to small (alpha-beta optimization)
        reversedInsertionSort(relevantMoves, moveValueTable);
        double result = Double.NEGATIVE_INFINITY;
        // minimax
        for (int[] move : relevantMoves) { // y first, x second
            node.addMove(move[0], move[1], player);
            double score = minimax(node, node.getEnemy(player), depth + 1, lowerBound, upperBound);
            node.retractMove(move[0], move[1]);
            if (score > lowerBound) {
                lowerBound = score;
            }
            if (score > result) result = score;
            if (lowerBound > upperBound) {
                break;
            }
        }
        return result;
    }
}

Here is a very good explanation of how transposition tables work: TT
It can improve your search speed by eliminating transpositions from the search tree. Transpositions are positions that can be reached by two or more different sequences of moves. Some games, like chess or checkers, have plenty of transpositions; others have very few or none at all.
Once you have the transposition table in place, it is easy to add more speed optimizations that rely on it.
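A sketch of what that can look like for your board, using Zobrist hashing to index positions: assign a random 64-bit number to every (cell, player) pair and XOR together the numbers of the stones on the board; the hash can then be updated incrementally in addMove/retractMove. The Entry layout, flag handling, and probe/store policy below are one common design, not the only one; the 16x16 size matches your post:
import java.util.HashMap;
import java.util.Random;

class TranspositionTable {
    // One random bitstring per (cell, player); the XOR of the stones on the board is the position hash.
    static final long[][] ZOBRIST = new long[16 * 16][2];
    static {
        Random rnd = new Random();
        for (long[] cell : ZOBRIST)
            for (int p = 0; p < 2; p++)
                cell[p] = rnd.nextLong();
    }

    enum Flag { EXACT, LOWER_BOUND, UPPER_BOUND }

    static class Entry {
        final double value;
        final int depth; // remaining search depth the value was computed with
        final Flag flag;
        Entry(double value, int depth, Flag flag) {
            this.value = value; this.depth = depth; this.flag = flag;
        }
    }

    private final HashMap<Long, Entry> table = new HashMap<>();

    // Incremental update: call with the same arguments in addMove and in retractMove,
    // since XOR is its own inverse.
    static long toggle(long hash, int row, int col, int player) {
        return hash ^ ZOBRIST[row * 16 + col][player];
    }

    void store(long hash, double value, int depth, Flag flag) {
        table.put(hash, new Entry(value, depth, flag));
    }

    // Only trust entries searched at least as deep as we are about to search.
    Entry probe(long hash, int depth) {
        Entry e = table.get(hash);
        return (e != null && e.depth >= depth) ? e : null;
    }
}
At the top of minimax, probe the table: an EXACT entry can be returned immediately, a LOWER_BOUND entry can raise lowerBound, and an UPPER_BOUND entry can lower upperBound. Before each return, store the result: EXACT if it stayed inside the original window, LOWER_BOUND if the search failed high, UPPER_BOUND if it failed low.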

Alpha beta pruning not producing good results

---------------
Actual Question
---------------
Ok, the real problem is not alpha-beta pruning vs. plain minimax. The problem is that minimax, run over the whole tree, identifies only the genuinely best moves, whereas alpha-beta returns the correct value at the root but leaves several children of the root tied with that best value, even though some of those children should not have it.
I guess the ultimate question is, what is the most efficient way to get the best (could be multiple in the case of a tie) child of the root node.
The algorithm produces the correct value, but multiple nodes tie with that value, even though some of the moves are obviously wrong.
Example:
Tic-tac-toe
-|-|O
-|X|-
-|X|-
will produce the values as:
(0,1) and (1,0), both with value -0.06 under my heuristic.
(0,1) is the correct move, as it blocks my X's, but (1,0) is wrong: if the algorithm plays it, I can put an X at (0,1) on my next move and win.
When I run the same algorithm without the
if (beta <= alpha)
    break;
it returns only (0,1), with value -0.06.
---------------
Originally posted question, now just sugar
---------------
I've spent days trying to figure out why my minimax algorithm works, but when I add alpha-beta pruning to it, it doesn't work. I understand they should give the same results, and I even made a quick test of that.
My question is: why doesn't my implementation produce the same results?
This is a tic-tac-toe implementation on Android. I can sometimes beat the algorithm when
if (beta <= alpha) break;
is not commented out, but when it is commented out the algorithm is undefeatable.
private static double minimax(Node<Integer, Integer> parent, int player, final int[][] board, double alpha, double beta, int depth) {
    List<Pair<Integer, Integer>> moves = getAvailableMoves(board);
    int bs = getBoardScore(board);
    if (moves.isEmpty() || Math.abs(bs) == board.length) // leaf node
        return bs + (player == X ? -1 : 1) * depth / 10.;
    double bestVal = player == X ? -Integer.MAX_VALUE : Integer.MAX_VALUE;
    for (Pair<Integer, Integer> s : moves) {
        int[][] b = clone(board);
        b[s.getFirst()][s.getSecond()] = player;
        Node<Integer, Integer> n = new Node<>(bs, b.hashCode());
        parent.getChildren().add(n);
        n.setParent(parent);
        double score = minimax(n, player == O ? X : O, b, alpha, beta, depth + 1);
        n.getValues().put("score", score);
        n.getValues().put("pair", s);
        if (player == X) {
            bestVal = Math.max(bestVal, score);
            alpha = Math.max(alpha, bestVal);
        } else {
            bestVal = Math.min(bestVal, score);
            beta = Math.min(beta, bestVal);
        }
        /*
        If I comment these two lines out it works as expected
        if (beta <= alpha)
            break;
        */
    }
    return bestVal;
}
Now this wouldn't be a problem for tic-tac-toe due to the small search tree, but I then developed it for checkers and noticed the same phenomenon.
private double alphaBeta(BitCheckers checkers, int depth, int absDepth, double alpha, double beta) {
    if (checkers.movesWithoutAnything >= 40)
        return 0; // tie game // needs testing
    if (depth == 0 || checkers.getVictoryState() != INVALID)
        return checkers.getVictoryState() == INVALID ? checkers.getBoardScore() - checkers.getPlayer() * moves / 100. :
               checkers.getPlayer() == checkers.getVictoryState() ? Double.MAX_VALUE * checkers.getPlayer() :
               -Double.MAX_VALUE * checkers.getPlayer();
    List<Pair<Pair<Integer, Integer>, Pair<Integer, Integer>>> moves;
    if (absDepth == maxDepth)
        moves = (List<Pair<Pair<Integer, Integer>, Pair<Integer, Integer>>>) node.getValues().get("moves");
    else
        moves = checkers.getAllPlayerMoves();
    if (moves.isEmpty()) // no moves left? then this player loses
        return checkers.getPlayer() * -Double.MAX_VALUE;
    double v = checkers.getPlayer() == WHITE ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
    for (Pair<Pair<Integer, Integer>, Pair<Integer, Integer>> i : moves) {
        BitCheckers c = checkers.clone();
        c.movePiece(i.getFirst().getFirst(), i.getFirst().getSecond(), i.getSecond().getFirst(), i.getSecond().getSecond());
        int newDepth = c.getPlayer() == checkers.getPlayer() ? depth : depth - 1;
        if (checkers.getPlayer() == WHITE) {
            v = Math.max(v, alphaBeta(c, newDepth, absDepth - 1, alpha, beta));
            alpha = Math.max(alpha, v);
        } else {
            v = Math.min(v, alphaBeta(c, newDepth, absDepth - 1, alpha, beta));
            beta = Math.min(beta, v);
        }
        if (absDepth == maxDepth) {
            double finalScore = v;
            for (Node n : node.getChildren())
                if (n.getData().equals(i)) {
                    n.setValue(finalScore);
                    break;
                }
        }
        /*
        If I comment these two lines out it works as expected
        if (beta <= alpha)
            break;
        */
    }
    return v;
}
I tested it with PVS, and it gives the same results as alpha-beta pruning, i.e., not nearly as good as plain minimax.
public double pvs(BitCheckers checkers, int depth, int absDepth, double alpha, double beta) {
    if (checkers.movesWithoutAnything >= 40)
        return 0; // tie game // needs testing
    if (depth == 0 || checkers.getVictoryState() != INVALID)
        return checkers.getVictoryState() == INVALID ? checkers.getBoardScore() - checkers.getPlayer() * moves / 100. :
               checkers.getPlayer() == checkers.getVictoryState() ? Double.MAX_VALUE * checkers.getPlayer() :
               -Double.MAX_VALUE * checkers.getPlayer();
    List<Pair<Pair<Integer, Integer>, Pair<Integer, Integer>>> moves;
    if (absDepth == maxDepth)
        moves = (List<Pair<Pair<Integer, Integer>, Pair<Integer, Integer>>>) node.getValues().get("moves");
    else
        moves = checkers.getAllPlayerMoves();
    if (moves.isEmpty()) // no moves left? then this player loses
        return checkers.getPlayer() * -Double.MAX_VALUE;
    int j = 0;
    double score;
    for (Pair<Pair<Integer, Integer>, Pair<Integer, Integer>> i : moves) {
        BitCheckers c = checkers.clone();
        c.movePiece(i.getFirst().getFirst(), i.getFirst().getSecond(), i.getSecond().getFirst(), i.getSecond().getSecond());
        int newDepth = c.getPlayer() == checkers.getPlayer() ? depth : depth - 1;
        double sign = c.getPlayer() == checkers.getPlayer() ? -1 : 1;
        if (j++ == 0)
            score = -pvs(c, newDepth, absDepth - 1, sign * -beta, sign * -alpha);
        else {
            score = -pvs(c, newDepth, absDepth - 1, sign * -(alpha + 1), sign * -alpha);
            if (alpha < score || score < beta)
                score = -pvs(c, newDepth, absDepth - 1, sign * -beta, sign * -score);
        }
        if (absDepth == maxDepth) {
            double finalScore = score;
            for (Node n : node.getChildren())
                if (n.getData().equals(i)) {
                    n.setValue(finalScore);
                    break;
                }
        }
        alpha = Math.max(alpha, score);
        if (alpha >= beta)
            break;
    }
    return alpha;
}
Checkers without alpha-beta pruning is good, but not great. I know that with a working version of alpha-beta it could be really great. Please help me fix my alpha-beta pruning.
I understand it should give the same result; my question is why my implementation does not.
To confirm that they should give the same results, I made a quick test-class implementation.
public class MinimaxAlphaBetaTest {
    public static void main(String[] args) {
        Node<Double, Double> parent = new Node<>(0., 0.);
        int depth = 10;
        createTree(parent, depth);
        Timer t = new Timer().start();
        double ab = alphabeta(parent, depth + 1, Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY, true);
        t.stop();
        System.out.println("Alpha Beta: " + ab + ", time: " + t.getTime());
        t = new Timer().start();
        double mm = minimax(parent, depth + 1, true);
        t.stop();
        System.out.println("Minimax: " + mm + ", time: " + t.getTime());
        t = new Timer().start();
        double pv = pvs(parent, depth + 1, Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY, 1);
        t.stop();
        System.out.println("PVS: " + pv + ", time: " + t.getTime());
        if (ab != mm)
            System.out.println(ab + "!=" + mm);
    }

    public static void createTree(Node n, int depth) {
        if (depth == 0) {
            n.getChildren().add(new Node<>(0., (double) randBetween(1, 100)));
            return;
        }
        for (int i = 0; i < randBetween(2, 10); i++) {
            Node nn = new Node<>(0., 0.);
            n.getChildren().add(nn);
            createTree(nn, depth - 1);
        }
    }

    public static Random r = new Random();

    public static int randBetween(int min, int max) {
        return r.nextInt(max - min + 1) + min;
    }

    public static double pvs(Node<Double, Double> node, int depth, double alpha, double beta, int color) {
        if (depth == 0 || node.getChildren().isEmpty())
            return color * node.getValue();
        int i = 0;
        double score;
        for (Node<Double, Double> child : node.getChildren()) {
            if (i++ == 0)
                score = -pvs(child, depth - 1, -beta, -alpha, -color);
            else {
                score = -pvs(child, depth - 1, -alpha - 1, -alpha, -color);
                if (alpha < score || score < beta)
                    score = -pvs(child, depth - 1, -beta, -score, -color);
            }
            alpha = Math.max(alpha, score);
            if (alpha >= beta)
                break;
        }
        return alpha;
    }

    public static double alphabeta(Node<Double, Double> node, int depth, double alpha, double beta, boolean maximizingPlayer) {
        if (depth == 0 || node.getChildren().isEmpty())
            return node.getValue();
        double v = maximizingPlayer ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
        for (Node<Double, Double> child : node.getChildren()) {
            if (maximizingPlayer) {
                v = Math.max(v, alphabeta(child, depth - 1, alpha, beta, false));
                alpha = Math.max(alpha, v);
            } else {
                v = Math.min(v, alphabeta(child, depth - 1, alpha, beta, true));
                beta = Math.min(beta, v);
            }
            if (beta <= alpha)
                break;
        }
        return v;
    }

    public static double minimax(Node<Double, Double> node, int depth, boolean maximizingPlayer) {
        if (depth == 0 || node.getChildren().isEmpty())
            return node.getValue();
        double v = maximizingPlayer ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
        for (Node<Double, Double> child : node.getChildren()) {
            if (maximizingPlayer)
                v = Math.max(v, minimax(child, depth - 1, false));
            else
                v = Math.min(v, minimax(child, depth - 1, true));
        }
        return v;
    }
}
This does in fact give what I expected: alpha-beta and PVS are about the same speed (PVS is slower here because the children are in random order, so there is no good move ordering) and produce the same results as minimax. This proves that the algorithms are correct, but for whatever reason my implementations of them are wrong.
Alpha Beta: 28.0, time: 25.863126 milli seconds
Minimax: 28.0, time: 512.6119160000001 milli seconds
PVS: 28.0, time: 93.357653 milli seconds
Source code for the checkers implementation
Pseudocode for PVS
Pseudocode for the alpha-beta I'm following
Full source code for the tic-tac-toe implementation
I think you might be misunderstanding AB pruning.
AB pruning should give you the same result as minimax; it's just a way of not going down certain branches because you already know that making those moves would be worse than another move you have examined, which helps when you have massive trees. One subtlety: with pruning, only the best line's value is exact; pruned siblings at the root carry only bounds, so if you pick your move by comparing the children's stored scores afterwards, you can see spurious ties. Pick the best move inside the root search instead.
Also, minimax without a heuristic cutting off your search will always be undefeatable, because you've computed every possible path to every terminating state. So I would expect AB pruning and minimax to both be unbeatable, which means something is wrong with your AB pruning: if your minimax is undefeatable, so should be your AB version.
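To make the root tie-breaking concrete, here is a minimal sketch that selects the move during the root search rather than by comparing child scores afterwards; it reuses the alphabeta method from the test class above, and bestRootMove is an illustrative name, not part of the original code:
// Pick the best move at a maximizing root. Children whose true value is <= alpha
// fail low and return a value <= alpha, so they can never be selected by mistake.
public static Node<Double, Double> bestRootMove(Node<Double, Double> root, int depth) {
    Node<Double, Double> best = null;
    double alpha = Double.NEGATIVE_INFINITY;
    double beta = Double.POSITIVE_INFINITY;
    for (Node<Double, Double> child : root.getChildren()) {
        // children of a maximizing root are minimizing nodes
        double score = alphabeta(child, depth - 1, alpha, beta, false);
        if (score > alpha) { // strictly better than anything seen so far
            alpha = score;
            best = child;
        }
    }
    return best;
}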

Minimum number steps to reach goal in chess - knight traversal with BFS

The code given below works efficiently for boards of size less than 13, but after that it takes too much time and effectively runs forever.
I want to reduce the time needed to reach the end node.
This code finds the minimum path from (starti, startj) to (endi, endj), where starti and startj take values from 1 to n-1.
Here is the problem that I am trying to solve:
https://www.hackerrank.com/challenges/knightl-on-chessboard/problem
Program:
import java.util.LinkedList;
import java.util.Scanner;
class Node {
    int x, y, dist;

    Node(int x, int y, int dist) {
        this.x = x;
        this.y = y;
        this.dist = dist;
    }

    public String toString() {
        return "x: " + x + " y: " + y + " dist: " + dist;
    }
}

class Solution {
    public static boolean checkBound(int x, int y, int n) {
        return x > 0 && y > 0 && x <= n && y <= n;
    }

    public static void printAnswer(int answer[][], int n) {
        for (int i = 0; i < n - 1; i++) {
            for (int j = 0; j < n - 1; j++) {
                System.out.print(answer[i][j] + " ");
            }
            System.out.println();
        }
    }

    public static int findMinimumStep(int n, int[] start, int[] end, int a, int b) {
        LinkedList<Node> queue = new LinkedList<>();
        boolean visited[][] = new boolean[n + 1][n + 1];
        queue.add(new Node(start[0], start[1], 0));
        int x, y;
        int[] dx = new int[] {a, -a, a, -a, b, -b, b, -b};
        int[] dy = new int[] {b, b, -b, -b, a, a, -a, -a};
        while (!queue.isEmpty()) {
            Node z = queue.removeFirst();
            visited[z.x][z.y] = true;
            if (z.x == end[0] && z.y == end[1])
                return z.dist;
            for (int i = 0; i < 8; i++) {
                x = z.x + dx[i];
                y = z.y + dy[i];
                if (checkBound(x, y, n) && !visited[x][y])
                    queue.add(new Node(x, y, z.dist + 1));
            }
        }
        return -1;
    }

    public static void main(String args[]) {
        Scanner scan = new Scanner(System.in);
        int n = scan.nextInt();
        int start[] = new int[] {1, 1};
        int goal[] = new int[] {n, n};
        int answer[][] = new int[n - 1][n - 1];
        for (int i = 1; i < n; i++) {
            for (int j = i; j < n; j++) {
                int result = findMinimumStep(n, start, goal, i, j);
                answer[i - 1][j - 1] = result;
                answer[j - 1][i - 1] = result;
            }
        }
        printAnswer(answer, n);
    }
}
You set visited too late, so the same cells are added to the queue multiple times; you then pop them from the queue without checking their visited state, which makes things even worse. This leads to fast growth of the queue.
You need to set visited right after you add the Node to the queue:
if (checkBound(x, y, n) && !visited[x][y]) {
    queue.add(new Node(x, y, z.dist + 1));
    visited[x][y] = true;
}
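Folding the fix into the full method (and marking the start square as visited before the loop), the corrected search might look like this; the names follow the question's code:
public static int findMinimumStep(int n, int[] start, int[] end, int a, int b) {
    LinkedList<Node> queue = new LinkedList<>();
    boolean[][] visited = new boolean[n + 1][n + 1];
    queue.add(new Node(start[0], start[1], 0));
    visited[start[0]][start[1]] = true; // mark the start immediately
    int[] dx = {a, -a, a, -a, b, -b, b, -b};
    int[] dy = {b, b, -b, -b, a, a, -a, -a};
    while (!queue.isEmpty()) {
        Node z = queue.removeFirst();
        if (z.x == end[0] && z.y == end[1])
            return z.dist;
        for (int i = 0; i < 8; i++) {
            int x = z.x + dx[i];
            int y = z.y + dy[i];
            if (checkBound(x, y, n) && !visited[x][y]) {
                visited[x][y] = true; // mark when enqueuing, not when dequeuing
                queue.add(new Node(x, y, z.dist + 1));
            }
        }
    }
    return -1;
}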
Even if you optimize your code, you will not reduce the complexity of the algorithm.
I think you need to think about how to reduce the search space, or search it in a cleverer order.
I would go for an A* search.
The most effective solution to your problem is Dijkstra's algorithm. Treat the squares as nodes and draw edges to the other squares/nodes that the knight can visit, then run the algorithm on this graph. With a binary-heap priority queue it runs in O(E log V) time, so it scales well to big problems.
The A* search suggested by MrSmith uses a heuristic, so I would not suggest it for this kind of problem.
Dijkstra is an important algorithm, and implementing it will help you solve many similar problems in the future; for example, you can also solve this problem with the same logic.
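For completeness, a sketch of that Dijkstra formulation in Java; since every knight move has weight 1, it degenerates into the BFS above, and the class and method names here are illustrative:
import java.util.Arrays;
import java.util.PriorityQueue;

class KnightDijkstra {
    // Dijkstra over the n x n board; nodes are squares, edges are KnightL(a,b) moves of weight 1.
    static int minSteps(int n, int a, int b, int[] start, int[] end) {
        int[] dx = {a, -a, a, -a, b, -b, b, -b};
        int[] dy = {b, b, -b, -b, a, a, -a, -a};
        int[][] dist = new int[n + 1][n + 1];
        for (int[] row : dist) Arrays.fill(row, Integer.MAX_VALUE);
        dist[start[0]][start[1]] = 0;
        // queue entries: {distance, x, y}, ordered by distance
        PriorityQueue<int[]> pq = new PriorityQueue<>((p, q) -> p[0] - q[0]);
        pq.add(new int[] {0, start[0], start[1]});
        while (!pq.isEmpty()) {
            int[] cur = pq.poll();
            if (cur[0] > dist[cur[1]][cur[2]]) continue; // stale entry
            if (cur[1] == end[0] && cur[2] == end[1]) return cur[0];
            for (int i = 0; i < 8; i++) {
                int x = cur[1] + dx[i], y = cur[2] + dy[i];
                if (x > 0 && y > 0 && x <= n && y <= n && cur[0] + 1 < dist[x][y]) {
                    dist[x][y] = cur[0] + 1; // relax the edge (weight 1)
                    pq.add(new int[] {cur[0] + 1, x, y});
                }
            }
        }
        return -1;
    }
}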

Finding biggest area of adjacent numbers in a matrix using DFS algorithm

I am learning programming on my own with a book for beginners. My last task after the chapter on arrays is to:
// Find the biggest area of adjacent numbers in this matrix:
int[][] matrix = {
    {1, 3, 2, 2, 2, 4},
    {3, 3, 3, 2, 4, 4},
    {4, 3, 1, 2, 3, 3}, // ---> 13 times '3'
    {4, 3, 1, 3, 3, 1},
    {4, 3, 3, 3, 1, 1}
};
As a hint I was given: use the DFS or BFS algorithm. After reading about them and seeing many implementations, I got the idea, but it was just too overwhelming for a beginner. I found a solution for my task, and after running the program many times I understood how it works; now I can solve the problem on my own. Although I am happy that this solution taught me about recursion, I am wondering: can the following code be rewritten in an iterative way, and if so, can you give me hints on how to do it? Thank you in advance.
public class Practice {
    private static boolean[][] visited = new boolean[6][6];
    private static int[] dx = {-1, 1, 0, 0};
    private static int[] dy = {0, 0, -1, 1};
    private static int newX;
    private static int newY;

    public static void main(String[] args) {
        // Find the biggest area of adjacent numbers in this matrix:
        int[][] matrix = {
            {1, 3, 2, 2, 2, 4},
            {3, 3, 3, 2, 4, 4},
            {4, 3, 1, 2, 3, 3}, // ---> 13 times '3'
            {4, 3, 1, 3, 3, 1},
            {4, 3, 3, 3, 1, 1}
        };
        int current = 0;
        int max = 0;
        for (int rows = 0; rows < matrix.length; rows++) {
            for (int cols = 0; cols < matrix[rows].length; cols++) {
                if (visited[rows][cols] == false) {
                    System.out.printf("Visited[%b] [%d] [%d] %n", visited[rows][cols], rows, cols);
                    current = dfs(matrix, rows, cols, matrix[rows][cols]);
                    System.out.printf("Current is [%d] %n", current);
                    if (current > max) {
                        System.out.printf("Max is : %d %n ", current);
                        max = current;
                    }
                }
            }
        }
        System.out.println(max);
    }

    static int dfs(int[][] matrix, int x, int y, int value) {
        if (visited[x][y]) {
            System.out.printf("Visited[%d][%d] [%b] %n", x, y, visited[x][y]);
            return 0;
        } else {
            visited[x][y] = true;
            int best = 0;
            int bestX = x;
            int bestY = y;
            for (int i = 0; i < 4; i++) {
                // dx = {-1,1,0,0};
                // dy = {0,0,-1,1};
                int modx = dx[i] + x;
                System.out.printf(" modx is : %d %n", modx);
                int mody = dy[i] + y;
                System.out.printf(" mody is : %d %n", mody);
                if (modx == -1 || modx >= matrix.length || mody == -1 || mody >= matrix[0].length) {
                    continue;
                }
                if (matrix[modx][mody] == value) {
                    System.out.printf("Value is : %d %n", value);
                    int v = dfs(matrix, modx, mody, value);
                    System.out.printf(" v is : %d %n", v);
                    best += v;
                    System.out.printf("best is %d %n", best);
                }
                newX = bestX;
                System.out.printf("newX is : %d %n", newX);
                newY = bestY;
                System.out.printf("newY is : %d %n", newY);
            }
            System.out.printf("Best + 1 is : %d %n ", best + 1);
            return best + 1;
        }
    }
}
If you look at the Wikipedia page for depth-first search, under the pseudocode section, there is an example of an iterative version of the DFS algorithm. You should be able to figure out a solution from there.
Edit:
To make it iterative, you can do the following:
procedure DFS-iterative(matrix, x, y):
    let S be a stack
    let value = 0
    if !visited[x, y]
        S.push(position(x, y))
        visited[x, y] = true
    while S is not empty
        Position v = S.pop()
        value += 1
        for all valid unvisited positions newPosition around v with the same number
            S.push(newPosition)
            visited[newPosition] = true
    return value
Every time you would call the dfs() method in the recursive version, you call S.push() instead. You can create a Position class as follows:
class Position {
    int x;
    int y;

    public Position(int x, int y) {
        this.x = x;
        this.y = y;
    }
    // getters and setters omitted for brevity
}
and use the built-in Java class java.util.Stack to make it easy:
Stack<Position> s = new Stack<Position>();
If you want to use BFS instead of DFS, you can simply change the Stack to a Queue and you will get the desired result. This link has a very nice explanation of stacks and queues and may prove useful as you learn about the topic.
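Put together in Java, the iterative version might look like the sketch below, using the matrix and visited array from the question (dfsIterative is an illustrative name):
import java.util.Stack;

static int dfsIterative(int[][] matrix, int startX, int startY, boolean[][] visited) {
    int value = matrix[startX][startY];
    int[] dx = {-1, 1, 0, 0};
    int[] dy = {0, 0, -1, 1};
    Stack<Position> stack = new Stack<>();
    stack.push(new Position(startX, startY));
    visited[startX][startY] = true; // mark when pushing to avoid duplicates
    int count = 0;
    while (!stack.isEmpty()) {
        Position p = stack.pop();
        count++;
        for (int i = 0; i < 4; i++) {
            int nx = p.x + dx[i];
            int ny = p.y + dy[i];
            if (nx >= 0 && nx < matrix.length && ny >= 0 && ny < matrix[0].length
                    && !visited[nx][ny] && matrix[nx][ny] == value) {
                visited[nx][ny] = true;
                stack.push(new Position(nx, ny));
            }
        }
    }
    return count;
}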
I assume you are looking for a BFS solution, since you already have a working DFS, and BFS is iterative while DFS is recursive (or at least, is easier to implement recursively).
The (untested) BFS code to measure a region's size could be:
public static int regionSize(int[][] matrix,
        int row, int col, HashSet<Point> visited) {
    ArrayDeque<Point> toVisit = new ArrayDeque<>();
    toVisit.add(new Point(col, row));
    int regionColor = matrix[row][col]; // matrix is indexed [row][col]
    int regionSize = 0;
    while ( ! toVisit.isEmpty()) {
        Point p = toVisit.removeFirst(); // use removeLast() to emulate DFS
        if ( ! visited.contains(p)) {
            regionSize++;
            visited.add(p);
            // now, add its neighbors
            for (int[] d : new int[][] {{1, 0}, {0, 1}, {-1, 0}, {0, -1}}) {
                int nx = p.x + d[0];
                int ny = p.y + d[1];
                if (nx >= 0 && nx < matrix[0].length
                        && ny >= 0 && ny < matrix.length
                        && matrix[ny][nx] == regionColor) {
                    toVisit.addLast(new Point(nx, ny)); // add neighbor
                }
            }
        }
    }
    return regionSize;
}
Note that you can change a (queue-based) BFS into an iterative DFS by changing a single line. In a recursive DFS, you would be using the program stack to keep track of toVisit instead of an explicit stack/deque. You can test this by adding a System.out.println to track the algorithm's progress.
Above, I use a HashSet of Point instead of a boolean[][] array, but feel free to use whichever is easiest for you.
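To answer the original question with this method, loop over all the cells and keep the maximum; a sketch, assuming Point is java.awt.Point (which supplies the equals/hashCode that the HashSet needs):
import java.awt.Point;
import java.util.HashSet;

static int biggestRegion(int[][] matrix) {
    HashSet<Point> visited = new HashSet<>();
    int max = 0;
    for (int row = 0; row < matrix.length; row++) {
        for (int col = 0; col < matrix[0].length; col++) {
            // Point uses the (x=col, y=row) convention from regionSize above
            if (!visited.contains(new Point(col, row))) {
                max = Math.max(max, regionSize(matrix, row, col, visited));
            }
        }
    }
    return max;
}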

Approximating square roots in java using the squeeze theorem

Hey guys, this is for a homework assignment that is way over my head; my teacher is moving the class very quickly. This is my fourth program ever written in Java, and I am looking for some advice. I need to find the approximate square root of a number to within an error of EPSILON, defined in my program. This needs to be accomplished using the squeeze theorem, constantly tightening the bounds. In Java, how does one keep a variable's value up to date when it is used throughout the program? Keep in mind my professor has not gotten to return values yet, so I do not think he intends for us to use them. I am quite the novice, but I have an open mind.
public static void main(String[] args) {
    Scanner sc = new Scanner(System.in);
    final double EPSILON = .0000000001;
    System.out.print("Enter a number to find its square root -> ");
    double number = sc.nextDouble();
    double low = 0;
    double high = 0;
    double midPoint = (low + high) / 2;
    high = number;
    double midPointSqr = midPoint * midPoint;
    if (number < 0) {
        System.out.println("NaN");
    } else {
        while ((Math.abs(midPointSqr - number)) > EPSILON) {
            if (number <= 1) {
                low = 0;
                if (midPointSqr > number) {
                    midPoint = (high + low) / 2;
                    high = high / 2;
                    System.out.printf("%.6f, %.6f\n", low, high);
                } else {
                    midPoint = (high + low) / 2;
                    low = high / 2;
                    System.out.printf("%.6f, %.6f\n", low, high);
                }
            } else {
                low = 1;
                if (midPointSqr > number) {
                    midPoint = (high + low) / 2;
                    high = high / 2;
                    System.out.printf("%.6f, %.6f\n", low, high);
                } else {
                    midPoint = (high + low) / 2;
                    low = high / 2;
                    System.out.printf("%.6f, %.6f\n", low, high);
                }
            }
        }
    }
}
You need to update the midPointSqr variable whenever you update the midPoint variable. So wherever you have an assignment like
midPoint = <something>;
you should recompute the square right after it:
midPointSqr = midPoint * midPoint;
Another alternative is to use a method everywhere instead of the midPointSqr variable:
double getSqr(double value) {
    return value * value;
}
Then, wherever you use the midPointSqr variable, replace it with the method call:
getSqr(midPoint)
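For reference, here is a minimal sketch of the whole bisection loop with the bounds moved to the midpoint (rather than halving high/low as in the posted code). The variable names follow the question, but the structure is simplified and is one way to do it, not necessarily what the course expects:
import java.util.Scanner;

public class SqueezeSqrt {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        final double EPSILON = 1e-10;
        System.out.print("Enter a number to find its square root -> ");
        double number = sc.nextDouble();
        if (number < 0) {
            System.out.println("NaN");
            return;
        }
        // Squeeze the root between low and high; for number < 1 the root is
        // larger than the number itself, so use 1 as the upper bound there.
        double low = 0;
        double high = Math.max(number, 1);
        double midPoint = (low + high) / 2;
        double midPointSqr = midPoint * midPoint;
        while (Math.abs(midPointSqr - number) > EPSILON) {
            if (midPointSqr > number)
                high = midPoint; // root is in the lower half
            else
                low = midPoint;  // root is in the upper half
            midPoint = (low + high) / 2;
            midPointSqr = midPoint * midPoint; // keep the square in sync
            System.out.printf("%.6f, %.6f%n", low, high);
        }
        System.out.println(midPoint);
    }
}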

AI How to model genetic programming for Battleships

I have a question regarding Genetic Programming. I am going to work on a genetic algorithm for a game called Battleships.
My question is: How would I decide upon a "decision" model for the AI to evolve? And how does that work?
I have read multiple papers and multiple answers that only speak about using different models, but I could not find something specific, which, unfortunately, I apparently need to wrap my head around the problem.
I want it to evolve over multiple iterations and "learn" what works best, but I am not sure how to save these "decisions" in a good way (I know: to a file, but "encoded" how?), so that it learns to build on previous actions and on information from the current board state.
I have been contemplating a tree structure for the AI to base decisions on, but I don't actually know how to get started.
If someone could point me in the right direction (a link? some pseudocode? something like that), that'd be very much appreciated. I tried to google as much as possible and watched multiple YouTube videos about the subject, but I think I just need that little nudge in the right direction.
I may also just not know what exactly to search for, which is why I come up blank on what and how to implement this.
ANSWER PART I: The basis for a genetic algorithm is having a group of actors, some of which reproduce. The fittest are chosen for reproduction, and the offspring are copies of the parents that are slightly mutated. It's a pretty simple concept, but to program it you have to have actions that can be randomly chosen and dynamically modified. For the battleship simulation I created a class called a Shooter, because it 'shoots' at a position. The assumption here is that the first position has been hit, and the shooter is now trying to sink the battleship.
public class Shooter implements Comparable<Shooter> {
    private static final int NUM_SHOTS = 100;
    private List<Position> shots;
    private int score;

    // Make a new set of random shots.
    public Shooter newShots() {
        shots = new ArrayList<Position>(NUM_SHOTS);
        for (int i = 0; i < NUM_SHOTS; ++i) {
            shots.add(newShot());
        }
        return this;
    }

    // Test this shooter against a ship
    public void testShooter(Ship ship) {
        score = shots.size();
        int hits = 0;
        for (Position shot : shots) {
            if (ship.madeHit(shot)) {
                if (++hits >= ship.getSize())
                    return;
            } else {
                score = score - 1;
            }
        }
    }

    // get the score of the testShooter operation
    public int getScore() {
        return score;
    }

    // compare this shooter to other shooters.
    @Override
    public int compareTo(Shooter o) {
        return score - o.score;
    }

    // getter
    public List<Position> getShots() {
        return shots;
    }

    // reproduce this shooter
    public Shooter reproduce() {
        Shooter offspring = new Shooter();
        offspring.mutate(shots);
        return offspring;
    }

    // mutate this shooter's offspring
    private void mutate(List<Position> pShots) {
        // copy parent's shots (okay for shallow)
        shots = new ArrayList<Position>(pShots);
        // 10% new mutations, in random locations
        for (int i = 0; i < NUM_SHOTS / 10; i++) {
            int loc = (int) (Math.random() * 100);
            shots.set(loc, newShot());
        }
    }

    // make a new random move
    private Position newShot() {
        return new Position(((int) (Math.random() * 6)) - 3, ((int) (Math.random() * 6)) - 3);
    }
}
The idea here is that a Shooter has up to 100 shots, randomly chosen between ±3 in X and ±3 in Y. Yes, 100 shots is overkill, but hey, whatever. Pass a Ship to Shooter.testShooter and it will score itself, 100 being the best score and 0 the worst.
This Shooter actor has reproduce and mutate methods that will return an offspring that has 10% of its shots randomly mutated. The general idea is that the best Shooters have 'learned' to shoot their shots in a cross pattern ('+') as quickly as possible, since a ship is oriented in one of four ways (North, South, East, West).
The program that runs the simulation, ShooterSimulation, is pretty simple:
public class ShooterSimulation {
    private int NUM_GENERATIONS = 1000;
    private int NUM_SHOOTERS = 20;
    private int NUM_SHOOTERS_NEXT_GENERATION = NUM_SHOOTERS / 10;
    List<Shooter> shooters = new ArrayList<Shooter>(NUM_SHOOTERS);
    Ship ship;

    public static void main(String... args) {
        new ShooterSimulation().run();
    }

    // do the work
    private void run() {
        firstGeneration();
        ship = new Ship();
        for (int gen = 0; gen < NUM_GENERATIONS; ++gen) {
            ship.newOrientation();
            testShooters();
            Collections.sort(shooters);
            printAverageScore(gen, shooters);
            nextGeneration();
        }
    }

    // make the first generation
    private void firstGeneration() {
        for (int i = 0; i < NUM_SHOOTERS; ++i) {
            shooters.add(new Shooter().newShots());
        }
    }

    // test all the shooters
    private void testShooters() {
        for (int mIdx = 0; mIdx < NUM_SHOOTERS; ++mIdx) {
            shooters.get(mIdx).testShooter(ship);
        }
    }

    // print the average score of all the shooters
    private void printAverageScore(int gen, List<Shooter> shooters) {
        int total = 0;
        for (int i = 0, j = shooters.size(); i < j; ++i) {
            total = total + shooters.get(i).getScore();
        }
        System.out.println(gen + " " + total / shooters.size());
    }

    // throw away a tenth of the old generation,
    // replace with offspring of the best fit
    private void nextGeneration() {
        for (int l = 0; l < NUM_SHOOTERS_NEXT_GENERATION; ++l) {
            shooters.set(l, shooters.get(NUM_SHOOTERS - l - 1).reproduce());
        }
    }
}
The code reads as pseudo-code from the run method: make a firstGeneration then iterate for a number of generations. For each generation, set a newOrientation for the ship, then do testShooters, and sort the results of the test with Collections.sort. printAverageScore of the test, then build the nextGeneration. With the list of average scores you can, cough cough, do an 'analysis'.
A graph of the results looks like this:
As you can see it starts out with pretty low average scores, but learns pretty quickly. However, the orientation of the ship keeps changing, causing some noise in addition to the random component. Every now and again a mutation messes up the group a bit, but less and less as the group improves overall.
The challenge, and surely the reason for many papers, is to make more things mutable, especially in a constructive way. For example, the number of shots could be mutable. Or, replacing the list of shots with a tree that branches depending on whether the last shot was a hit or miss might improve things, but it's difficult to say. That's where the 'decision' logic considerations come in. Is it better to have a list of random shots or a tree that decides which branch to take depending on the prior shot? Higher-level challenges include predicting which changes will make the group learn faster and be less susceptible to bad mutations.
Finally, consider that there could be multiple groups, one group a battleship hunter and one group a submarine hunter for example. Each group, though made of the same code, could 'evolve' different internal 'genetics' that allow them to specialize for their task.
Anyway, as always, start somewhere simple and learn as you go until you get good enough to go back to reading the papers.
PS> Need this too:
public class Position {
    int x;
    int y;

    Position(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object m) {
        return (((Position) m).x == x && ((Position) m).y == y);
    }
}
UPDATE: Added the Ship class, fixed a few bugs:
public class Ship {
    List<Position> positions;

    // test if a hit was made
    public boolean madeHit(Position shot) {
        for (Position p : positions) {
            if (p.equals(shot)) return true;
        }
        return false;
    }

    // make a new orientation
    public int newOrientation() {
        positions = new ArrayList<Position>(3);
        // make a random ship direction.
        int shipInX = 0, oShipInX = 0, shipInY = 0, oShipInY = 0;
        int orient = (int) (Math.random() * 4);
        if (orient == 0) {
            oShipInX = 1;
            shipInX = (int) (Math.random() * 3) - 3;
        } else if (orient == 1) {
            oShipInX = -1;
            shipInX = (int) (Math.random() * 3);
        } else if (orient == 2) {
            oShipInY = 1;
            shipInY = (int) (Math.random() * 3) - 3;
        } else if (orient == 3) {
            oShipInY = -1;
            shipInY = (int) (Math.random() * 3);
        }
        // make the positions of the ship
        for (int i = 0; i < 3; ++i) {
            positions.add(new Position(shipInX, shipInY));
            if (orient == 2 || orient == 3)
                shipInY = shipInY + oShipInY;
            else
                shipInX = shipInX + oShipInX;
        }
        return orient;
    }

    public int getSize() {
        return positions.size();
    }
}
I would suggest another approach, based on the likelihood of where a ship can be. I will show you an example on a smaller version of the game (the same idea applies to all other versions). In my example it is a 3x3 area with only one 1x2 ship.
Now take an empty area and put the ship in every possible position, storing for each cell the number of placements in which part of the ship covers that cell. If you do this for a 1x2 ship, you get the following:
1 2 1
1 2 1
1 2 1
The ship can also lie in the other direction (2x1), which gives you the following matrix:
1 1 1
2 2 2
1 1 1
Summing up, you get the matrix of probabilities:
2 3 2
3 4 3
2 3 2
This means that the most probable location is the middle one (where we have 4). Here is where you should shoot.
Now let's assume you hit a part of the ship. If you recalculate the likelihood matrix, you will get:
0 1 0
1 W 1
0 1 0
which tells you there are 4 different possible positions for the next shot.
If, for example, you had missed on the previous step, you would get the following matrix:
2 2 2
2 M 2
2 2 2
This is the basic idea. The way you reposition the ships is based on the rules for how ships can be placed and on the information you get after each move: the feedback can be miss/hit, or miss/hit/sunk.
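A sketch of the counting step in Java, for a square board and one ship of length len; here blocked marks cells that are known misses, and all the names are illustrative (on the empty 3x3 board with len = 2 this reproduces the 2 3 2 / 3 4 3 / 2 3 2 matrix above):
// Count, for every cell, how many legal ship placements cover it.
static int[][] likelihood(boolean[][] blocked, int len) {
    int n = blocked.length;
    int[][] counts = new int[n][n];
    for (int r = 0; r < n; r++) {
        for (int c = 0; c < n; c++) {
            // horizontal placement starting at (r, c)
            if (c + len <= n && free(blocked, r, c, 0, 1, len))
                for (int k = 0; k < len; k++) counts[r][c + k]++;
            // vertical placement starting at (r, c)
            if (r + len <= n && free(blocked, r, c, 1, 0, len))
                for (int k = 0; k < len; k++) counts[r + k][c]++;
        }
    }
    return counts;
}

// True if none of the len cells along (dr, dc) from (r, c) is a known miss.
static boolean free(boolean[][] blocked, int r, int c, int dr, int dc, int len) {
    for (int k = 0; k < len; k++)
        if (blocked[r + k * dr][c + k * dc]) return false;
    return true;
}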
ANSWER PART III: As you can see, the genetic algorithm is generally not the hard part. Again, it's a simple piece of code that is really meant to exercise another piece of code, the actor. Here, the actor is implemented in a Shooter class. These actors are often modelled in the fashion of Turing machines, in the sense that the actor has a defined set of outputs for a set of inputs. The GA helps you determine the optimal configuration of the state table. In the prior answers to this question, the Shooter implemented a probability matrix like the one described by @SalvadorDali in his answer.
Testing the prior Shooter thoroughly, we find that the best it can do is something like:
BEST Ave=5, Min=3, Max=9
Best=Shooter:5:[(1,0), (0,0), (2,0), (-1,0), (-2,0), (0,2), (0,1), (0,-1), (0,-2), (0,1)]
This shows it takes 5 shots on average, 3 at minimum, and 9 at maximum to sink a 3X3 battleship. The locations of the 9 shots are shown as X/Y coordinate pairs. The question "Can this be done better?" depends on human ingenuity; a genetic algorithm can't write new actors for us. I wondered whether a decision tree could do better than a probability matrix, so I implemented one to try it out:
public class Branch {
    private static final int MAX_DEPTH = 10;
    private static final int MUTATE_PERCENT = 20;
    private Branch hit;
    private Branch miss;
    private Position shot;

    public Branch() {
        shot = new Position(
            (int) ((Math.random() * 6.0) - 3),
            (int) ((Math.random() * 6.0) - 3)
        );
    }

    public Branch(Position shot, Branch hit, Branch miss) {
        this.shot = new Position(shot.x, shot.y);
        this.hit = null; this.miss = null;
        if (hit != null) this.hit = hit.clone();
        if (miss != null) this.miss = miss.clone();
    }

    public Branch clone() {
        return new Branch(shot, hit, miss);
    }

    public void buildTree(Counter c) {
        if (c.incI1() > MAX_DEPTH) {
            hit = null;
            miss = null;
            c.decI1();
            return;
        } else {
            hit = new Branch();
            hit.buildTree(c);
            miss = new Branch();
            miss.buildTree(c);
        }
        c.decI1();
    }

    public void shoot(Ship ship, Counter c) {
        c.incI1();
        if (ship.madeHit(shot)) {
            if (c.incI2() == ship.getSize()) return;
            if (hit != null) hit.shoot(ship, c);
        } else {
            if (miss != null) miss.shoot(ship, c);
        }
    }

    public void mutate() {
        if ((int) (Math.random() * 100.0) < MUTATE_PERCENT) {
            shot.x = (int) ((Math.random() * 6.0) - 3);
            shot.y = (int) ((Math.random() * 6.0) - 3);
        }
        if (hit != null) hit.mutate();
        if (miss != null) miss.mutate();
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        sb.append(shot.toString());
        if (hit != null) sb.append("h:" + hit.toString());
        if (miss != null) sb.append("m:" + miss.toString());
        return sb.toString();
    }
}
The Branch class is a node in a decision tree (ok, maybe poorly named). At every shot, the next branch chosen depends on whether the shot was awarded a hit or not.
The shooter is modified somewhat to use the new decisionTree.
public class Shooter implements Comparable<Shooter> {
    private Branch decisionTree;
    private int aveScore;

    // Make a new random decision tree.
    public Shooter newShots() {
        decisionTree = new Branch();
        Counter c = new Counter();
        decisionTree.buildTree(c);
        return this;
    }

    // Test this shooter against a ship
    public int testShooter(Ship ship) {
        Counter c = new Counter();
        decisionTree.shoot(ship, c);
        return c.i1;
    }

    // compare this shooter to other shooters, reverse order
    @Override
    public int compareTo(Shooter o) {
        return o.aveScore - aveScore;
    }

    // mutate this shooter's offspring
    public void mutate(Branch pDecisionTree) {
        decisionTree = pDecisionTree.clone();
        decisionTree.mutate();
    }

    // min, max, setters, getters
    public int getAveScore() {
        return aveScore;
    }

    public void setAveScore(int aveScore) {
        this.aveScore = aveScore;
    }

    public Branch getDecisionTree() {
        return decisionTree;
    }

    @Override
    public String toString() {
        StringBuilder ret = new StringBuilder("Shooter:" + aveScore + ": [");
        ret.append(decisionTree.toString());
        return ret.append(']').toString();
    }
}
The attentive reader will notice that while the methods themselves have changed, which methods a Shooter needs to implement is not different from the prior Shooters. This means the main GA simulation has not changed except for one line related to mutations, and that probably could be worked on:
Shooter child = shooters.get(l);
child.mutate(shooters.get(NUM_SHOOTERS - l - 1).getDecisionTree());
A graph of a typical simulation run now looks like this:
As you can see, the final best average score evolved using a decision tree is one shot less than the best average score evolved for a probability matrix. Also notice that this group of Shooters has taken around 800 generations to train to their optimum, about twice as long as the simpler probability-matrix Shooters. The best decision-tree Shooter gives this result:
BEST Ave=4, Min=3, Max=6
Best=Shooter:4: [(0,-1)h:(0,1)h:(0,0) ... ]
Here, not only does the average take one shot less, but the maximum number of shots is 1/3 lower than a probability matrix Shooter.
At this point it takes some really smart people to determine whether this actor has achieved the theoretical optimum for the problem domain, i.e., is this the best you can do trying to sink a 3X3 ship? Consider that the answer to that question would become more complex in the real battleship game, which has several ships of different sizes. How would you build an actor that incorporates the knowledge of which of the boats have already been sunk into actions that are randomly chosen and dynamically modified? Here is where understanding Turing machines, also known as CPUs, becomes important.
PS> You will need this class also:
public class Counter {
    int i1;
    int i2;

    public Counter() { i1 = 0; i2 = 0; }

    public int incI1() { return ++i1; }
    public int incI2() { return ++i2; }
    public int decI1() { return --i1; }
    public int decI2() { return --i2; }
}
ANSWER PART II: A genetic algorithm is not an end unto itself; it is a means to accomplish an end. In the case of this battleship example, the end is to make the best Shooter. I added a line to the prior version of the program to output the best shooter's shot pattern, and noticed something wrong:
Best shooter = Shooter:100:[(0,0), (0,0), (0,0), (0,-1), (0,-3), (0,-3), (0,-3), (0,0), (-2,-1) ...]
The first three shots in this pattern are at coordinates (0,0), which in this application are guaranteed hits, even though they all hit the same spot. Hitting the same spot more than once is against the rules in battleship, so this "best" shooter is best because it has learned to cheat!
So, clearly the program needs to be improved. To do that, I changed the Ship class to return false if a position has already been hit.
public class Ship {
    // private class to keep track of hits
    private class Hit extends Position {
        boolean hit = false;
        Hit(int x, int y) { super(x, y); }
    }

    List<Hit> positions;

    // need to reset the hits for each shooter test.
    public void resetHits() {
        for (Hit p : positions) {
            p.hit = false;
        }
    }

    // test if a hit was made, false if shot in spot already hit
    public boolean madeHit(Position shot) {
        for (Hit p : positions) {
            if (p.equals(shot)) {
                if (p.hit == false) {
                    p.hit = true;
                    return true;
                }
                return false;
            }
        }
        return false;
    }

    // make a new orientation
    public int newOrientation() {
        positions = new ArrayList<Hit>(3);
        int shipInX = 0, oShipInX = 0, shipInY = 0, oShipInY = 0;
        // make a random ship orientation.
        int orient = (int) (Math.random() * 4.0);
        if (orient == 0) {
            oShipInX = 1;
            shipInX = 0 - (int) (Math.random() * 3.0);
        } else if (orient == 1) {
            oShipInX = -1;
            shipInX = (int) (Math.random() * 3.0);
        } else if (orient == 2) {
            oShipInY = 1;
            shipInY = 0 - (int) (Math.random() * 3.0);
        } else if (orient == 3) {
            oShipInY = -1;
            shipInY = (int) (Math.random() * 3.0);
        }
        // make the positions of the ship
        for (int i = 0; i < 3; ++i) {
            positions.add(new Hit(shipInX, shipInY));
            if (orient == 2 || orient == 3)
                shipInY = shipInY + oShipInY;
            else
                shipInX = shipInX + oShipInX;
        }
        return orient;
    }

    public int getSize() {
        return positions.size();
    }
}
After I did this, my shooters stopped "cheating", but it got me thinking about the scoring in general. The prior version of the application scored based on how many shots missed, so a shooter could get a perfect score if none of its shots missed. However, that is unrealistic; what I really want is shooters that take the fewest shots. I changed the shooter to keep track of the average number of shots taken:
public class Shooter implements Comparable<Shooter> {
    private static final int NUM_SHOTS = 40;
    private List<Position> shots;
    private int aveScore;

    // Make a new set of random shots.
    public Shooter newShots() {
        shots = new ArrayList<Position>(NUM_SHOTS);
        for (int i = 0; i < NUM_SHOTS; ++i) {
            shots.add(newShot());
        }
        return this;
    }

    // Test this shooter against a ship
    public int testShooter(Ship ship) {
        int score = 1;
        int hits = 0;
        for (Position shot : shots) {
            if (ship.madeHit(shot)) {
                if (++hits >= ship.getSize())
                    return score;
            }
            score++;
        }
        return score - 1;
    }

    // compare this shooter to other shooters, reverse order
    @Override
    public int compareTo(Shooter o) {
        return o.aveScore - aveScore;
    }

    ... the rest is the same, or getters and setters.
}
I also realized that I had to test each shooter more than once to get an average number of shots fired against battleships. For that, I subjected each shooter individually to multiple tests:
// test all the shooters
private void testShooters() {
    for (int i = 0, j = shooters.size(); i < j; ++i) {
        Shooter current = shooters.get(i);
        int totalScores = 0;
        for (int play = 0; play < NUM_PLAYS; ++play) {
            ship.newOrientation();
            ship.resetHits();
            totalScores = totalScores + current.testShooter(ship);
        }
        current.setAveScore(totalScores / NUM_PLAYS);
    }
}
Now, when I run the simulation, I get the average of the averages as output. The graph generally looks something like this:
Again, the shooters learn pretty quickly, but it takes a while for random changes to bring the averages down. Now my best Shooter makes a little more sense:
Best=Shooter:6:[(1,0), (0,0), (0,-1), (2,0), (-2,0), (0,1), (-1,0), (0,-2), ...
So, a genetic algorithm is helping me set the configuration of my Shooter, but as another answer here pointed out, good results can be achieved just by thinking about it. Consider that if I have a neural network with 10 settings and 100 possible values per setting, that's 100^10 possible configurations, and the theory for how those settings should be set may be a little more difficult than battleship-shooter theory. In this case, a genetic algorithm can help determine optimal settings and test current theory.
