I am trying to write a small AI in Java that implements the miniMax algorithm.
The game it is based on is a two-player game where both players make one move per turn, and each board position gives each player a score. The "quality" of a position for player X is evaluated by subtracting the opponent's score from player X's score for that position. Each move is represented by an integer (i.e. move one is made by inputting 1, move two by inputting 2, etc.).
I understand that miniMax should be implemented using recursion. At the moment I have:
An evaluate() method, which takes as parameters an object representing the board state (i.e. a "BoardState" object) and a boolean called "max" (the signature would be evaluate(BoardState myBoard, boolean max)).
max is true when it is player X's turn. Given a board position, it will evaluate all possible moves and return the one that is most beneficial for player X. If it is the opponent's turn, max will be false and the method will return the move that is LEAST beneficial for player X (i.e. most beneficial for player Y).
However, I am having difficulties writing the actual miniMax method. My general structure would be something like:
public int miniMax(GameState myGameState, int depth)
Whereby I submit the initial gameState and the "depth" I want it to look into.
I would then have something like:
int finalMove = 0;
while (currentDepth < depth) {
    GameState tmp = myGameState.copy();
    finalMove = evaluate(tmp, true or false);
}
return finalMove;
Would this sound like a plausible implementation? Any suggestions? :)
That won't work.
Details:
It will cause an infinite loop: currentDepth never gets incremented.
Your definition of evaluation seems to differ from the usual one. Normally the evaluation function returns the predicted value of a game state. Isn't your evaluate function doing the same job as the minimax function itself?
Are miniMax and MiniMax different functions? If you meant recursion, you need to pass depth-1 when calling miniMax for the next level.
The idea of minimax is a depth-first search: evaluate only the leaf nodes (nodes at the maximum depth, or nodes that are a win or a tie), then pick the maximum if the current player is the maximizing one and the minimum if the current player is the minimizing one.
This is how I implemented it:
function miniMax(node, depth)
    if(depth == 0) then -- leaf node
        local ret = evaluate(node.state)
        return ret
    else -- winning node
        local winner = whoWin(node.state)
        if(winner == 1) then -- P1
            return math.huge
        elseif(winner == -1) then -- P2
            return math.huge*-1
        end
    end

    local num_of_moves = getNumberOfMoves(node.state)
    local v_t = nil
    local best_move_index = nil
    if(getTurn(node.state) == 1) then -- maximizing player
        local v = -math.huge
        for i=0, num_of_moves-1 do
            local child_node = simulate(node.state, i) -- simulate move number i
            v_t = miniMax(child_node, depth-1) -- only the first return value (the score) is kept
            if(v_t > v) then
                v = v_t
                best_move_index = i
            end
        end
        -- no move improved on -math.huge: fall back to a random move
        if(best_move_index == nil) then best_move_index = math.random(0, num_of_moves-1) end
        return v, best_move_index
    else -- minimizing player
        local v = math.huge
        for i=0, num_of_moves-1 do
            local child_node = simulate(node.state, i)
            v_t = miniMax(child_node, depth-1)
            if(v_t < v) then
                v = v_t
                best_move_index = i
            end
        end
        if(best_move_index == nil) then best_move_index = math.random(0, num_of_moves-1) end
        return v, best_move_index
    end
end
return v, best_move_index returns two values, v and best_move_index (the code above is in Lua, and Lua functions can return multiple values).
The evaluate function returns the same score for both players (i.e. game state A is scored 23 from P1's point of view and also 23 from P2's point of view).
This algorithm only works if the two players move alternately (no player makes two moves in a row). You can work around this restriction by giving the opponent a PASS move (skipping his/her turn) whenever the other player needs to move twice.
This minimax can be further optimized (listed from the easiest):
alpha-beta pruning
iterative deepening
move ordering
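As a rough Java illustration of the first optimization in the list above, here is a minimal sketch of depth-limited minimax with alpha-beta pruning. The GameNode interface, the TreeNode demo class and all method names are hypothetical stand-ins for the evaluate/simulate helpers used in the Lua code; they are assumptions made for this sketch, not part of any library. It also assumes that a node with no legal moves reports itself as terminal.

```java
import java.util.List;

/** Hypothetical game-state abstraction; the names are assumptions for this sketch. */
interface GameNode {
    boolean isTerminal();        // win, loss or draw reached (or no moves left)
    int evaluate();              // score from the maximizing player's point of view
    List<GameNode> children();   // one child per legal move, in move order
}

/** Tiny explicit game tree, only used to demonstrate the search. */
final class TreeNode implements GameNode {
    private final int value;
    private final List<GameNode> kids;
    private TreeNode(int value, List<GameNode> kids) { this.value = value; this.kids = kids; }
    static TreeNode leaf(int value) { return new TreeNode(value, List.of()); }
    static TreeNode inner(GameNode... kids) { return new TreeNode(0, List.of(kids)); }
    public boolean isTerminal() { return kids.isEmpty(); }
    public int evaluate() { return value; }
    public List<GameNode> children() { return kids; }
}

final class AlphaBeta {
    /** Returns the minimax value of node, searching at most depth plies. */
    static int search(GameNode node, int depth, int alpha, int beta, boolean maximizing) {
        if (depth == 0 || node.isTerminal()) {
            return node.evaluate();
        }
        if (maximizing) {
            int best = Integer.MIN_VALUE;
            for (GameNode child : node.children()) {
                best = Math.max(best, search(child, depth - 1, alpha, beta, false));
                alpha = Math.max(alpha, best);
                if (alpha >= beta) break;   // the minimizer will never allow this line
            }
            return best;
        } else {
            int best = Integer.MAX_VALUE;
            for (GameNode child : node.children()) {
                best = Math.min(best, search(child, depth - 1, alpha, beta, true));
                beta = Math.min(beta, best);
                if (alpha >= beta) break;   // the maximizer already has something better
            }
            return best;
        }
    }

    /** Returns the index of the best root move for the maximizing player, or -1 if there is none. */
    static int bestMove(GameNode root, int depth) {
        int bestIndex = -1;
        int bestValue = Integer.MIN_VALUE;
        int alpha = Integer.MIN_VALUE;
        List<GameNode> kids = root.children();
        for (int i = 0; i < kids.size(); i++) {
            int v = search(kids.get(i), depth - 1, alpha, Integer.MAX_VALUE, false);
            if (bestIndex == -1 || v > bestValue) {
                bestValue = v;
                bestIndex = i;
            }
            alpha = Math.max(alpha, bestValue);
        }
        return bestIndex;
    }
}
```

Calling AlphaBeta.bestMove(root, depth) plays the role of the asker's miniMax(GameState, depth): the recursion decrements depth on every level, and the static evaluation is only applied at the leaves.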
I made an implementation of minimax in Lua. I hope it gives you an idea of how to tackle the algorithm from a Java perspective; the code should be quite similar, mind you. It is designed for a game of tic-tac-toe.
--caller is the player who is using the minimax function
--inital_state is the board from which the player must make a move
local function minimax(caller,inital_state)
    local bestState = {}; --we use this to store the game state that the player will create
    --this recurse function is really the 'minimax' algorithm
    local function recurse(state,depth)
        --ChildPlayer is the person who will have their turn in the current state's children
        local ChildPlayer = getTurn(state)
        --ParentPlayer is the person who is looking at their children
        local ParentPlayer = getPreviousTurn(state)
        --this represents the worst case scenario for a player
        local alpha = - (ChildPlayer == caller and 1 or -1 );
        --we check for terminal conditions (leaf nodes) and return the appropriate objective value
        if win(state) then
            --return +1 or -1 depending on who called the 'minimax'
            return ParentPlayer == caller and 1 or -1;
        elseif tie(state) then
            --if it's a tie then the value is 0 (neither win nor loss)
            return 0;
        end
        --this will return a list of child states FROM the current state
        local children = getChildrenStates(state,ChildPlayer)
        --enumerate over each child
        for _,child in ipairs(children) do
            --find out the child's objective value
            local beta = recurse(child,depth+1);
            if ChildPlayer == caller then
                --we want to MAXIMIZE
                if beta >= alpha then
                    alpha = beta
                    --if the depth is 0 then we can store the child state as the bestState (the caller of minimax will always want to choose the GREATEST value on the root node)
                    if depth == 0 then
                        bestState = child;
                    end
                end
            --we want to MINIMIZE
            elseif beta <= alpha then
                alpha = beta;
            end
        end
        --return a non-terminal node's value (propagates values up the tree)
        return alpha;
    end
    --start the 'minimax' function by calling recurse on the initial state
    recurse(inital_state,0);
    --return the best move
    return bestState;
end
My chess algorithm is based on negamax. The relevant part is:
private double deepEvaluateBoard(Board board, int currentDepth, double alpha, double beta, Move initialMove) {
    if (board.isCheckmate() || board.isDraw() || currentDepth <= 0) {
        this.moveHistorys.put(initialMove, board.getMoveHistory()); // this is not working
        return evaluateBoard(board); // evaluateBoard evaluates from the perspective of color whose turn it is.
    } else {
        double totalPositionValue = -1e40;
        List<Move> allPossibleMoves = board.getAllPossibleMoves();
        for (Move move : allPossibleMoves) {
            // (the code that applies move to the board and undoes it afterwards is not shown here)
            totalPositionValue = max(-deepEvaluateBoard(board, currentDepth - 1, -beta, -alpha, initialMove), totalPositionValue);
            alpha = max(alpha, totalPositionValue);
            if (alpha >= beta) {
                return totalPositionValue;
            }
        }
        return totalPositionValue;
    }
}
It would greatly help debugging if I were able to access the move sequence that the negamax algorithm bases its evaluation on (that is, where on the decision tree the evaluated value is found).
Currently I am trying to save the move history of the board into a hashmap that is a field of the enclosing class. However, it is not working for some reason, as the produced move sequences are not optimal.
Since developing an intuition for negamax is not very easy, I have been banging my head against the wall on this one for quite some time now. I would much appreciate it if someone could point me in the right direction!
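One possible explanation, judging only from the code shown: every leaf that is visited overwrites the moveHistorys entry for the same initialMove key, so the map ends up holding the last leaf searched rather than the leaf whose value was actually propagated to the root. A common alternative is to let each call return its best line together with its value. The sketch below illustrates that idea under stated assumptions: Position, Line, PvNegamax and the toy Pile game are all hypothetical names invented for this example, not the asker's Board or Move classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal position interface; not the asker's Board class. */
interface Position<M> {
    boolean isOver();
    double evaluate();            // from the perspective of the side to move
    List<M> moves();
    Position<M> play(M move);     // returns the position after the move
}

/** Toy game for the demo: take 1 or 2 stones, taking the last stone wins. */
final class Pile implements Position<Integer> {
    final int stones;
    Pile(int stones) { this.stones = stones; }
    public boolean isOver() { return stones == 0; }
    // With no stones left, the side to move has lost; otherwise the value is unknown (0).
    public double evaluate() { return stones == 0 ? -1 : 0; }
    public List<Integer> moves() {
        List<Integer> out = new ArrayList<>();
        for (int take = 1; take <= Math.min(2, stones); take++) out.add(take);
        return out;
    }
    public Position<Integer> play(Integer take) { return new Pile(stones - take); }
}

final class PvNegamax {
    /** A value together with the move sequence that leads to it. */
    static final class Line<M> {
        final double value;
        final List<M> moves;
        Line(double value, List<M> moves) { this.value = value; this.moves = moves; }
    }

    static <M> Line<M> search(Position<M> pos, int depth, double alpha, double beta) {
        if (depth <= 0 || pos.isOver()) {
            return new Line<>(pos.evaluate(), new ArrayList<>());
        }
        double best = Double.NEGATIVE_INFINITY;
        List<M> bestMoves = new ArrayList<>();
        for (M move : pos.moves()) {
            Line<M> child = search(pos.play(move), depth - 1, -beta, -alpha);
            double value = -child.value;          // negamax: flip the child's score
            if (value > best) {
                best = value;
                bestMoves = new ArrayList<>();
                bestMoves.add(move);              // this move ...
                bestMoves.addAll(child.moves);    // ... followed by the child's best line
            }
            alpha = Math.max(alpha, best);
            if (alpha >= beta) break;             // cutoff: the opponent avoids this branch
        }
        return new Line<>(best, bestMoves);
    }
}
```

The list is rebuilt only when a child improves on the current best, so the line returned at the root always belongs to the value returned with it. Lines coming out of branches that were cut off are only bounds, but those never replace the best line at the root.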
I have a problem requiring me to implement an algorithm finding path from a character to another character with obstacles along the way.
I know there are a lot of advanced pathfinding algorithms (A*, BFS, DFS, Dijkstra, ...). However, after plenty of research and attempts I am struggling to implement these concepts in my code, and I also don't think I am required to implement such advanced algorithms.
"Shortest" path is not the requirement, and all I need is a path that can lead my character to another character by avoiding moving onto obstacles.
Can anyone give me an idea (maybe some algorithm better than backtracking) or a useful website (with similar examples) for this problem?
Any help would be much appreciated
I can recommend the A* algorithm.
It's easy to implement. For my A* I used the Wikipedia code and the GeeksforGeeks code.
I'll post my code as well. It's in C#, but very similar to Java:
using System.Collections.Generic;
using System.Linq;

public List<ANote> findPath(ANote start, ANote end)
{
    if (start == null || end == null || start == end || !start.walkable || !end.walkable)
        return null;
    List<ANote> openSet = new List<ANote>();
    List<ANote> closedSet = new List<ANote>();
    start.parent = null;
    openSet.Add(start);
    start.h = getDistance(start, end);
    while (openSet.Any())
    {
        // Always expand the open node with the lowest f = g + h
        openSet = openSet.OrderBy(o => o.f).ToList();
        ANote current = openSet[0];
        if (current == end)
            break;
        openSet.Remove(current);
        closedSet.Add(current);
        foreach (ANote neighbor in current.adjacted)
        {
            if (closedSet.Contains(neighbor))
                continue;
            double _gScore = current.g + 1; // For me every distance was 1
            if (openSet.Contains(neighbor) && _gScore >= neighbor.g)
                continue; // the known route to this neighbor is at least as good
            if (!openSet.Contains(neighbor))
                openSet.Add(neighbor);
            neighbor.parent = current;
            neighbor.g = _gScore;
            neighbor.h = getDistance(neighbor, end);
        }
    }
    return reconstructPath(start, end);
}

// Walks the parent links back from end; the result runs from end towards start.
private List<ANote> reconstructPath(ANote start, ANote end)
{
    List<ANote> backNotes = new List<ANote>();
    ANote current = end;
    while (current.parent != null)
    {
        backNotes.Add(current);
        current = current.parent;
    }
    return backNotes;
}

public class ANote
{
    public ANote parent { get; set; }
    public double f { get { return g + h; } }
    public double g { get; set; }
    public double h { get; set; }
    public int x { get; set; }
    public int y { get; set; }
    public bool walkable { get; set; }
    public List<ANote> adjacted { get; set; } = new List<ANote>();
    public ANote(int x, int y)
    {
        this.x = x;
        this.y = y;
        walkable = true;
    }
}
Important for this code: you must define the adjacent nodes (the adjacted list) and which nodes are walkable before you search.
I hope my code can help you implement A* in your own code.
Since I have no idea how your "Grid" is described or what a "Cell" is, I assume the grid is rectangular and the cells only describe the objects on it, not the empty spaces.
I suggest you make a char[][] array = new char[rows][columns]; (or char), initialize it with some value, then iterate over the Cells and fill the 2D array in some meaningful manner, for example G for goal, # for obstacle, etc. Then you start a DFS from Goal to look for Start.
You need to store the correct path somewhere, so you need an ArrayList list; variable too. Since it's a list, you can add items to it with list.add(item);, which is handy.
Non-optimal path: DFS
DFS is a really basic recursive algorithm, in your case it will go like this:
bool DFS(list, array, row, column):
    if(array[row][column] == '#' or
       array[row][column] == 'v') return False; # Obstacle or visited, ignore it
    if( ... == 'S') return True; # It's the Start, path found
    array[row][column] = 'v'; # mark as visited, nothing interesting.
    # If you want the shortest path, you're supposed to put here the distance from goal,
    # that you would pass on and increment as additional argument of DFS
    # Check for edge cases, or initialize array to be bigger and place obstacles on edge
    if( DFS(list, array, row-1, column) ){ # If DFS found a path to Start from cell above
        list.add("Move Up"); # Then you add a direction to go below to the list
        return True; # And then tell the previous DFS, that the path was found
    }
    # And then you add the checks for the other directions in a similar manner
    return False; # You didn't find anything anywhere
That is not actual code, but it should be enough for you to do your assignment from there.
Chances are it'll find a needlessly long, winding path.
But in grids with a lot of obstacles or only one correct path it'll make more reasonable paths. You can also improve it by choosing the order in which you try directions so that it always tries to go towards the Goal first, but that's a pain.
Optimal path: augmented DFS
To find the shortest path people usually refer to A*, but I read up on it and it's not as I remember it and it's just unnecessarily complicated, so I'll explain an expanded DFS. It takes a little longer to find the answer than A* or BFS would, but for reasonably-sized grids it's not noticeable.
The idea of the algorithm is to map the entire grid with distances to Goal and then walk from Start to Goal following decreasing distance.
First you will need to use an int[][] array instead of the char array of the previous case. That is because you need to store distances (which char can too, to some extent) as well as non-distance markers in the grid, like obstacles and such.
You call DFS(array, row, col, distance), where distance is the distance of the Cell that calls DFS, incremented by 1. DFS in the next cell then checks whether the distance it was passed is smaller than its current distance. If so, a shorter path to that cell was found and you need to recalculate all its neighbors too. Otherwise the new path is longer and you can disregard it. By calling DFS recursively you will gradually map the entire maze.
After that you call another function, FindPath(list, array, row, col), that checks the Cell it started in, adds to the list a direction to the cell with neighbor.distance == (this.distance - 1), and then calls FindPath on that neighbor until distance is 0, at which point it's the Goal.
It should look something like this:
# initialize grid with Integer.MAX_VALUE or just a big enough number
# for Cell in Cells -> put obstacles on Grid as -1,
# find the Start and Goal and record them somewhere
# DFS_plus(array, Goal, 0);
# FindPath(list, array, Start);
# Done
void DFS_plus(array, row, col, distance):
    if(array[row][col] <= distance) return; # There already exists a shorter path there
                                            # or it's an obstacle, we store obstacles as -1.
                                            # It's smaller than any possible path and thus blocks further search
    array[row][col] = distance; # update distance.
                                # If this happened its neighbors will need to be updated too.
    # Check for edge cases, or initialize array to be bigger and place obstacles on edge
    DFS_plus(array, row-1, col, distance+1); # You just map everything, no returns expected
    DFS_plus(); # For all 4 directions

FindPath(list, array, row, col){
    if(array[row][col] == 0) return; # It's the Goal
    if(array[row-1][col] == (array[row][col] - 1)){ # Check if Cell above is 1 closer to Goal
        list.add("MoveUp"); # Add direction
        FindPath(list, array, row-1, col); # Look for next direction
        return; # You don't need to check other directions as path is guaranteed
    }
    if(){}; # Check other directions if Up wasn't the one
}
It is not much more complicated, but it gets you the shortest path. It's not the quickest way to find the shortest path, but it's relatively simple, as recursive algorithms go.
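For comparison, the same "map the distances to Goal, then follow decreasing distance" idea can be sketched in Java with a breadth-first search, which assigns each cell its final distance the first time it is reached. Note that this swaps BFS in for the recursive DFS_plus described above. The GridPath class, its method names and the direction strings are hypothetical, and the sketch assumes a rectangular char grid where '#' marks an obstacle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class GridPath {
    // Row/column offsets for up, down, left, right, plus matching direction names.
    private static final int[] DR = {-1, 1, 0, 0};
    private static final int[] DC = {0, 0, -1, 1};
    private static final String[] NAMES = {"MoveUp", "MoveDown", "MoveLeft", "MoveRight"};

    /** Fills a distance-to-goal map; obstacles and unreachable cells stay at -1. */
    static int[][] distances(char[][] grid, int goalRow, int goalCol) {
        int[][] dist = new int[grid.length][grid[0].length];
        for (int[] row : dist) Arrays.fill(row, -1);
        Deque<int[]> queue = new ArrayDeque<>();
        dist[goalRow][goalCol] = 0;
        queue.add(new int[] {goalRow, goalCol});
        while (!queue.isEmpty()) {
            int[] cell = queue.remove();
            for (int d = 0; d < 4; d++) {
                int r = cell[0] + DR[d];
                int c = cell[1] + DC[d];
                if (r < 0 || c < 0 || r >= grid.length || c >= grid[0].length) continue;
                if (grid[r][c] == '#' || dist[r][c] != -1) continue; // obstacle or already mapped
                dist[r][c] = dist[cell[0]][cell[1]] + 1;
                queue.add(new int[] {r, c});
            }
        }
        return dist;
    }

    /** Walks from (row, col) to the goal along strictly decreasing distances. */
    static List<String> path(int[][] dist, int row, int col) {
        List<String> moves = new ArrayList<>();
        if (dist[row][col] < 0) return moves;   // the goal cannot be reached from here
        while (dist[row][col] > 0) {
            for (int d = 0; d < 4; d++) {
                int r = row + DR[d];
                int c = col + DC[d];
                if (r < 0 || c < 0 || r >= dist.length || c >= dist[0].length) continue;
                if (dist[r][c] == dist[row][col] - 1) {   // neighbor is one step closer
                    moves.add(NAMES[d]);
                    row = r;
                    col = c;
                    break;
                }
            }
        }
        return moves;
    }
}
```

A caller would first build the map with GridPath.distances(grid, goalRow, goalCol) and then ask for GridPath.path(dist, startRow, startCol). An empty list means either that the start is already the goal or that no path exists.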
I'm trying to figure out how to increase the speed of this algorithm. It works perfectly for two games (2-person games, CPU vs Human), but the problem is that when I assign more than three piles (each containing a number of stones, so each player can pick up more than one), the computer player takes forever to compute its moves:
public Object[] minimax(int depth, int player) {
    if(hasPlayer1Won(player)){ // condition missing in the original; assumed by symmetry with the next branch
        return new Object[]{get_default_input(1),1};
    }else if(hasPlayer2Won(player)){
        return new Object[]{get_default_input(1),-1};
    }
    List<T> movesAvailable = getNextStates();
    if(movesAvailable.isEmpty()){ // condition missing in the original; assumed
        return new Object[]{get_default_input(0), 0};
    }
    int min = Integer.MAX_VALUE;
    int max = Integer.MIN_VALUE;
    T computersMove = getNextStates().get(0);
    int i = 0;
    for (T move: movesAvailable) {
        makeAMove(move, player);
        Object[] result = minimax(depth + 1, player == G.PLAYER1 ? G.PLAYER2 : G.PLAYER1);
        int currentScore = (int)result[1];
        if(player == G.PLAYER1){
            max = Math.max(currentScore, max);
            if(currentScore >= 0 && depth == 0) {
                computersMove = move;
            }
            if(currentScore == 1){
                // ... (body missing in the original)
            }
            if(i==movesAvailable.size() - 1 && max < 0){
                if (depth == 0){
                    computersMove = move;
                }
            }
        }else{
            min = Math.min(currentScore, min);
            if(min == -1) {
                // ... (body missing in the original)
            }
        }
        // ... (undoing the move and incrementing i are missing in the original)
    }
    return new Object[]{computersMove, player == G.PLAYER1 ? max: min};
}
I have successfully tested the following methods for improving minimax (I used them to play Tic-Tac-Toe and Domineering):
Alpha-beta pruning - I used a special variant of this type of pruning in conjunction with lazy evaluation. Instead of generating the whole tree, I generated only an optimal move on each layer and kept lazy holders for the other state-action pairs (a supplier that is called only when a move different from the one I held is made).
Heuristic pruning - see the chapter on heuristics in that book. I generated only the first d branches of the tree and, instead of having a deterministic outcome, applied the heuristic function described in that book to the current state to determine a heuristic outcome. Whenever move (d+1) was made, I generated another branch using the same approach.
Here, d is the level that you choose (the safest way is by testing).
Parallel computing - also have a look at this; you may find it harder to implement, but it pays off.
The first two options saved me a lot of computation time, such that I was able to play Domineering optimally up to a 5x5 board and heuristically up to 10x10 (it can be better depending on how well you want it to play).
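To make the first two ideas concrete for the piles game from the question, here is a minimal Java sketch that combines a depth cut-off with alpha-beta pruning, written in negamax form. Several things are assumptions rather than facts from the question: the class name NimSearch, the rule that taking the last stone wins, and the placeholder heuristic that scores an undecided position as 0 when the depth limit is reached.

```java
import java.util.Arrays;

final class NimSearch {
    /**
     * Value for the player to move: +1 forced win, -1 forced loss,
     * 0 when the depth limit is hit before the game is decided.
     * Assumes normal play: whoever takes the last stone wins.
     */
    static int value(int[] piles, int depth, int alpha, int beta) {
        if (Arrays.stream(piles).sum() == 0) {
            return -1;              // the previous player took the last stone
        }
        if (depth == 0) {
            return 0;               // cut-off: placeholder heuristic, outcome unknown
        }
        int best = -1;              // -1 is the lowest possible score
        for (int p = 0; p < piles.length; p++) {
            for (int take = 1; take <= piles[p]; take++) {
                piles[p] -= take;                                    // make the move
                int score = -value(piles, depth - 1, -beta, -alpha); // opponent's view, negated
                piles[p] += take;                                    // undo the move
                best = Math.max(best, score);
                alpha = Math.max(alpha, best);
                if (alpha >= beta) {
                    return best;    // prune: a winning reply ends the search of this node
                }
            }
        }
        return best;
    }
}
```

Called as NimSearch.value(piles, depth, -1, 1), the search stops expanding a position as soon as one winning move is found, which is where most of the saving comes from. Making and undoing moves in place also avoids copying the piles for every child.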
I'm writing code to simulate the actions of both Theseus and the Minotaur as shown in this logic game: http://www.logicmazes.com/theseus.html
For each maze I provide the positions of the maze and which positions are reachable from each, e.g. from position 0 the next states are 1, 2 or staying on 0. I run a QLearning instantiation which calculates the best path for Theseus to escape the maze assuming no Minotaur. Then the Minotaur is introduced. Theseus makes his first move towards the exit and is inevitably caught, resulting in a reweighting of the best path. Using maze 3 in the game as a test, this approach led to Theseus moving up and down on the middle line indefinitely, as these were the only moves that didn't get him killed.
As per a suggestion received here within the last few days, I adjusted my code to consider the state to be both the position of Theseus and the position of the Minotaur at a given time. When Theseus moves, the state is added to a list of "visited states". By comparing the state resulting from the suggested move to the list of visited states, I am able to ensure that Theseus does not make a move that would result in a previous state.
The problem is that I need to be able to revisit states in some cases, e.g. using maze 3 as an example, with the Minotaur moving twice for every Theseus move:
Theseus moves 4 -> 5, state (t5, m1) added. Minotaur moves 1 -> 5. Theseus caught, reset. 4 -> 5 is a bad move, so Theseus moves 4 -> 3 and the Minotaur catches him on his turn. Now both (t5, m1) and (t3, m1) are on the visited list.
What happens is that all possible states from the initial state get added to the don't-visit list, meaning that my code loops indefinitely and cannot provide a solution.
public void move() {
    int randomness = 10;
    State tempState = new State();
    boolean rejectMove = true;
    int keepCurrent = currentPosition;
    int keepMinotaur = minotaurPosition;
    previousPosition = currentPosition;
    do {
        // restore the positions before trying another candidate move
        minotaurPosition = keepMinotaur;
        currentPosition = keepCurrent;
        rejectMove = false;
        if (states.size() > 10) {
            // ... (body missing in the original)
        }
        if (this.policy(currentPosition) == this.minotaurPosition) {
            randomness = 100;
        }
        if (Math.random() * 100 <= randomness) {
            System.out.println("Random move");
            int[] actionsFromState = actions[currentPosition];
            int max = actionsFromState.length;
            Random r = new Random();
            int s = r.nextInt(max);
            previousPosition = currentPosition;
            currentPosition = actions[currentPosition][s];
        } else {
            previousPosition = currentPosition;
            currentPosition = policy(currentPosition);
        }
        tempState.setAttributes(minotaurPosition, currentPosition);
        randomness = 10;
        // reject the move if it leads to a state that was already visited
        for (int i = 0; i < states.size(); i++) {
            if (states.get(i).getMinotaurPosition() == tempState.getMinotaurPosition() && states.get(i).theseusPosition == tempState.getTheseusPosition()) {
                rejectMove = true;
            }
        }
    } while (rejectMove == true);
}
Above is the move method of Theseus, showing that it occasionally suggests a random move.
The problem here is a discrepancy between the "never visit a state you've previously been in" approach and your "reinforcement learning" approach. When I recommended the "never visit a state you've previously been in" approach, I was making the assumption that you were using backtracking: once Theseus got caught, you would unwind the stack to the last place where he made an unforced choice, and then try a different option. (That is, I assumed you were using a simple depth-first-search of the state-space.) In that sort of approach, there's never any reason to visit a state you've previously visited.
For your "reinforcement learning" approach, where you're completely resetting the maze every time Theseus gets caught, you'll need to change that. I suppose you can change the "never visit a state you've previously been in" rule to a two-pronged rule:
never visit a state you've been in during this run of the maze. (This is to prevent infinite loops.)
disprefer visiting a state you've been in during a run of the maze where Theseus got caught. (This is the "learning" part: if a choice has previously worked out poorly, it should be made less often.)
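The two-pronged rule above could be sketched in Java roughly as follows. This is only an illustration under stated assumptions: MoveChooser and all of its method names are hypothetical, a state is encoded as a "theseus:minotaur" string, and "disprefer" is interpreted as "pick the candidate with the fewest recorded failures".

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class MoveChooser {
    private final Set<String> visitedThisRun = new HashSet<>();
    private final Map<String, Integer> caughtCount = new HashMap<>();

    /** Encodes a (Theseus, Minotaur) position pair as a map key. */
    static String key(int theseus, int minotaur) {
        return theseus + ":" + minotaur;
    }

    /** Call whenever the maze is reset; only the hard ban is cleared. */
    void startRun() {
        visitedThisRun.clear();
    }

    /** Call when Theseus is caught: every state seen in the failed run gets a penalty. */
    void recordCaught() {
        for (String state : visitedThisRun) {
            caughtCount.merge(state, 1, Integer::sum);
        }
    }

    /**
     * Each candidate is {theseusPosition, minotaurPosition} after a possible move.
     * Returns the index of the unvisited candidate with the fewest past failures,
     * or -1 if every candidate was already visited during this run.
     */
    int choose(List<int[]> candidates) {
        int bestIndex = -1;
        int bestPenalty = Integer.MAX_VALUE;
        for (int i = 0; i < candidates.size(); i++) {
            String k = key(candidates.get(i)[0], candidates.get(i)[1]);
            if (visitedThisRun.contains(k)) continue;       // rule 1: never revisit within a run
            int penalty = caughtCount.getOrDefault(k, 0);   // rule 2: soft penalty from failed runs
            if (penalty < bestPenalty) {
                bestPenalty = penalty;
                bestIndex = i;
            }
        }
        if (bestIndex >= 0) {
            int[] chosen = candidates.get(bestIndex);
            visitedThisRun.add(key(chosen[0], chosen[1]));
        }
        return bestIndex;
    }
}
```

Because the penalties survive a reset while the visited set does not, a state such as (t5, m1) that led to capture stays available in later runs but is tried only after less-penalised alternatives.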
For what it's worth, the simplest way to solve this problem optimally is to use alpha-beta pruning, which is a search algorithm for deterministic two-player games (like tic-tac-toe, checkers, chess). Here's a summary of how to implement it for your case:
Create a class that represents the current state of the game, which should include: Theseus's position, the Minotaur's position and whose turn it is. Say you call this class GameState.
Create a heuristic function that takes an instance of GameState as a parameter and returns a double that's calculated as follows:
Let Dt be the Manhattan distance (number of squares) that Theseus is from the exit.
Let Dm be the Manhattan distance (number of squares) that the Minotaur is from Theseus.
Let T be 1 if it's Theseus's turn and -1 if it's the Minotaur's.
If Dm is not zero and Dt is not zero, return Dm + (Dt/2) * T
If Dm is zero, return -Infinity * T
If Dt is zero, return Infinity * T
The heuristic function above returns the value that Wikipedia refers to as "the heuristic value of node" for a given GameState (node) in the pseudocode of the algorithm.
You now have all the elements to code it in Java.
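As a starting point, the heuristic rules listed above can be transcribed into Java almost line by line. The class and method names below are hypothetical. Two details are assumptions, because the description leaves them open: Dt/2 is computed as a floating-point division, and the "caught" case (Dm is zero) is checked before the "escaped" case (Dt is zero).

```java
final class TheseusHeuristic {
    /** Manhattan distance (number of squares) between two grid squares. */
    static int manhattan(int row1, int col1, int row2, int col2) {
        return Math.abs(row1 - row2) + Math.abs(col1 - col2);
    }

    /**
     * @param dt            distance from Theseus to the exit (Dt)
     * @param dm            distance from the Minotaur to Theseus (Dm)
     * @param theseusToMove true when it is Theseus's turn (T = 1), false otherwise (T = -1)
     */
    static double value(int dt, int dm, boolean theseusToMove) {
        int t = theseusToMove ? 1 : -1;
        if (dm == 0) {
            return Double.NEGATIVE_INFINITY * t;   // Theseus is caught (checked first: an assumption)
        }
        if (dt == 0) {
            return Double.POSITIVE_INFINITY * t;   // Theseus is on the exit
        }
        return dm + (dt / 2.0) * t;                // Dm + (Dt/2) * T, as stated above
    }
}
```

A GameState class would compute dt and dm with manhattan(...) from the stored positions and pass them in together with the turn flag. Manhattan distance ignores walls, so the result is only an estimate.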
I am programming an AI for a chess-like game, based on two types of pieces on an 8 x 8 grid.
I want to build a kind of minmax tree which represents each possible move in a game, played by white first and by black second.
I have this generate() method which is called recursively. I need to be able to display about 8 levels of possible moves. Without optimization, this tree has 8^8 leaves.
I implemented a simple system which determines whether a grid has already been calculated; if so, the system just points a child to the already-calculated child reference.
I don't know if my explanations are clear, so I will include the part of the code that you should be able to understand.
The problem is that currently I am only able to generate about 3 or 4 levels of all possibilities. I am far from 8.
I would like to be able to calculate it in less than 5 seconds.
So, do you see a way to optimize my algorithm?
This is the generate function:
leftDiagonalMove(), rightDiagonalMove() and frontMove() return false if a move is illegal; if the move is legal, they move the piece in the grid and return true.
clone() creates a new instance with the same properties as its "parent", and backMove() just steps back to the last move.
public void generate(Node root, boolean white, int index) {
    Grid grid = root.getGrid();
    Stack<Piece> whitePieces = grid.getPiecesByColor(WHITE);
    Stack<Piece> blackPieces = grid.getPiecesByColor(BLACK);
    Node node;
    String serial = "";
    // white loop
    for (int i = 0; i < whitePieces.size() && white; i++) {
        Piece wPiece = whitePieces.get(i);
        if (grid.leftDiagonalMove(wPiece)) {
            serial = grid.getSerial();
            node = new Node(grid.clone());
            root.addChild(node); // add modified grid
            allGrids.put(serial, node);
            if (index < 5 && grid.getPosition(wPiece).x > 0)
                generate(node, !white, index + 1);
            actualGrid.backMove(); // back step to initial grid
        }
        if (grid.frontMove(wPiece)) {
            // same code as leftMove
        }
        if (grid.rightDiagonalMove(wPiece)) {
            // same code as leftMove
        }
    }
    // black loop
    for (int i = 0; i < blackPieces.size() && !white; i++) {
        Piece bPiece = blackPieces.get(i);
        if (grid.leftDiagonalMove(bPiece)) {
            // same code as white loop and replacing wPiece by bPiece
        }
        if (grid.frontMove(bPiece)) {
            // same code as white loop and replacing wPiece by bPiece
        }
        if (grid.rightDiagonalMove(bPiece)) {
            // same code as white loop and replacing wPiece by bPiece
        }
    }
}
You need to use something called AlphaBeta pruning on your generated MinMax trees of moves. More on this here:
Basically you do one level of branches and then using pruning you eliminate bad branches early. Then from the non eliminated branches you calculate (for each) another level. You prune again until you reach a desired depth.
Here are a few more links for you to read up on minmax:
1. http://en.wikipedia.org/wiki/Minimax
2. MinMax trees - when Min can win in two steps
This one is on optimizing pruning for chess games:
1. http://en.wikipedia.org/wiki/Alpha-beta_pruning#Heuristic_improvements
2. http://en.wikipedia.org/wiki/Refutation_table#Related_techniques
I don't understand why you are using Stacks when you are doing random access to the elements. At a low level you would get an improvement by using a Piece[] array instead.