I have been tasked with writing a function that finds the best move for the computer to make, as part of a backtracking algorithm. My solution finds a winnable move, but not the best one. I am having trouble figuring out a way to keep a value assigned to the different options that won't get reset during the next recursive call. So if it goes through moves 1, 2, 3, 4 and both 2 and 3 lead to a winnable solution, it will take 3 and not 2, even if 2 would be the better choice. I can see why this happens in my code, but I can't seem to think through how to fix it. I tried with the wins and totalwins variables, but this doesn't seem to be working. So, once again: the function finds a winnable avenue, but won't always pick the best of the winnable moves. Any help would be much appreciated.
Move bestMove = null;
int totalwins = 0;

public Move findbest(Game g) throws GameException {
    int wins = 0;
    PlayerNumber side = g.SECOND_PLAYER;
    PlayerNumber opp = g.FIRST_PLAYER;
    Iterator<Move> moves = g.getMoves();
    while (moves.hasNext()) {
        Move m = moves.next();
        //System.out.println(m + " Totalwins " + totalwins);
        Game g1 = g.copy();
        g1.make(m);
        //System.out.println("Turn: " + g.whoseTurn());
        if (!g1.isGameOver()) {
            bestMove = findbest(g1);
        } else {
            if (g1.winner() == side) {
                bestMove = m;
                wins++;
            } else if (g1.winner() == opp) {
                wins--;
            }
            if (wins > totalwins) {
                totalwins += wins;
                bestMove = m;
            }
        }
        if (bestMove == null) { // safety so it won't return null if there is no winnable move
            bestMove = m;
        }
    }
    //System.out.println("Totalwins = " + totalwins);
    return bestMove;
}
As stated in the comments, you need some sort of rating system to determine which move really is the best.
Then make a global variable Move bestMove and, instead of having findBest return the "best move", simply have it check whether the current move is a winning one and, if so, whether its rating is better than that of the current bestMove. If both conditions are true, assign the current move to bestMove.
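For illustration, here is a minimal sketch of that idea, assuming the Game/Move API from your question; the rate() helper is hypothetical (it scores a position by counting winning minus losing leaves for our side):

Move bestMove = null;
int bestRating = Integer.MIN_VALUE;

public void findBest(Game g, PlayerNumber side) throws GameException {
    Iterator<Move> moves = g.getMoves();
    while (moves.hasNext()) {
        Move m = moves.next();
        Game g1 = g.copy();
        g1.make(m);
        int rating = rate(g1, side);   // how good is the position after m?
        if (rating > bestRating) {     // keep the best-rated move seen so far
            bestRating = rating;
            bestMove = m;
        }
    }
}

// Hypothetical rating: +1 per winning leaf, -1 per losing leaf in the subtree.
private int rate(Game g, PlayerNumber side) throws GameException {
    if (g.isGameOver()) {
        if (g.winner() == side) return 1;
        if (g.winner() == null) return 0; // assumption: null means a draw
        return -1;
    }
    int total = 0;
    Iterator<Move> moves = g.getMoves();
    while (moves.hasNext()) {
        Game g1 = g.copy();
        g1.make(moves.next());
        total += rate(g1, side);
    }
    return total;
}

Because bestRating is only ever raised, a later winnable move can never silently overwrite an earlier, better-rated one, which is exactly the problem in the original loop.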
public static int score(int[][] array, int win, int turn) {
    int score = 0;
    if (GamePrinciples.gameEnd(array, win)) {
        if (GamePrinciples.draw(array)) {
            score = 0;
        } else if (GamePrinciples.winningBoard(array, win)[0] == 1) {
            score = 1;
        } else {
            score = -1;
        }
    } else {
        for (int[][] i : children(array, win, turn)) {
            score += score(i, win, GamePrinciples.nextPlayer(turn));
        }
    }
    return score;
}
Briefly, this method is part of my minimax algorithm. The problem is that I get a stack overflow. Where am I going wrong?
If a board is in an ending state, then: a draw gives a score of zero, a win for player one gives a score of 1, and a win for player two gives a score of -1.
If the board is not in an ending state, we get the children of the board (the immediate children, that is, the boards that result from the current board after one move). The score of the board is the sum of the scores of its children. The logic seems okay, and the other methods (children, nextPlayer, winningBoard, draw) all work fine under testing. So I am guessing there is a problem with this kind of recursive implementation. Can anyone help? Thanks in advance.
Your code seems wrong in the loop:
for (int[][] i : children(array, win, turn)) {
I haven't tested it, but you should call the method children() outside the for.
By calling the method within the for clause, you may end up working with the initial array instead of iterating through the result you intended.
So try assigning the return value of children() to a variable and iterating through that variable.
Something like:
… c = children(…)
for(int[][] i : c) {
…
I'm trying to figure out how to increase the speed of this algorithm. It works perfectly for two-person games (CPU vs. human), but the problem is that when I assign more than three piles (each containing a number of stones, so each player can pick up more than one), the computer player takes forever to compute its moves:
public Object[] minimax(int depth, int player) {
    if (hasPlayer1Won(player)) {
        return new Object[]{get_default_input(1), 1};
    } else if (hasPlayer2Won(player)) {
        return new Object[]{get_default_input(1), -1};
    }
    List<T> movesAvailable = getNextStates();
    if (movesAvailable.isEmpty()) {
        return new Object[]{get_default_input(0), 0};
    }
    int min = Integer.MAX_VALUE;
    int max = Integer.MIN_VALUE;
    T computersMove = getNextStates().get(0);
    int i = 0;
    for (T move : movesAvailable) {
        makeAMove(move, player);
        Object[] result = minimax(depth + 1, player == G.PLAYER1 ? G.PLAYER2 : G.PLAYER1);
        int currentScore = (int) result[1];
        if (player == G.PLAYER1) {
            max = Math.max(currentScore, max);
            if (currentScore >= 0 && depth == 0) {
                computersMove = move;
            }
            if (currentScore == 1) {
                resetMove(move);
                break;
            }
            if (i == movesAvailable.size() - 1 && max < 0) {
                if (depth == 0) {
                    computersMove = move;
                }
            }
        } else {
            min = Math.min(currentScore, min);
            if (min == -1) {
                resetMove(move);
                break;
            }
        }
        i++;
        resetMove(move);
    }
    return new Object[]{computersMove, player == G.PLAYER1 ? max : min};
}
I have successfully tested the following methods for improving minimax (I used it to play Tic-Tac-Toe and Domineering):
Alpha-beta pruning - I used a special variant of this type of pruning in conjunction with lazy evaluation: instead of generating the whole tree, I generated an optimal move on each layer and kept lazy holders for the other state-action pairs (applying the lazy evaluation method by using a supplier and calling it when a move different from the one I held was made). See the sketch after this list.
Heuristic pruning - see the chapter on heuristics in that book. I generated only the first d branches of the tree and, instead of computing a deterministic outcome, applied the heuristic function described in that book to the current state to obtain a heuristic outcome. Whenever move (d+1) was made, I generated another branch using the same approach.
Here, d is the level that you choose (the safest way to pick it is by testing).
Parallel computing - also have a look at this; you may find it harder to implement, but it pays off.
The first two options brought me a lot of saved computation time, such that I was able to play Domineering optimally up to a 5x5 board and heuristically up to 10x10 (it can be better, depending on how well you want it to play).
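For reference, here is a minimal sketch of plain alpha-beta pruning over the same interface as your code (makeAMove, resetMove, getNextStates, hasPlayer1Won, hasPlayer2Won, G.PLAYER1/G.PLAYER2, and the generic type T are taken from your snippet; the draw handling is an assumption):

// Minimal alpha-beta sketch inside your generic class.
// Returns the score of the position: +1 = PLAYER1 win, -1 = PLAYER2 win, 0 = draw.
public int alphaBeta(int depth, int player, int alpha, int beta) {
    if (hasPlayer1Won(player)) return 1;
    if (hasPlayer2Won(player)) return -1;
    List<T> moves = getNextStates();
    if (moves.isEmpty()) return 0; // assumed: no moves left means a draw

    for (T move : moves) {
        makeAMove(move, player);
        int score = alphaBeta(depth + 1,
                player == G.PLAYER1 ? G.PLAYER2 : G.PLAYER1, alpha, beta);
        resetMove(move);
        if (player == G.PLAYER1) {          // maximizing player
            alpha = Math.max(alpha, score);
        } else {                            // minimizing player
            beta = Math.min(beta, score);
        }
        if (alpha >= beta) break;           // prune: the opponent will never allow this line
    }
    return player == G.PLAYER1 ? alpha : beta;
}

Call it as alphaBeta(0, G.PLAYER1, Integer.MIN_VALUE, Integer.MAX_VALUE); to pick the actual move at the root, remember which child produced the returned score, as your current code already does at depth == 0.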
I have looked everywhere for answers to fix my code, but after long hours spent trying to debug it, I find myself hopelessly stuck. The problem is that my minimax function will not return the correct values for the best possible move. I even attempted to fix it by storing the best first moves (when depth == 0), but if the solution is not obvious, the algorithm fails horribly. I also tried modifying the return values of the base cases to prioritize earlier wins, but this didn't solve the problem.
Currently I am testing the function on a Tic-Tac-Toe board, and the helper methods (e.g. getMoves() or getWinner) are working properly. I know my style is not the most efficient, but I needed the code to be fairly explicit.
By adding a bunch of print statements, I realized that under some circumstances my bestFinalMoves ArrayList was not modified, so this may be related to the issue. Another related problem is that unless the algorithm finds a direct win (in the next move), instead of choosing a move that may lead to a future win or tie by blocking a square that gives the opponent an immediate win, it just yields the space for the minimizing player to win.
For example on the board:
aBoard= new int[][] {
{0,1,0}, // 1 is MAX (AI), -1 is MIN (Human)
{-1,0,0},
{-1,0,0}
};
This yields the incorrect result of (2,0), where it obviously should be (0,0) so that it blocks the win for the minimizing player, and the bestFinalMoves ArrayList is empty.
private result miniMaxEnd2(Board tmpGame, int depth) {
    String winner = tmpGame.whoWon();
    ArrayList<Move> myMoves = tmpGame.getMoves();
    if (winner.equals("computer")) { // base cases
        return new result(1000);
    } else if (winner.equals("human")) {
        return new result(-1000);
    } else if (winner.equals("tie")) {
        return new result(0);
    }
    if (tmpGame.ComputerTurn) { // MAX
        bestScore = -99999;
        for (Move m : tmpGame.getMoves()) {
            Board newGame = new Board(tmpGame, !tmpGame.ComputerTurn, m);
            result aScore = miniMaxEnd2(newGame, depth + 1);
            if (aScore.score > bestScore) {
                bestScore = aScore.score;
                bestMove = m;
                if (depth == 0) {
                    bestFinalMoves.add(m);
                }
            }
        }
        return new result(bestScore, bestMove);
    } else { // MIN
        bestScore = 99999;
        for (Move m : tmpGame.getMoves()) {
            Board newGame = new Board(tmpGame, !tmpGame.ComputerTurn, m);
            result aScore = miniMaxEnd2(newGame, depth + 1);
            if (aScore.score < bestScore) {
                bestScore = aScore.score;
                bestMove = m;
            }
        }
        return new result(bestScore, bestMove);
    }
}
I know this was a long post, but I really appreciate your help. The full code can be accessed at https://github.com/serch037/UTC_Connect
The bestScore and bestMove variables must be declared as local variables inside the miniMaxEnd2 method for this logic to work properly.
Those variables' values are being replaced by the recursive call.
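In sketch form, the fix looks like this (only the MAX branch is shown; the MIN branch is symmetric):

private result miniMaxEnd2(Board tmpGame, int depth) {
    // ... base cases unchanged ...
    if (tmpGame.ComputerTurn) { // MAX
        int bestScore = -99999; // local: each recursive call gets its own copy
        Move bestMove = null;   // local for the same reason
        for (Move m : tmpGame.getMoves()) {
            Board newGame = new Board(tmpGame, !tmpGame.ComputerTurn, m);
            result aScore = miniMaxEnd2(newGame, depth + 1);
            if (aScore.score > bestScore) {
                bestScore = aScore.score;
                bestMove = m;
            }
        }
        return new result(bestScore, bestMove);
    }
    // ... MIN branch symmetric, with its own local bestScore/bestMove ...
}

With fields, a deep recursive call overwrites the best move found at the root before the root returns it; with locals, each level of the tree keeps its own candidate.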
I have this in an action listener. If I lose, it will display 1. It then resets the board, and the 1 stays there, which is what I want. But if I lose again, it doesn't change the 1 to a 2. The same problem occurs for wins. I hope this is enough information.
if (game.getGameStatus() == GameStatus.Lost) {
    displayBoard();
    JOptionPane.showMessageDialog(null, "You Lose \nThe game will reset");
    //exposeMines = false;
    game.reset();
    displayBoard();
    int losses = 0;
    losses = losses + 1;
    String lost = Integer.toString(losses);
    jtfLosses.setText(String.format(lost));
}
You are defining "int losses = 0" every time you enter your if-statement, so it gets overwritten with 0; then you add 1 again, which results in 1 every time.
so defining
int losses = 0;
before
if (game.getGameStatus() == GameStatus.Lost) {
...
would fix the problem, like this:
int losses = 0;
if (game.getGameStatus() == GameStatus.Lost) {
...
But be careful: this solution is just to show you what the problem is. To actually make it correct, you should use a member variable and not pollute the enclosing namespace, as in the sketch below.
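A minimal sketch of the member-variable version (assuming this code lives in the same class as the listener):

private int losses = 0; // field: survives across events

// inside the action listener
if (game.getGameStatus() == GameStatus.Lost) {
    displayBoard();
    JOptionPane.showMessageDialog(null, "You Lose \nThe game will reset");
    game.reset();
    displayBoard();
    losses++; // increment the field instead of re-declaring a local
    jtfLosses.setText(Integer.toString(losses));
}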
You overwrite the number of losses with 0 every time, then increment it by one. Initialize the losses variable outside of this method instead of re-initializing it every time you want to increment it.
I suggest you use a properties file to store this information, like here:
you use
PlayerLoose = 2
PlayerWin = 3
then you can use
int loose = Integer.parseInt(prop.getProperty("PlayerLoose"));
loose++;
prop.setProperty("PlayerLoose", Integer.toString(loose));
String lost = Integer.toString(loose);
jtfLosses.setText(lost);
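For completeness, a minimal sketch of loading and saving such a file with java.util.Properties (the file name stats.properties is an assumption; requires java.util.Properties and java.io.*):

Properties prop = new Properties();
try (FileInputStream in = new FileInputStream("stats.properties")) {
    prop.load(in); // read existing counters, if any
} catch (IOException e) {
    // first run: no file yet, fall back to the default below
}
int losses = Integer.parseInt(prop.getProperty("PlayerLoose", "0"));
losses++;
prop.setProperty("PlayerLoose", Integer.toString(losses));
try (FileOutputStream out = new FileOutputStream("stats.properties")) {
    prop.store(out, "player statistics");
} catch (IOException e) {
    e.printStackTrace();
}
jtfLosses.setText(Integer.toString(losses));

This way the counters also survive restarts of the application, not just individual games.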
I'm writing code to simulate the actions of both Theseus and the Minotaur, as shown in this logic game: http://www.logicmazes.com/theseus.html
For each maze, I provide it with the positions in the maze and which positions are reachable, e.g. from position 0 the next states are 1, 2, or staying on 0. I run a QLearning instantiation which calculates the best path for Theseus to escape the maze, assuming no Minotaur. Then the Minotaur is introduced. Theseus makes his first move towards the exit and is inevitably caught, resulting in a reweighting of the best path. Using maze 3 in the game as a test, this approach led to Theseus moving up and down on the middle line indefinitely, as these were the only moves that didn't get him killed.
As per a suggestion received here within the last few days, I adjusted my code to consider the state to be both the position of Theseus and the Minotaur at a given time. When Theseus moves, the state is added to a list of "visited states". By comparing the state resulting from the suggested move against the list of visited states, I can ensure that Theseus does not make a move that would result in a previous state.
The problem is that I need to be able to revisit states in some cases, e.g. using maze 3 as an example, with the Minotaur moving twice for every Theseus move:
Theseus moves 4 -> 5, state (t5, m1) added. Minotaur moves 1 -> 5. Theseus caught, reset. 4 -> 5 is a bad move, so Theseus moves 4 -> 3; the Minotaur catches him on his turn. Now both (t5, m1) and (t3, m1) are on the visited list.
What happens is that all possible states reachable from the initial state get added to the don't-visit list, meaning my code loops indefinitely and cannot provide a solution.
public void move() {
    int randomness = 10;
    State tempState = new State();
    boolean rejectMove = true;
    int keepCurrent = currentPosition;
    int keepMinotaur = minotaurPosition;
    previousPosition = currentPosition;
    do {
        minotaurPosition = keepMinotaur;
        currentPosition = keepCurrent;
        rejectMove = false;
        if (states.size() > 10) {
            states.clear();
        }
        if (this.policy(currentPosition) == this.minotaurPosition) {
            randomness = 100;
        }
        if (Math.random() * 100 <= randomness) {
            System.out.println("Random move");
            int[] actionsFromState = actions[currentPosition];
            int max = actionsFromState.length;
            Random r = new Random();
            int s = r.nextInt(max);
            previousPosition = currentPosition;
            currentPosition = actions[currentPosition][s];
        } else {
            previousPosition = currentPosition;
            currentPosition = policy(currentPosition);
        }
        tempState.setAttributes(minotaurPosition, currentPosition);
        randomness = 10;
        for (int i = 0; i < states.size(); i++) {
            if (states.get(i).getMinotaurPosition() == tempState.getMinotaurPosition()
                    && states.get(i).theseusPosition == tempState.getTheseusPosition()) {
                rejectMove = true;
                changeReward(100);
            }
        }
    } while (rejectMove == true);
    states.add(tempState);
}
Above is Theseus's move method, showing how it occasionally suggests a random move.
The problem here is a discrepancy between the "never visit a state you've previously been in" approach and your "reinforcement learning" approach. When I recommended the "never visit a state you've previously been in" approach, I was making the assumption that you were using backtracking: once Theseus got caught, you would unwind the stack to the last place where he made an unforced choice, and then try a different option. (That is, I assumed you were using a simple depth-first-search of the state-space.) In that sort of approach, there's never any reason to visit a state you've previously visited.
For your "reinforcement learning" approach, where you're completely resetting the maze every time Theseus gets caught, you'll need to change that. I suppose you can change the "never visit a state you've previously been in" rule to a two-pronged rule:
never visit a state you've been in during this run of the maze. (This is to prevent infinite loops.)
disprefer visiting a state you've been in during a run of the maze where Theseus got caught. (This is the "learning" part: if a choice has previously worked out poorly, it should be made less often.)
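A minimal sketch of that two-pronged rule, with hypothetical names (visitedThisRun, penalty), since only part of your State class is shown; it assumes State implements equals and hashCode, and requires java.util.*:

// Rule 1: states visited in the current run are forbidden outright.
// Rule 2: states from runs where Theseus was caught are merely discouraged.
Set<State> visitedThisRun = new HashSet<>();   // cleared on every reset
Map<State, Integer> penalty = new HashMap<>(); // survives across runs

boolean isForbidden(State s) {
    return visitedThisRun.contains(s); // never repeat within a run
}

int scoreOf(State s, int baseScore) {
    // subtract a penalty for states seen in runs that ended badly
    return baseScore - penalty.getOrDefault(s, 0);
}

void onCaught() {
    for (State s : visitedThisRun) {
        penalty.merge(s, 10, Integer::sum); // discourage, don't forbid
    }
    visitedThisRun.clear(); // fresh run
}

The key difference from your current code is that the permanent list only lowers a state's score instead of rejecting it, so Theseus can still revisit a state when every alternative looks worse.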
For what it's worth, the simplest way to solve this problem optimally is to use alpha-beta, which is a search algorithm for deterministic two-player games (like tic-tac-toe, checkers, or chess). Here's a summary of how to implement it for your case:
Create a class that represents the current state of the game, which should include: Theseus's position, the Minotaur's position, and whose turn it is. Say you call this class GameState.
Create a heuristic function that takes an instance of GameState as a parameter and returns a double that's calculated as follows:
Let Dt be the Manhattan distance (number of squares) that Theseus is from the exit.
Let Dm be the Manhattan distance (number of squares) that the Minotaur is from Theseus.
Let T be 1 if it's Theseus's turn and -1 if it's the Minotaur's.
If Dm is not zero and Dt is not zero, return Dm + (Dt/2) * T
If Dm is zero, return -Infinity * T
If Dt is zero, return Infinity * T
The heuristic function above returns the value that Wikipedia refers to as "the heuristic value of node" for a given GameState (node) in the pseudocode of the algorithm.
You now have all the elements to code it in Java.
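As a starting point, here is a minimal sketch of that heuristic in Java; GameState and the two Manhattan-distance helpers are assumptions that simply follow the description above:

// Hypothetical GameState, per step 1 above.
class GameState {
    int theseusPos;
    int minotaurPos;
    boolean theseusTurn;
}

// Heuristic value per step 2: Dt = Theseus-to-exit distance,
// Dm = Minotaur-to-Theseus distance, T = +1 on Theseus's turn, -1 otherwise.
double heuristic(GameState s) {
    double dt = manhattanToExit(s.theseusPos);                 // assumed helper
    double dm = manhattanBetween(s.minotaurPos, s.theseusPos); // assumed helper
    double t = s.theseusTurn ? 1 : -1;
    if (dm == 0) return Double.NEGATIVE_INFINITY * t; // Theseus is caught
    if (dt == 0) return Double.POSITIVE_INFINITY * t; // Theseus has escaped
    return dm + (dt / 2) * t;
}

Plugging this into the standard alpha-beta pseudocode from Wikipedia (using the heuristic at the depth cutoff) gives you a player that chases the exit while keeping its distance from the Minotaur.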