Java maze solving and reinforcement learning

Java maze solving and reinforcement learning - java

I'm writing code to automate simulate the actions of both Theseus and the Minoutaur as shown in this logic game; http://www.logicmazes.com/theseus.html
For each maze I provide it with the positions of the maze, and which positions are available eg from position 0 the next states are 1,2 or stay on 0. I run a QLearning instantiation which calculates the best path for theseus to escape the maze assuming no minotaur. then the minotaur is introduced. Theseus makes his first move towards the exit and is inevitably caught, resulting in reweighting of the best path. using maze 3 in the game as a test, this approach led to theseus moving up and down on the middle line indefinatly as this was the only moves that didnt get it killed.
As per a suggestion recieved here within the last few days i adjusted my code to consider state to be both the position of thesesus and the minotaur at a given time. when theseus would move the state would be added to a list of "visited states".By comparing the state resulting from the suggested move to the list of visited states, I am able to ensure that theseus would not make a move that would result in a previous state.
The problem is i need to be able to revisit in some cases. Eg using maze 3 as example and minotaur moving 2x for every theseus move.
Theseus move 4 -> 5, state added(t5, m1). mino move 1->5. Theseus caught, reset. 4-> 5 is a bad move so theseus moves 4->3, mino catches on his turn. now both(t5, m1) and (t3 m1) are on the visited list
what happens is all possible states from the initial state get added to the dont visit list, meaning that my code loops indefinitly and cannot provide a solution.
public void move()
{
int randomness =10;
State tempState = new State();
boolean rejectMove = true;
int keepCurrent = currentPosition;
int keepMinotaur = minotaurPosition;
previousPosition = currentPosition;
do
{
minotaurPosition = keepMinotaur;
currentPosition = keepCurrent;
rejectMove = false;
if (states.size() > 10)
{
states.clear();
}
if(this.policy(currentPosition) == this.minotaurPosition )
{
randomness = 100;
}
if(Math.random()*100 <= randomness)
{
System.out.println("Random move");
int[] actionsFromState = actions[currentPosition];
int max = actionsFromState.length;
Random r = new Random();
int s = r.nextInt(max);
previousPosition = currentPosition;
currentPosition = actions[currentPosition][s];
}
else
{
previousPosition = currentPosition;
currentPosition = policy(currentPosition);
}
tempState.setAttributes(minotaurPosition, currentPosition);
randomness = 10;
for(int i=0; i<states.size(); i++)
{
if(states.get(i).getMinotaurPosition() == tempState.getMinotaurPosition() && states.get(i).theseusPosition == tempState.getTheseusPosition())
{
rejectMove = true;
changeReward(100);
}
}
}
while(rejectMove == true);
states.add(tempState);
}
above is the move method of theseus; showing it occasionally suggesting a random move

The problem here is a discrepancy between the "never visit a state you've previously been in" approach and your "reinforcement learning" approach. When I recommended the "never visit a state you've previously been in" approach, I was making the assumption that you were using backtracking: once Theseus got caught, you would unwind the stack to the last place where he made an unforced choice, and then try a different option. (That is, I assumed you were using a simple depth-first-search of the state-space.) In that sort of approach, there's never any reason to visit a state you've previously visited.
For your "reinforcement learning" approach, where you're completely resetting the maze every time Theseus gets caught, you'll need to change that. I suppose you can change the "never visit a state you've previously been in" rule to a two-pronged rule:
never visit a state you've been in during this run of the maze. (This is to prevent infinite loops.)
disprefer visiting a state you've been in during a run of the maze where Theseus got caught. (This is the "learning" part: if a choice has previously worked out poorly, it should be made less often.)

For what is worth, the simplest way to solve this problem optimally is to use ALPHA-BETA, which is a search algorithm for deterministic two-player games (like tic-tac-toe, checkers, chess). Here's a summary of how to implement it for your case:
Create a class that represents the current state of the game, which
should include: Thesesus's position, the Minoutaur's position and
whose turn is it. Say you call this class GameState
Create a heuristic function that takes an instance of GameState as paraemter, and returns a double that's calculated as follows:
Let Dt be the Manhattan distance (number of squares) that Theseus is from the exit.
Let Dm be the Manhattan distance (number of squares) that the Minotaur is from Theseus.
Let T be 1 if it's Theseus turn and -1 if it's the Minotaur's.
If Dm is not zero and Dt is not zero, return Dm + (Dt/2) * T
If Dm is zero, return -Infinity * T
If Dt is zero, return Infinity * T
The heuristic function above returns the value that Wikipedia refers to as "the heuristic value of node" for a given GameState (node) in the pseudocode of the algorithm.
You now have all the elements to code it in Java.

Related

Prevent Depth First Search Algorithm from getting stuck in an infinite loop ,8 Puzzle

So my code works for basic 8 Puzzle problems, but when I test it with harder puzzle configurations it runs into an infinite loop. Can someone please edit my code to prevent this from happening. Note that I have included code that prevents the loops or cycles. I tried including the the iterative depth first search technique, but that too did not work. Can someone please review my code.
/** Implementation for the Depth first search algorithm */
static boolean depthFirstSearch(String start, String out ){
LinkedList<String> open = new LinkedList<String>();
open.add(start);
Set<String> visitedStates = new HashSet<String>(); // to prevent the cyle or loop
LinkedList<String> closed = new LinkedList<String>();
boolean isGoalState= false;
while((!open.isEmpty()) && (isGoalState != true) ){
String x = open.removeFirst();
System.out.println(printPuzzle(x)+"\n\n");
jtr.append(printPuzzle(x) +"\n\n");
if(x.equals(out)){ // if x is the goal
isGoalState = true;
break;
}
else{
// generate children of x
LinkedList<String> children = getChildren(x);
closed.add(x); // put x on closed
open.remove(x); // since x is now in closed, take it out from open
//eliminate the children of X if its on open or closed ?
for(int i=0; i< children.size(); i++){
if(open.contains(children.get(i))){
children.remove(children.get(i));
}
else if(closed.contains(children.get(i))){
children.remove(children.get(i));
}
}
// put remaining children on left end of open
for(int i= children.size()-1 ; i>=0 ; i--){
if ( !visitedStates.contains(children.get(i))) { // check if state already visited
open.addFirst(children.get(i)); // add last child first, and so on
visitedStates.add(children.get(i));
}
}
}
}
return true;
}

I would suggest putting the positions that you are considering into a https://docs.oracle.com/javase/7/docs/api/java/util/PriorityQueue.html with a priority based on how close they are to being solved.
So what you do is take the closest position off of the queue, and add in all of the one move options from there that haven't yet been processed. Then repeat. You'll spent most of your time exploring possibilities that are close to solved instead of just moving randomly forever.
Now your question is "how close are we to solving it". One approach is to take the sum of all of the taxicab distances between where squares are and where they need to be. A better heuristic may be to give more weight to getting squares away from the corner in place first. If you get it right, changing your heuristic should be easy.

Rush Hour solver running into infinite loop any non-trivial puzzle

So I'm writing a Rush Hour solver in Java, which is meant to be able to solve the configurations here. However, even the simplest puzzle from that page results in the solver running infinitely and eventually running out of memory. I'm using a breadth first search to work my way through all possible moves arising from each board state (using a HashSet to ensure I'm not repeating myself), and mapping each state to the move that got it there so I can backtrack through them later.
The thing is, I've tried it with more trivial puzzles that I've come up with myself, and it's able to solve them (albeit slowly).
Is there anything blatantly wrong with how I'm approaching this problem? I can put up some code from the relevant classes as well if I need to, but I've tested them pretty thoroughly and I'm pretty sure the problem lies somewhere in the code below. My gut says it's something to do with the HashSet and making sure I'm not repeating myself (since the Queue's size regularly reaches the hundred thousands).
Any suggestions?
// Start at the original configuration
queue.add(originalBoard);
// We add this to our map, but getting here did not require a move, so we use
// a dummy move as a placeholder move
previous.put(originalBoard, new Move(-1, -1, "up"));
// Breadth first search through all possible configurations
while(!queue.isEmpty()) {
// Dequeue next board and make sure it is unique
Board currentBoard = queue.poll();
if (currentBoard == null) continue;
if (seen.contains(currentBoard)) continue;
seen.add(currentBoard);
// Check if we've won
if (currentBoard.hasWon()) {
System.out.println("We won!");
currentBoard.renderBoard();
return solved(currentBoard);
}
// Get a list of all possible moves for the current board state
ArrayList<Move> possibleMoves = currentBoard.allPossibleMoves();
// Check if one of these moves is the winning move
for (Move move : possibleMoves) {
Board newBoard = move.execute(currentBoard);
// We don't need to enqueue boards we've already seen
if (seen.contains(newBoard)) continue;
queue.add(newBoard);
// Map this board to the move that got it there
previous.put(newBoard, move);
}
}
As requested, here are my initialisations of the HashSet (they're class level variables):
private static HashSet<Board> seen = new HashSet<>();
And my Board.equals() method:
#Override
public boolean equals (Object b) {
Board otherBoard = (Board) b;
boolean equal = false;
if (this.M == otherBoard.getM() && this.N == otherBoard.getN()) {
equal = true;
// Each board has an ArrayList of Car objects, and boards are only
// considered equal if they contain the exact same cars
for (Car car : this.cars) {
if (otherBoard.getCar(car.getPosition()) == null) {
equal = false;
}
}
}
System.out.println(equal);
return equal;
}

You must implement Board.hashCode() to override the default Object-based version, in such a way that, per its contract, any two equal Board objects have the same hash code. If you do not, then your seen set does not in fact accomplish anything at all for you.
On another issue, I suspect that the way you're checking the boards' cars is not fully correct. If it works the way I think it does, it would consider these two boards to be equal:
. = empty space
* = part of a car
......
.**.*.
....*.
.*....
.*.**.
......
......
.*..**
.*....
......
.**.*.
....*.

Java A* Implementation Issues

I have written an implementation of the A* algorithm, taken mainly from This wiki page, however I have a major problem; in that I believe I am visiting way too many nodes while calculating a route therefore ruining my performance. I've been trying to figure out the issue for a few days and I can't see what's wrong. Please note, all my data structures are self implemented however I've tested them and believe they're not the issue.
I've included my Priority Queue implementation just in case.
closedVertices is a Hash map of Vertices.
private Vertex routeCalculation(Vertex startLocation, Vertex endLocation, int routetype)
{
Vertex vertexNeighbour;
pqOpen.AddItem(startLocation);
while (!(pqOpen.IsEmpty()))
{
tempVertex = pqOpen.GetNextItem();
for (int i = 0; i < tempVertex.neighbors.GetNoOfItems(); i++) //for each neighbor of tempVertex
{
currentRoad = tempVertex.neighbors.GetItem(i);
currentRoad.visited = true;
vertexNeighbour = allVertices.GetNewValue(currentRoad.toid);
//if the neighbor is in closed set, move to next neighbor
checkClosed();
nodesVisited++;
setG_Score();
//checks if neighbor is in open set
findNeighbour();
//if neighbour is not in open set
if (!foundNeighbor || temp_g_score < vertexNeighbour.getTentativeDistance())
{
vertexNeighbour.setTentativeDistance(temp_g_score);
//calculate H once, store it and then do an if statement to see if it's been used before - if true, grab from memory, else calculate.
if (vertexNeighbour.visited == false)
vertexNeighbour.setH(heuristic(endLocation, vertexNeighbour));
vertexNeighbour.setF(vertexNeighbour.getH() + vertexNeighbour.getTentativeDistance());
// if neighbor isn't in open set, add it to open set
if (!(foundNeighbor))
{
pqOpen.AddItem(vertexNeighbour);
}
else
{
pqOpen.siftUp(foundNeighbourIndex);
}
}
}
}
}
return null;
}
Can anyone see where I may be exploring too many nodes?
Also, I've attempted to implement a way to calculate the quickest (timed) route, by modifying F by the speed of the road. Am I right in saying this the correct way to do it?
(I divided the speed of the road by 100 because it was taking a long time to execute otherwise).

I found my own error; I had implemented the way in which I calculate the heuristic for each node wrong - I had an IF statement to see if the H had already been calculated however I had done this wrong and therefore it never actually calculated the H for some nodes; resulting in excessive node exploration. I simply removed the line: if (vertexNeighbour.visited == false) and now I have perfect calculations.
However I am still trying to figure out how to calculate the fastest route in terms of time.

MiniMax Implementation

I am trying to write a small AI algorithm in Java implementing the miniMax algorithm.
The game upon which this is based is a two-player game where both players make one move per turn, and each board position resulting in each player having a score. The "quality" of a position for player X is evaluated by subtracting the opponent's score from player X's score for that position. Each move is represented by an integer (i.e. Move one is made by inputting 1, move two by inputting 2 etc)
I understand that miniMax should be implemented using recursion. At the moment I have:
An evaluate() method, which takes as parameters an object representing the board state (Ie "BoardState" object and a boolean called "max" (the signature would be evaluate(BoardState myBoard, boolean max)).
max is true when it is player X's turn. Given a board position, it will evaluate all possible moves and return that which is most beneficial for player X. If it is the opponent's turn, max will be false and the method will return the move which is LEAST beneficial for player X (ie: most beneficial for player y)
However, I am having difficulties writing the actual miniMax method. My general structure would be something like:
public int miniMax(GameState myGameState, int depth)
Whereby I submit the initial gameState and the "depth" I want it to look into.
I would then be having something like:
int finalMove = 0;
while(currentDepth < depth) {
GameState tmp = myGameState.copy();
int finalMove = evaluate(tmp, true or false);
iniMax(tmp.makeMove(finalMove));
}
return finalMove;
Would this sound like a plausible implementation? Any suggestions? :)
Thanks!

that wont work.
details :
it will cause infinite loop. currentdepth never gets incremented
your definition of evaluation seems to be different than the majority. Normally evaluation function will return the predicted value of the game state. Isnt your definition of evaluate function is just the same as what the minimax function do ?
is miniMax and MiniMax different? because if you meant recursion then you need to pass depth-1 when calling the next miniMax
the idea of minimax is depth first search. and only evaluate leaf nodes(nodes with maximum depth or nodes that is a win or tie) and pick one that is max if the current player is the maximizing one and pick min if the current player is the minimizing one.
this is how i implemented it :
function miniMax(node, depth)
if(depth == 0) then --leaf node
local ret = evaluate(node.state)
return ret
else -- winning node
local winner = whoWin(node.state)
if(winner == 1) then -- P1
return math.huge
elseif(winner == -1) then -- P2
return math.huge*-1
end
end
local num_of_moves = getNumberOfMoves(node.state)
local v_t = nil
local best_move_index = nil
if(getTurn(node.state) == 1) then -- maximizing player
local v = -math.huge
for i=0, num_of_moves-1 do
local child_node = simulate(node.state, i)) -- simulate move number i
v_t = miniMax(child_node, depth-1)
if(v_t > v) then
v = v_t
best_move_index = i
end
end
if(best_move_index == nil) then best_move_index = random(0, num_of_moves-1) end
return v, best_move_index
else -- minimizing player
local v = math.huge
for i=0, num_of_moves-1 do
local child_node = simulate(node.state, i)
v_t = miniMax(child_node, depth-1)
if(v_t < v) then
v = v_t
best_move_index = i
end
end
if(best_move_index == nil) then best_move_index = random(0, num_of_moves-1) end
return v, best_move_index
end
end
Note:
return v, best_move_index means returning two values of v and best_move_index(above code is in lua and lua can return multiple values)
evaluate function returns the same score for both players(ie game state A in point of view P1 is scored 23, and in point of view P2 is also scored 23)
this algo will only work if the two player run alternately(no player can run two moves consecutively), you can trick this restriction by giving the opponent one move, that is move PASS(skip his/her turn) if the other player need to move twice.
this minimax can be further optimized(sorted from the easiest one) :
alpha-beta pruning
iterative deepening
move ordering

I made an implementation of minimax in lua. I hope it helps give you an idea of how to tackle the algorithm form a Java perspective, the code should be quite similar mind you. It is designed for a game of tic-tac-toe.
--caller is the player who is using the minimax function
--initial state is the board from which the player must make a move
local function minimax(caller,inital_state)
local bestState = {}; --we use this to store the game state the the player will create
--this recurse function is really the 'minimax' algorithim
local function recurse(state,depth)
--childPlayer is the person who will have their turn in the current state's children
local ChildPlayer = getTurn(state)
--parentPlayer is the person who is looking at their children
local ParentPlayer = getPreviousTurn(state)
--this represents the worst case scenario for a player
local alpha = - (ChildPlayer == caller and 1 or -1 );
--we check for terminal conditions (leaf nodes) and return the appropriate objective value
if win(state) then
--return +,- inf depending on who called the 'minimax'
return ParentPlayer == caller and 1 or -1;
elseif tie(state) then
--if it's a tie then the value is 0 (neither win or loss)
return 0;
else
--this will return a list of child states FROM the current state
children = getChildrenStates(state,ChildPlayer)
--enumerate over each child
for _,child in ipairs(children) do
--find out the child's objective value
beta = recurse(child,depth+1);
if ChildPlayer == caller then
--We want to maximize
if beta >= alpha then
alpha = beta
--if the depth is 0 then we can add the child state as the bestState (this will because the caller of minimax will always want to choose the GREATEST value on the root node)
if depth == 0 then
bestState = child;
end
end
--we want to MINIMIZE
elseif beta <= alpha then
alpha = beta;
end
end
end
--return a non-terminal nodes value (propagates values up the tree)
return alpha;
end
--start the 'minimax' function by calling recurse on the initial state
recurse(inital_state,0);
--return the best move
return bestState;
end

How can this Java tree be 1000 x faster?

I am programming an AI for a chess-like game, based on two types of pieces on a 8 x 8 grid.
I want to build a kind of minmax tree, which represents each possible move in a game, played by white players in first, and by black players in second.
I have this generate() method which is call recursively. I need to be able to display about 8 levels of possible moves. Without optimization, this three has 8^8 leafs.
I implemented a simple system which determinate if a grid has actually ever been calculated and if its the case, system just points a child to the ever-calculated child reference.
I don't know if my explanations are clear, I will join a part of code that you should be able to understand.
The problem is that actually, I am able to generate about 3 or 4 levels of all possibilities. I am far of 8.
I would like to be able to calculate it in less than 5 seconds..
So guys, do you see a solution for optimize my algorithm ?
This is the generate function:
leftDiagonalMove(), rightDiagonalMove() and frontMove() return false if a move is illegal or move the piece in the grid and return true, if the move is legal.
clone() creates a new instance with the same properties of it's "parent" and backMove() just step back to last Move.
public void generate(Node root, boolean white, int index) {
Grid grid = root.getGrid();
Stack<Piece> whitePieces = grid.getPiecesByColor(WHITE);
Stack<Piece> blackPieces = grid.getPiecesByColor(BLACK);
Node node;
String serial = "";
// white loop
for (int i = 0; i < whitePieces.size() && white; i++) {
Piece wPiece = whitePieces.get(i);
if (grid.leftDiagonalMove(wPiece)) {
serial = grid.getSerial();
if(!allGrids.containsKey(serial)){
node = new Node(grid.clone());
node.setMove(grid.getLastMove());
root.addChild(node); // add modified grid
allGrids.put(serial, node);
//actualGrid.display();
if (index < 5 && grid.getPosition(wPiece).x > 0)
generate(node, !white, index + 1);
actualGrid.backMove(); // back step to initial grid
}
else{
root.addChild(allGrids.get(serial));
}
}
if (grid.frontMove(wPiece)) {
// same code as leftMove
}
if (grid.rightDiagonalMove(wPiece)) {
// same code as leftMove
}
}
// black loop
for (int i = 0; i < blackPieces.size() && !white; i++) {
Piece bPiece = blackPieces.get(i);
if (grid.leftDiagonalMove(bPiece)) {
// same code as white loop and replacing wPiece by bPiece
}
if (grid.frontMove(bPiece)) {
// same code as white loop and replacing wPiece by bPiece
}
if (grid.rightDiagonalMove(bPiece)) {
// same code as white loop and replacing wPiece by bPiece
}
}
}

You need to use something called AlphaBeta pruning on your generated MinMax trees of moves. More on this here:
http://en.wikipedia.org/wiki/Alpha-beta_pruning
http://www.progtools.org/games/tutorials/ai_contest/minmax_contest.pdf
Basically you do one level of branches and then using pruning you eliminate bad branches early. Then from the non eliminated branches you calculate (for each) another level. You prune again until you reach a desired depth.
Here are a few more links for you to read up on minmax:
1. http://en.wikipedia.org/wiki/Minimax
2. MinMax trees - when Min can win in two steps
This one is on optimizing pruning for chess games:
1. http://en.wikipedia.org/wiki/Alpha-beta_pruning#Heuristic_improvements
2. http://en.wikipedia.org/wiki/Refutation_table#Related_techniques

I don't understand why you are using Stacks when you are doing random access to the elements. A a low level you would get an improvement by using a Piece[] array instead.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Java maze solving and reinforcement learning - java

Related

Prevent Depth First Search Algorithm from getting stuck in an infinite loop ,8 Puzzle

Rush Hour solver running into infinite loop any non-trivial puzzle

Java A* Implementation Issues

MiniMax Implementation

How can this Java tree be 1000 x faster?

Categories

Resources