Can you explain to me how to build the tree?
I understand how the nodes are chosen, but a clearer explanation would really help me implement this algorithm. I already have a board representing the game state, but I don't know (understand) how to generate the tree.
Can someone point me to a well-commented implementation of the algorithm (I need to use it for AI)? Or a better explanation/examples of it?
I didn't find a lot of resources on the net; this algorithm is rather new...
The best way to generate the tree is a series of random playouts. The trick is being able to balance between exploration and exploitation (this is where UCT comes in). There are some good code samples and plenty of research paper references here: https://web.archive.org/web/20160308043415/http://mcts.ai:80/index.html
When I implemented the algorithm, I used random playouts until I hit an end point or termination state. I had a static evaluation function that would calculate the payoff at this point, then the score from this point is propagated back up the tree. Each player or "team" assumes that the other team will play the best move for themselves, and the worst move possible for their opponent.
I would also recommend checking out the papers by Chaslot and his PhD thesis, as well as some of the research that references his work (basically all the MCTS work since then).
For example: Player 1's first move could simulate 10 moves into the future, alternating between Player 1 moves and Player 2 moves. Each time you must assume that the opposing player will try to minimize your score whilst maximizing their own score. There is an entire field based on this known as Game Theory. Once you simulate to the end of those 10 moves, you iterate from the start point again (because there is no point in only simulating one set of decisions). Each of these branches of the tree must be scored, where the score is propagated up the tree and represents the best possible payoff for the player doing the simulating, assuming that the other player is also choosing the best moves for themselves.
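In code, the UCT balance mentioned above is just a formula applied when choosing which child node to descend into: win rate plus an exploration bonus. Here's a minimal sketch in Java, assuming a hypothetical Node class with wins, visits, a parent and a list of children (not any particular library):

    // Minimal sketch of UCT selection. 'wins' and 'visits' are per-node statistics
    // gathered from playouts; C = sqrt(2) is the textbook exploration constant.
    static Node selectChildUct(Node parent) {
        double c = Math.sqrt(2);
        Node best = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (Node child : parent.children) {
            if (child.visits == 0) return child;            // always try unvisited children first
            double uct = (child.wins / (double) child.visits)
                       + c * Math.sqrt(Math.log(parent.visits) / child.visits);
            if (uct > bestValue) {
                bestValue = uct;
                best = child;
            }
        }
        return best;
    }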
MCTS consists of four strategic steps, repeated as long as there is time left. The steps are as follows.
In the selection step the tree is traversed from the root node until we reach a node E, where we select a position that is not yet added to the tree.
Next, during the play-out step moves are played in self-play until the end of the game is reached. The result R of this “simulated” game is +1 in case of a win for Black (the first player in LOA), 0 in case of a draw, and −1 in case of a win for White.
Subsequently, in the expansion step children of E are added to the tree.
Finally, R is propagated back along the path from E to the root node in the backpropagation step. When time is up, the move played by the program is the child of the root with the highest value.
(This example is taken from this paper (PDF): www.ru.is/faculty/yngvi/pdf/WinandsBS08.pdf)
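Putting those four steps together, one iteration of the loop could look roughly like the sketch below (Java, with hypothetical Node and GameState types standing in for your own classes; it is not the paper's code, and note that it expands before the play-out rather than after, which is a common variation of the same cycle):

    // One iteration of the four MCTS steps; repeat as long as there is time left.
    static void mctsIteration(Node root) {
        // 1. Selection: descend while the current node is fully expanded
        Node node = root;
        GameState state = root.state.copy();
        while (node.isFullyExpanded() && !state.isTerminal()) {
            node = selectBestChild(node);          // e.g. by the UCT formula
            state.apply(node.move);
        }
        // 2. Expansion: add one new child for an untried move
        if (!state.isTerminal()) {
            node = node.addChildForUntriedMove(state);  // also plays that move on 'state'
        }
        // 3. Play-out: finish the game with (semi-)random moves
        while (!state.isTerminal()) {
            state.apply(state.randomMove());
        }
        double result = state.resultForBlack();    // +1 Black win, 0 draw, -1 White win
        // 4. Backpropagation: update every node on the path back to the root
        for (Node n = node; n != null; n = n.parent) {
            n.visits++;
            n.totalValue += result;
        }
    }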
Here are some implementations:
A list of libraries and games using some MCTS implementations
http://senseis.xmp.net/?MonteCarloTreeSearch
and a game independent open source UCT MCTS library called Fuego
http://fuego.sourceforge.net/fuego-doc-1.1/smartgame-doc/group__sguctgroup.html
From http://mcts.ai/code/index.html:
Below are links to some basic MCTS implementations in various programming languages. The listings are shown with timing, testing and debugging code removed for readability.
Java
Python
I wrote this one if you're interested: https://github.com/avianey/mcts4j
I'm making my own five-in-a-row game where the board is bigger than 5x5; the size isn't decided, but let's say 10x10. I'm designing this with a minimax algorithm and alpha-beta pruning. I've decided that a winning situation has a utility of (empty places + chosen place) * 5, so if the computer finds a win where there will be 2 empty places left, then the value will be (2 + 1) * 5 = 15. The same calculation applies for a loss, but times -5 instead of 5. A draw evaluates to 0. Now, the issue I'm writing about is this: how do I determine the utility for an unfinished scenario where the depth limit is reached? Simply making it 0 isn't good enough; there must be some kind of "guess" or ranking.
An idea I had was to count all the X's in every row, every column and every possible diagonal, and use (empty places + chosen place) times the biggest sequence of X's found. The issue there is that the calculations take time and are a pain to write, and then you'd also have to take into account the edges of the sequences found: are they bordered by O's? Should we count X's with empty gaps?
It seems awfully complicated, and I wondered therefore if anyone has any good advice for me in regards to the unfinished scenarios. Thank you!
I have no idea what you are talking about in your first paragraph; a win should be scored as a win (minus depth, to always find the shortest win). If I understand you correctly, you are looking to create an evaluation/heuristic function that takes a board state as input and outputs a static score for that state? Gomoku (5 in a row) is a popular game that has many engines; Google "evaluation function gomoku" and you will find lots of information and papers on the subject.
For a game like chess the evaluation is done, e.g., by counting each side's material balance, piece positions, king safety, etc. For X-in-a-row the usual approach is to give all possible scenarios a score and then add them up. The scenarios for Gomoku are usually as follows:
FiveInRow (Win)
OpenFour
ClosedFour
OpenThree
ClosedThree
OpenTwo
ClosedTwo
Open means there are no blocking enemy pieces next to the friendly pieces:
OpenThree: -O--XXX---
ClosedThree: --OXXX---
How to sum them all up, and what score to give each scenario, is up to you and that is where the fun and experimenting begins!
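Just to make that concrete, here's a rough sketch of a pattern-counting evaluation for a single line of the board. The pattern strings and scores are made-up placeholders rather than tuned values, and a real engine would also handle broken shapes (like XX-XX) and treat the board edge as blocking:

    // Illustrative only: count occurrences of each pattern in a line string
    // ('X' = own piece, 'O' = opponent, '-' = empty) and sum their scores.
    static final String[] PATTERNS = { "XXXXX", "-XXXX-", "OXXXX-", "-XXX-", "OXXX-", "-XX-", "OXX-" };
    static final int[]    SCORES   = { 100000,  10000,    1000,     200,     50,      10,     5    };

    static int scoreLine(String line) {
        int total = 0;
        for (int i = 0; i < PATTERNS.length; i++) {
            int from = 0;
            while ((from = line.indexOf(PATTERNS[i], from)) != -1) {
                total += SCORES[i];
                from++;
            }
        }
        return total;
    }
    // Board evaluation = sum of scoreLine(...) over every row, column and diagonal
    // for your own pieces, minus the same sum with the roles of X and O swapped.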
I am implementing minimax for a Stratego game (where the computer has perfect knowledge of all the pieces). However, I find that the computer will often not attack a piece that it can easily destroy. From what I understand, the minimax scores come from the leaf nodes of a move tree (where each level is a turn and the score for each leaf node is calculated using an evaluation function for the board in that position). So if I have a depth of 3 levels, the computer can choose to attack on move 1 or attack on move 3. According to the minimax algorithm, both have the same score associated with them (the resulting board position has the same score). So how do I influence the minimax algorithm to prefer immediate rewards over eventual rewards? I.e., I would like the score to decay over time, but with the way minimax works I don't see how this is possible. Minimax always uses the leaf nodes to determine the intermediate nodes.
As mentioned by others in the comments, minimax should automatically notice if there is a danger in delaying a capture, and changing the evaluation function to force it to prefer earlier captures is likely to be detrimental to playing performance.
Still, if you really want to do it, I think the only way would be to start storing extra information in your game states (not only the board). You'll want to store timestamps in memory for every game state which allow you to still tell in hindsight exactly at what time (in which turn) a piece was previously captured. Using that information you could implement a decay factor in the evaluation function used in leaf nodes of the search tree.
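As a rough sketch of that idea (the Capture record and the decay value are hypothetical bookkeeping you would add yourself), the material part of the leaf evaluation could discount captures by how many turns in the future they happen:

    import java.util.List;

    // Hypothetical record stored in the game state whenever a capture is made.
    class Capture {
        int value;        // material value of the captured piece
        int turnCaptured; // turn number at which the capture happened
        Capture(int value, int turnCaptured) { this.value = value; this.turnCaptured = turnCaptured; }
    }

    class DecayedEvaluation {
        // A capture N turns after the root is worth value * decay^N (e.g. decay = 0.95),
        // so the search prefers taking the piece sooner rather than later.
        static double capturedMaterial(List<Capture> captures, int rootTurn, double decay) {
            double score = 0.0;
            for (Capture c : captures) {
                int delay = Math.max(0, c.turnCaptured - rootTurn);
                score += c.value * Math.pow(decay, delay);
            }
            return score;
        }
    }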
A different solution may be to simply make sure that you search to an even depth level; 2 or 4 instead of 3. That way, your algorithm will always evaluate game states where the opponent had the last move, instead of your computer player. All evaluations will become more pessimistic, and this may encourage your agent to prefer earlier rewards in some situations.
This effect where odd search depths typically result in different evaluations from even search depths is referred to as the odd-even effect. You may be interested in looking into that more (though it's typically discussed for different reasons than what your question is about).
What would be the best algorithm in terms of speed for locating an object in a field?
The field consists of 18 by 18 squares with side length 30.48 cm. The robot is placed in the square (0,0) and its job is to reach the light source while avoiding obstacles along the way. To locate the light source, the robot does a 360 degree turn to find the angle with the highest light reading and then travels towards the source. It can reliably detect a light source from 100 cm.
The way I'm implementing this presently is by storing the information about each tile in a 2D array. The possible values of the tiles are unexplored (default), blocked (there's an obstacle), and empty (there's nothing there). I'm thinking of using the DFS algorithm where the children are at positions (i+3,j) or (i,j+3). However, considering the fact that I will be doing a rotation to locate the angle with the highest light reading at each child, I think there may be an algorithm that can locate the light source faster than DFS. Also, I will only be travelling in the x and y directions, since the robot will be using the grid lines on the floor to make corrections to its x and y positions.
I would appreciate it if a fast and reliable algorithm could be suggested to accomplish this task.
This is a really broad question, and I'm not an expert so my answer is based on "first principles" thinking rather than experience in the field.
(I'm assuming that your robot has generally unobstructed line of sight and movement; i.e. it is an open area with scattered obstacles, not in a maze.)
The problem is interpreting the information that you get back from a 360 degree scan.
If the robot sees the light source, then traversing a route to the light source is either trivial, or a "simple" maze walking task.
The difficulty is when you don't see the source. It might mean that the source is not within the circle of visibility. But it could also mean that the light is behind an obstacle. And unfortunately, a simple sensor like you are describing cannot distinguish these two cases.
If your sensor system allowed you to see the obstacles, you could plot the locations of the "shadow" regions (regions behind obstacles), and use that to keep track of the places that are left to search. So your strategy would be to visit a small number of locations and do a scan at each, then methodically "tidy up" a small number of areas that were in shadow.
But since you cannot easily tell where the shadow areas are, you need an algorithm that (ultimately) searches everywhere. DFS is a general strategy that searches everywhere, but it does it by (in effect) looking in the nooks and crannies first. A better strategy is to do a breadth-first search, and only visit the nooks and crannies if the wide-scale scans didn't find the light source.
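Here is a small sketch of that breadth-first idea on your 18x18 grid (Java; the step of 3 cells comes from the roughly 100 cm detection range over 30.48 cm cells, and the actual scanning/driving calls are left as comments since they depend on your robot code):

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Queue;

    class GridBfs {
        static final int N = 18;

        // Returns scan points in breadth-first order from the start cell,
        // stepping 3 cells at a time and skipping known obstacles.
        static List<int[]> scanOrder(boolean[][] blocked, int startX, int startY) {
            boolean[][] seen = new boolean[N][N];
            List<int[]> order = new ArrayList<>();
            Queue<int[]> queue = new ArrayDeque<>();
            queue.add(new int[] { startX, startY });
            seen[startX][startY] = true;
            int[][] steps = { { 3, 0 }, { -3, 0 }, { 0, 3 }, { 0, -3 } };
            while (!queue.isEmpty()) {
                int[] cell = queue.poll();
                order.add(cell);                 // do the 360-degree light scan here
                for (int[] s : steps) {
                    int nx = cell[0] + s[0], ny = cell[1] + s[1];
                    if (nx >= 0 && nx < N && ny >= 0 && ny < N
                            && !seen[nx][ny] && !blocked[nx][ny]) {
                        seen[nx][ny] = true;
                        queue.add(new int[] { nx, ny });
                    }
                }
            }
            return order;
        }
    }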
I would appreciate it if a fast and reliable algorithm could be suggested to accomplish this task.
I think you are going to need to develop one yourself. (Isn't this the point of the problem / task / competition?)
Although it may not look like it, this is more like a maze-following problem than anything else. I suppose this is some kind of challenge or contest situation, where there's always a path from start to target, but suppose for a moment that there's not. One of the successful results for a robot navigating toward a beacon fully surrounded by obstacles would be a report describing a closed path of obstacles surrounding the signal. If there's no such closed path, then you can find a hole somewhere; this is why it looks like maze following.
So the basic algorithm I'd choose is to start with a spiraling-inward traversal, sweeping out a path narrow enough so that you're sure to see a beacon if one is present. If there are no obstacles (a degenerate case), this finds the target in minimal time. (Hint: each turn reduces the number of cells your sensor can cover per step.)
Take the spiral traversal to be counter-clockwise. What you have then is related to the rule for solving mazes by keeping your right hand on the wall and following the generated path. In this case, you have the complication that, while the start of the maze is on the boundary, the end may not be. It's possible for the right-hand-touching path to fail in such a situation. Detecting this situation requires looking for "cavities" in the region swept out by adjacency to the wall.
Honestly, I only learned of this game recently, and I wonder how one can create a solving algorithm for it using recursive search?
There are 15 holes in total in the triangular board, which makes 14 pegs and a total of 13 moves.
I don't know where to start with this in C++ or Java. I studied C++ for a year before, so I'm familiar with concepts such as stacks, linked lists, etc.
I just don't know how to start the code. The program first asks the user where they want to start (how is this done?).
Then, once it solves it, more than one peg will be left, and the program will ask the user for a better solution (and so on, until the board is left with just one peg).
I certainly cannot think of how to make the moves possible (how do I write code that "shows" one peg jumping over another into a hole?).
I'd love some coding assistance here. It would really be appreciated.
Try treating the board as a linked list of holes or positions. Each data field in a node would represent a hole. Each node would have a vector of links to other holes, depending on its position relative to the edge of the board.
To move a peg, iterate over the possible links.
This is just one method, there are probably better ones out there.
Take a look at my answer here: Timeout on a php Peg Puzzle solver. The 15-peg puzzle was the first program that I ever wrote (over 10 years ago), after just learning C++.
The posted solution is a re-write that I did several years later.
Since I was the answerer I can tell you that a triplet is a term that I made up for a move. There are three pegs involved in a move (a triplet). Each peg is represented by a bit. In order for a move to be legal two consecutive pegs (bits) need to be set and the other (the target location) needs to be clear.
The entire solution involves a depth first search using a 16-bit integer to represent the board. Simple bit manipulation can be used to update the board state. For example if you xor the current state with a move (triplet) it will make the move against the board. The same operation applied a second time against the board is an undo of the move.
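In other words (a small sketch of the same idea, not the original code), with the 15 holes numbered 0..14 it boils down to a few bit operations:

    // board: bit i set = peg present in hole i (0..14).
    // A move is described by three hole indices (from, over, to) that lie in a line.
    static int tripletMask(int from, int over, int to) {
        return (1 << from) | (1 << over) | (1 << to);
    }

    static boolean isLegal(int board, int from, int over, int to) {
        return (board & (1 << from)) != 0      // jumping peg present
            && (board & (1 << over)) != 0      // peg being jumped is present
            && (board & (1 << to)) == 0;       // landing hole is empty
    }

    static int applyMove(int board, int from, int over, int to) {
        // XOR flips all three bits; applying the same XOR again undoes the move
        return board ^ tripletMask(from, over, to);
    }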
To be successful as a programmer, you need to develop the skill of examining a problem, figuring out how it can be solved, and come up with a reasonable in-program representation that enables you to solve it.
For problems like this, I find it helpful to have the puzzle in front of me, so I can try it out by hand. Failing that, at least having a picture of the thing will help:
http://www.obsidiandesigns.com/pyramidsol.jpg
Here's another way to look at it, and possibly consider representing it in memory like this:
OXXXX
XXXX
XXX
XX
X
Each "X" is a peg. The "O" is a hole. One can change "OXX" to "XOO" by jumping the 3rd X over the middle one, into the hole. Also consider vertical and diagonal jumps.
Linked lists could make a lot of sense. You might actually want "2D" links, instead of the usual "1D" links. That is, each "hole" instance could contain pointers to the two or three holes next to it.
Once you have a representation, you need a way to find valid moves. And to make valid moves.
For this "game," you may want to search the whole space of possible moves, so you'll probably want to either have each move produce a new board, or to have a way to "undo" moves.
A valid move is where one peg jumps over another peg, into an empty hole. The jumped peg is removed. You can't jump from "just anywhere" to anywhere, over any peg you wish -- the three holes have to be next to each other and in line.
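For example, here is a rough sketch of the search-with-undo approach: the pegs array holds the 15 holes in some fixed numbering, and moves is a precomputed table of every (from, over, to) line, which depends entirely on how you choose to number the holes:

    import java.util.List;

    class PegSolver {
        // Depth-first search with make/undo; returns true and fills 'solution'
        // with the (from, over, to) moves if a one-peg finish is reachable.
        static boolean solve(boolean[] pegs, int pegCount, int[][] moves, List<int[]> solution) {
            if (pegCount == 1) return true;
            for (int[] m : moves) {
                int from = m[0], over = m[1], to = m[2];
                if (pegs[from] && pegs[over] && !pegs[to]) {
                    pegs[from] = false; pegs[over] = false; pegs[to] = true;   // make the jump
                    solution.add(m);
                    if (solve(pegs, pegCount - 1, moves, solution)) return true;
                    solution.remove(solution.size() - 1);                      // undo the jump
                    pegs[from] = true; pegs[over] = true; pegs[to] = false;
                }
            }
            return false;
        }
    }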
The above should give you some hints that will help you get started.
As I have mentioned in previous questions, I am writing a maze-solving application to help me learn about more theoretical CS subjects. After some trouble, I've got a Genetic Algorithm working that can evolve a set of rules (represented as boolean values) in order to find a good solution through a maze.
That being said, the GA alone is okay, but I'd like to beef it up with a Neural Network, even though I have no real working knowledge of Neural Networks (no formal theoretical CS education). After doing a bit of reading on the subject, I found that a Neural Network could be used to train a genome in order to improve results. Let's say I have a genome (group of genes), such as
1 0 0 1 0 1 0 1 0 1 1 1 0 0...
How could I use a Neural Network (I'm assuming MLP?) to train and improve my genome?
In addition to this, as I know nothing about Neural Networks, I've been looking into implementing some form of Reinforcement Learning, using my maze matrix (2-dimensional array), although I'm a bit stuck on what the following algorithm wants from me:
(from http://people.revoledu.com/kardi/tutorial/ReinforcementLearning/Q-Learning-Algorithm.htm)
1. Set the gamma parameter, and environment rewards in matrix R
2. Initialize matrix Q as a zero matrix
3. For each episode:
    * Select a random initial state
    * Do while the goal state has not been reached
        o Select one among all possible actions for the current state
        o Using this possible action, consider going to the next state
        o Get the maximum Q value of this next state based on all possible actions
        o Compute Q(state, action) = R(state, action) + gamma * Max[Q(next state, all actions)]
        o Set the next state as the current state
    End Do
End For
The big problem for me is how to implement the reward matrix R, what exactly the Q matrix is, and how to get the Q value. I use a multi-dimensional array for my maze and enum states for every move. How would this be used in a Q-learning algorithm?
If someone could help out by explaining what I would need to do to implement this, preferably in Java (although C# would be nice too), possibly with some source code examples, it'd be appreciated.
As noted in some comments, your question indeed involves a large set of background knowledge and topics that can hardly be covered eloquently on Stack Overflow. However, what we can try here is to suggest approaches to get around your problem.
First of all: what does your GA do? I see a set of binary values; what are they? I see them as either:
bad: a sequence of 'turn right' and 'turn left' instructions. Why is this bad? Because you're basically doing a random, brute-force attempt at solving your problem. You're not evolving a genotype: you're refining random guesses.
better: every gene (location in the genome) represents a feature that will be expressed in the phenotype. There should not be a 1-to-1 mapping between genome and phenotype!
Let me give you an example: in our brain there are on the order of 10^11 neurons, but only around 2*10^4 protein-coding genes (yes, these are not exact values, bear with me for a second). What does this tell us? That our genotype does not encode every neuron. Our genome encodes the proteins that then go and make the components of our body.
Hence, evolution works on the genotype indirectly, by selecting features of the phenotype. If I were to have 6 fingers on each hand, and if that made me a better programmer, making me have more kids because I'm more successful in life, well, my genotype would then be selected by evolution because it contains the capability to give me a more fit body (yes, there is a pun there, given the average geekiness-to-reproduction ratio of most people around here).
Now, think about your GA: what is it that you are trying to accomplish? Are you sure that evolving rules would help? In other words -- how would you perform in a maze? What is the most successful thing that can help you: having a different body, or having a memory of the right path to get out? Perhaps you might want to reconsider your genotype and have it encode memorization abilities. Maybe encode in the genotype how much data can be stored and how fast your agents can access it -- then measure fitness in terms of how fast they get out of the maze.
Another (weaker) approach could be to encode the rules that your agent uses to decide where to go. The take-home message is, encode features that, once expressed, can be selected by fitness.
Now, to the neural network issue. One thing to remember is that NNs are filters. They receive an input, perform operations on it, and return an output. What is this output? Maybe you just need to discriminate a true/false condition; for example, once you feed a maze map to a NN, it could tell you whether you can get out of the maze or not. How would you do such a thing? You will need to encode the data properly.
This is the key point about NNs: your input data must be encoded properly. Usually people normalize it, maybe scale it, perhaps apply a sigmoid function to it to avoid values that are too large or too small; those are details that deal with error measures and performance. What you need to understand now is what a NN is, and what you cannot use it for.
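As a tiny, purely hypothetical example of encoding the input: a maze grid could be flattened into a normalized input vector, one value per cell (the particular values 0.0 / 0.5 / 1.0 are arbitrary choices):

    // Flatten a maze into one input value per cell: wall = 1.0, free = 0.0,
    // and the agent's current cell marked with 0.5 (all arbitrary, illustrative choices).
    static double[] encodeMaze(int[][] maze, int agentRow, int agentCol) {
        int rows = maze.length, cols = maze[0].length;
        double[] input = new double[rows * cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                double v = (maze[r][c] == 1) ? 1.0 : 0.0;   // assuming 1 means wall in your matrix
                if (r == agentRow && c == agentCol) v = 0.5;
                input[r * cols + c] = v;
            }
        }
        return input;
    }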
To your problem now. You mentioned you want to use NNs as well: what about,
using a neural network to guide the agent, and
using a genetic algorithm to evolve the neural network parameters?
Rephrased like so:
let's suppose you have a robot: your NN controls the left and right wheels, and as input it receives the distance to the next wall and how much it has traveled so far (it's just an example)
you start by generating a random genotype
make the genotype into a phenotype: the first gene is the network sensitivity; the second gene encodes the learning rate; the third gene... and so on and so forth
now that you have a neural network, run the simulation
see how it performs
generate a second random genotype, evolve second NN
see how this second individual performs
get the best individual, then either mutate its genotype or recombine it with the loser
repeat
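A very rough sketch of that loop, with just two individuals as in the microbial GA (every name here is a hypothetical stand-in, and the fitness function is whatever maze/robot simulation you already have):

    import java.util.Random;

    class EvolveNetworkSketch {
        static final Random RNG = new Random();

        // Genotype = flat array of doubles that gets decoded into network parameters
        // (weights, sensitivity, learning rate, ...) before the simulation is run.
        static double[] randomGenotype(int length) {
            double[] g = new double[length];
            for (int i = 0; i < length; i++) g[i] = RNG.nextGaussian();
            return g;
        }

        static double[] evolve(int genomeLength, int generations) {
            double[] a = randomGenotype(genomeLength);
            double[] b = randomGenotype(genomeLength);
            for (int gen = 0; gen < generations; gen++) {
                double fitA = buildNetworkAndMeasureFitness(a);
                double fitB = buildNetworkAndMeasureFitness(b);
                double[] winner = (fitA >= fitB) ? a : b;
                double[] loser  = (fitA >= fitB) ? b : a;
                for (int i = 0; i < genomeLength; i++) {
                    if (RNG.nextDouble() < 0.5)  loser[i] = winner[i];                   // infect loser with winner's genes
                    if (RNG.nextDouble() < 0.05) loser[i] += 0.1 * RNG.nextGaussian();   // mutate
                }
            }
            return (buildNetworkAndMeasureFitness(a) >= buildNetworkAndMeasureFitness(b)) ? a : b;
        }

        // Placeholder: decode the genotype into a network, run the maze/robot simulation,
        // and return a fitness score (e.g. how quickly the agent gets out of the maze).
        static double buildNetworkAndMeasureFitness(double[] genotype) {
            return 0.0;
        }
    }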
There is an excellent read on the matter here: Inman Harvey's Microbial GA.
I hope I gave you some insight into these issues. NNs and GAs are no silver bullet that will solve all problems. For some they can do very much, for others they are just the wrong tool. It's (still!) up to us to pick the best one, and to do so we must understand them well.
Have fun with it! It's great to know such things; it makes everyday life a bit more entertaining :)
There is probably no 'maze gene' to find;
genetic algorithms try to set up a vector of properties and a 'filtering system' to decide, by some kind of 'survival of the fittest' process, which set of properties would do the best job.
The easiest way to find a way out of a maze is to always move along the left (or right) wall.
The Q-learning algorithm seems to have a problem with local maxima; as I remember, this was worked around by "kicking" (adding random values to the matrix) if the results didn't improve.
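Regarding the R and Q matrices from the question: here is a minimal tabular Q-learning sketch where states are maze cells and actions 0..3 are up/down/left/right. The reward values and the nextState table are placeholders that you would build from your maze array:

    import java.util.Random;

    class QLearningSketch {
        // R[s][a]    : immediate reward for taking action a in state s (e.g. 100 at the goal, 0 or -1 elsewhere)
        // next[s][a] : resulting state (an illegal move can simply map back to s)
        // Q[s][a]    : the table being learned; start it at all zeros
        static void train(double[][] R, int[][] next, double[][] Q, int goalState,
                          int episodes, double gamma) {
            Random rng = new Random();
            int numStates = Q.length, numActions = Q[0].length;
            for (int e = 0; e < episodes; e++) {
                int state = rng.nextInt(numStates);
                while (state != goalState) {
                    int action = rng.nextInt(numActions);        // explore with random actions
                    int s2 = next[state][action];
                    double maxNext = Double.NEGATIVE_INFINITY;
                    for (int a = 0; a < numActions; a++) maxNext = Math.max(maxNext, Q[s2][a]);
                    // Q(s, a) = R(s, a) + gamma * max_a' Q(s', a')
                    Q[state][action] = R[state][action] + gamma * maxNext;
                    state = s2;
                }
            }
        }
    }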
EDIT: As mentioned above, a backtracking algorithm suits this task better than GA or NN.
How to combine both algorithms is described here: NeuroGen describes how a GA is used for training a NN.
Try using the free, open-source NeuronDotNet C# library for your neural networks instead of implementing them yourself.
As for a Reinforcement Learning library, I am currently looking for one myself, especially for the .NET Framework.