SOLVED: I'm sorry. I was reconstructing the path incorrectly. I thought closedSet contained only the waypoints from start to end, but it contains other waypoints too. I misunderstood the concept. Now it's working okay!
I'm still getting some trouble with A*.
My character finds its path, but sometimes, depending on where I click on the map, the algorithm finds either the shortest path or a path that includes many nodes that shouldn't be selected.
I've tried to follow both Wikipedia's implementation and the one from A* Pathfinding for Beginners, but they give me the same result. I don't know whether it is the heuristic or the algorithm itself, but something's not right.
And this is an example of the problem clicking two different nodes: http://i.imgur.com/gtgxi.jpg
Here's the Pathfind class:
import java.util.ArrayList;
import java.util.Collections;
import java.util.TreeSet;
public class Pathfind {
public Pathfind(){
}
public ArrayList<Node> findPath(Node start, Node end, ArrayList<Node> nodes){
ArrayList<Node> openSet = new ArrayList<Node>();
ArrayList<Node> closedSet = new ArrayList<Node>();
Node current;
openSet.add(start);
while(openSet.size() > 0){
current = openSet.get(0);
current.setH_cost(ManhattanDistance(current, end));
if(start == end) return null;
else if(closedSet.contains(end)){
System.out.println("Path found!");
return closedSet;
}
openSet.remove(current);
closedSet.add(current);
for(Node n : current.getNeigbours()){
if(!closedSet.contains(n)){
if(!openSet.contains(n) || (n.getG_cost() < (current.getG_cost()+10))){
n.setParent(current);
n.setG_cost(current.getG_cost()+10);
n.setH_cost(ManhattanDistance(n, end));
if(!openSet.contains(n))
openSet.add(n);
Collections.sort(openSet);
}
}
}
}
return null;
}
private int ManhattanDistance(Node start, Node end){
int cost = start.getPenalty();
int fromX = start.x, fromY = start.y;
int toX = end.x, toY = end.y;
return cost * (Math.abs(fromX - toX) + Math.abs(fromY - toY));
}
}
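As the SOLVED note at the top says, closedSet holds every expanded node, not just the ones on the final path. A minimal sketch of rebuilding the path from parent links instead, assuming a hypothetical PathNode class whose parent field plays the role of setParent() in the class above:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical minimal node: only the parent link matters for reconstruction.
class PathNode {
    final String name;
    PathNode parent; // set by the search when this node is reached

    PathNode(String name) { this.name = name; }
}

class PathReconstruction {
    // Walks the parent links from end back to start, then reverses the result
    // so the returned list runs from start to end.
    static List<PathNode> reconstruct(PathNode start, PathNode end) {
        List<PathNode> path = new ArrayList<>();
        for (PathNode n = end; n != null; n = n.parent) {
            path.add(n);
            if (n == start) break; // stop once the start node has been added
        }
        Collections.reverse(path);
        return path;
    }
}
```

Nodes that were expanded but are not ancestors of the end node never appear in the result, which is the difference from returning closedSet directly.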
I believe the bug is with the condition:
if(n.getCost() < current.getCost()){
You shouldn't prevent advancing just because the cost (g(node)+h(node)) does not decrease relative to the current node. Have a look at this counterexample (S is the source and T is the target):
_________
|S |x1|x2|
----------
|x3|x4|x5|
---------
|x6|x7|T |
----------
Now, assume you are at S. You haven't moved yet, so g(S) = 0, and under the Manhattan distance heuristic h(S) = 4, so you get f(S) = 4.
Now look at x1 and x3: assuming each step costs 1, they will have g(x1) = g(x3) = 1 and h(x1) = h(x3) = 3 under the same heuristic. This results in f(x1) = f(x3) = 4, and since 4 < 4 is false, your if condition will cause neither of them to be opened. Once you finish iterating on S you will not have pushed anything to open, and your search will terminate.
As a side note:
I believe the choice of an ArrayList for closedSet is not efficient. Each contains() operation is O(n) (where n is the number of closed nodes). You should use a Set for better performance. A HashSet is a wise choice, and if you want to maintain the order of insertion you should use a LinkedHashSet. (Note that you will have to override the equals() and hashCode() methods of Node.)
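A minimal sketch of that side note, assuming a hypothetical GridNode class that is identified by its x and y coordinates (the real Node class from the question is not shown, so the fields here are illustrative):

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical grid node identified by its coordinates.
final class GridNode {
    final int x, y;

    GridNode(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridNode)) return false;
        GridNode other = (GridNode) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() { return Objects.hash(x, y); }
}

class ClosedSetDemo {
    public static void main(String[] args) {
        Set<GridNode> closedSet = new HashSet<>();
        closedSet.add(new GridNode(1, 2));
        // Lookup is by coordinates, not object identity, and takes expected O(1) time.
        System.out.println(closedSet.contains(new GridNode(1, 2)));
        System.out.println(closedSet.contains(new GridNode(2, 1)));
    }
}
```

If every waypoint is represented by exactly one Node object, the default identity-based equals() and hashCode() inherited from Object also work with a HashSet; the overrides matter when two distinct objects can stand for the same waypoint.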
Do your units walk up/down/left/right only, or can they take diagonals as well?
The one requirement for the A* heuristic is that it is admissible: it must never overestimate the actual path length. If your units can walk diagonally, then the Manhattan distance will overestimate the path length, and A* is no longer guaranteed to find the shortest path.
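To make the overestimate concrete, here is a small sketch comparing the Manhattan distance with the Chebyshev distance. It assumes unit step costs and that a diagonal step costs the same as a straight one; the question's code uses a step cost of 10 and a per-node penalty, so the numbers there would need scaling. The class and method names are illustrative.

```java
class Heuristics {
    // Suitable for 4-directional movement: each step changes x or y by one.
    static int manhattan(int x1, int y1, int x2, int y2) {
        return Math.abs(x1 - x2) + Math.abs(y1 - y2);
    }

    // Suitable for 8-directional movement when a diagonal step costs the same
    // as a straight step: one diagonal step reduces both differences at once.
    static int chebyshev(int x1, int y1, int x2, int y2) {
        return Math.max(Math.abs(x1 - x2), Math.abs(y1 - y2));
    }

    public static void main(String[] args) {
        // From (0,0) to (3,3) three diagonal steps suffice.
        System.out.println(manhattan(0, 0, 3, 3));
        System.out.println(chebyshev(0, 0, 3, 3));
    }
}
```

For that pair of cells the Manhattan distance is 6 while the true cost with diagonal moves is 3, so the Manhattan heuristic is not admissible in that setting; the Chebyshev distance never exceeds the true cost under the stated assumptions.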
Related
I have a problem requiring me to implement an algorithm finding path from a character to another character with obstacles along the way.
I know there are a lot of advanced pathfinding algorithms (A*, BFS, DFS, Dijkstra, ...). However, I am struggling to implement all these concepts in my code after plenty of research and attempts, and I also don't think I am required to implement all these advanced algorithms.
"Shortest" path is not the requirement, and all I need is a path that can lead my character to another character by avoiding moving onto obstacles.
Can anyone give me an idea (maybe some algorithms better than backtracking)or useful website (similar examples) for this problem?
Any help would be much appreciated
I can recommend the A* algorithm.
It's easy to implement. For my A* I used the Wikipedia code and the GeeksforGeeks code.
I'll post my code as well. It's in C#, but very similar to Java:
public List<ANote> findPath(ANote start, ANote end)
{
if (start == null || end == null || start == end || !start.walkable || !end.walkable)
return null;
List<ANote> openSet = new List<ANote>();
List<ANote> closedSet = new List<ANote>();
start.parent = null;
openSet.Add(start);
start.h = getDistance(start, end);
while (openSet.Any())
{
openSet = openSet.OrderBy(o => o.f).ToList();
ANote current = openSet[0];
if (current == end)
break;
openSet.Remove(current);
closedSet.Add(current);
foreach (ANote neighbor in current.adjacted)
{
if (closedSet.Contains(neighbor))
continue;
double _gScore = current.g + 1; // For me every distance was 1
if (openSet.Contains(neighbor) && _gScore >= neighbor.g)
continue;
if (!openSet.Contains(neighbor))
openSet.Add(neighbor);
neighbor.parent = current;
neighbor.g = _gScore;
neighbor.h = getDistance(neighbor, end);
}
}
return reconstructPath(start, end);
}
private List<ANote> reconstructPath(ANote start, ANote end)
{
List<ANote> backNotes = new List<ANote>();
ANote current = end;
while (current.parent != null)
{
backNotes.Add(current);
current = current.parent;
}
return backNotes;
}
public class ANote
{
public ANote parent { get; set; }
public double f { get { return g + h; } }
public double g { get; set; }
public double h { get; set; }
public int x { get; set; }
public int y { get; set; }
public bool walkable { get; set; }
public List<ANote> adjacted { get; set; } = new List<ANote>();
public ANote(int x, int y)
{
this.x = x;
this.y = y;
walkable = true;
}
}
It is important for this code that you define each node's adjacted list, and which nodes are walkable and which are not, before you search.
I hope my code can help you to implement A* in your code.
Since I have no idea how your "Grid" is described, and what "Cell" is, I assume the grid is rectangular, the cells only map the objects on it, not the empty spaces.
I suggest you make a char[][] array = new char[rows][columns]; (or char), initialize it with some value, then iterate over the Cells and fill the 2D array in some meaningful manner, for example G for goal, # for obstacle, etc. Then you start a DFS from Goal to look for Start.
You need to store the correct path somewhere, so you need an ArrayList<String> list; variable too. Since it's a list, you can add items to it with list.add(item);, which is handy.
Non-optimal path: DFS
DFS is a really basic recursive algorithm, in your case it will go like this:
bool DFS(list, array, row, column) {
    if(array[row][column] == '#' or
       array[row][column] == 'v') return False; # Obstacle or visited, ignore it
    if(array[row][column] == 'S') return True; # It's the Start, path found
    array[row][column] = 'v'; # Mark as visited, nothing interesting.
    # If you want the shortest path, you're supposed to put here the distance from goal,
    # that you would pass on and increment as an additional argument of DFS
    # Check for edge cases, or initialize array to be bigger and place obstacles on the edge
    if( DFS(list, array, row-1, column) ){ # If DFS found a path to Start from the cell above
        list.add("Move Down"); # Then you add the direction to go below (from that cell to this one) to the list
        return True; # And then tell the previous DFS that the path was found
    }
    if()
    if()
    if()
    # And then you add the checks for the other directions in a similar manner
    return False; # You didn't find anything anywhere
}
That is not real code, but it should be enough for you to do your assignment from there.
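For reference, a runnable Java sketch of the same idea. It assumes the grid is a char[][] with 'S' for Start, 'G' for Goal, '#' for obstacles and '.' for free cells; these symbols and the class name GridDfs are illustrative choices, not part of the assignment. Because the search runs from Goal towards Start, the recorded move is the opposite of the direction that was searched, and the list comes out in Start-to-Goal order.

```java
import java.util.ArrayList;
import java.util.List;

class GridDfs {
    // Searches from (row, col) towards 'S'. On success, moves holds the
    // steps a walker takes from Start to the cell where the search began.
    static boolean dfs(List<String> moves, char[][] grid, int row, int col) {
        if (row < 0 || row >= grid.length || col < 0 || col >= grid[row].length) return false;
        char cell = grid[row][col];
        if (cell == '#' || cell == 'v') return false; // obstacle or already visited
        if (cell == 'S') return true;                 // reached the Start
        grid[row][col] = 'v';                         // mark as visited
        // A path found via the cell above means the walker comes down into this cell, and so on.
        if (dfs(moves, grid, row - 1, col)) { moves.add("Move Down");  return true; }
        if (dfs(moves, grid, row + 1, col)) { moves.add("Move Up");    return true; }
        if (dfs(moves, grid, row, col - 1)) { moves.add("Move Right"); return true; }
        if (dfs(moves, grid, row, col + 1)) { moves.add("Move Left");  return true; }
        return false;
    }

    public static void main(String[] args) {
        char[][] grid = { "S.#".toCharArray(), ".#.".toCharArray(), "..G".toCharArray() };
        List<String> moves = new ArrayList<>();
        boolean found = dfs(moves, grid, 2, 2); // start the search at the Goal cell
        System.out.println(found + " " + moves);
    }
}
```

The bounds check at the top replaces the "obstacles on the edge" trick mentioned in the pseudocode. Note that the search overwrites the grid with 'v' markers, so pass a copy if the grid is needed afterwards.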
Chances are it'll find a path like this:
...→→→→↓
...↑.↓←↓
...S.F↑↓
......↑↓
......↑←
But in grids with a lot of obstacles or only one correct path it'll make more reasonable paths. Also you can improve it by selecting the order you try directions in so it always tries to go towards Goal first, but that's a pain.
Optimal path: augmented DFS
To find the shortest path usually people refer to A*, but I read up on it and it's not as I remember it and it's just unnecessarily complicated, so I'll explain an expanded DFS. It takes a little longer to find the answer, than A* or BFS would, but for reasonably-sized grids it's not noticeable.
The idea of the algorithm is to map the entire grid with distances to Goal and then walk from start to goal following decreasing distance.
First you will need to use int[][] array instead of char of the previous case. That is because you need to store distances, which char can too to some extent, but also non-distance markers in the grid, like obstacles and such.
The idea of the algorithm is you call DFS(array, row, col, distance), where distance is calculated as distance of the Cell that calls DFS incremented by 1. Then DFS in the next cell checks if the distance it was passed is smaller than its current distance, if it is so, then there was a shorter path found to that cell and you need to recalculate all its neighbors too. Otherwise the new path is longer and you can disregard it. By calling DFS recursively you will gradually map the entire maze.
After that you will call another function FindPath(list, array, row, col), that will check the Cell it started in and add a direction to the cell with neighbor.distance == (this.distance - 1) to the list and then call FindPath on that neighbor until distance is 0, at which point it's the goal.
It should look something like this:
main()
{
# initialize grid with Integer.MAX_VALUE or just a big enough number
# for Cell in Cells -> put obstacles on Grid as -1,
# find the Start and Goal and record them somewhere
# DFS_plus(array, Goal, 0);
# FindPath(list, array, Start);
# Done
}
void DFS_plus(array, row, column, distance) {
if(array[row][column] <= distance) return; # There already exists a shorter path there
# or it's an obstacle, we store obstacles as -1.
# It's smaller than any possible path and thus blocks further search
array[row][column] = distance; # update distance.
# If this happened its neighbors will need to be updated too.
#Check for edge cases, or initialize array to be bigger and place obstacles on edge#
DFS_plus(array, row-1, column, distance+1); # You just map everything, no returns expected
DFS_plus(); # For all 4 directions
DFS_plus();
DFS_plus();
}
FindPath(list, array, row, col){
if(array[row][col] == 0) return; # It's the Goal
if(array[row-1][col] == (array[row][col] - 1)){ # Check if Cell above is 1 closer to Goal
list.add("MoveUp"); # Add direction
FindPath(list, array, row-1, col); # Look for next direction
return; # You don't need to check other directions as path is guaranteed
}
if(){}; # Check other directions if Up wasn't the one
if(){};
if(){};
}
It is not much more complicated, but it gets you the shortest path. It's not the quickest way to find the shortest path, but it's relatively simple, like any recursive algorithm.
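A runnable Java sketch of the two steps above, under a few assumptions: obstacles are stored as -1, unvisited cells as Integer.MAX_VALUE, the start cell is not an obstacle, and findPath is written as a loop instead of recursion. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class DistanceMap {
    static final int OBSTACLE = -1;

    // Writes the shortest known distance to the goal into every reachable cell.
    static void dfsPlus(int[][] dist, int row, int col, int distance) {
        if (row < 0 || row >= dist.length || col < 0 || col >= dist[row].length) return;
        if (dist[row][col] <= distance) return; // obstacle, or a path at least as short is already known
        dist[row][col] = distance;
        dfsPlus(dist, row - 1, col, distance + 1);
        dfsPlus(dist, row + 1, col, distance + 1);
        dfsPlus(dist, row, col - 1, distance + 1);
        dfsPlus(dist, row, col + 1, distance + 1);
    }

    // True if (row, col) is inside the grid and holds exactly the target distance.
    static boolean holds(int[][] dist, int row, int col, int target) {
        return row >= 0 && row < dist.length && col >= 0 && col < dist[row].length
                && dist[row][col] == target;
    }

    // Follows decreasing distances from (row, col) until the goal (distance 0).
    // Returns an empty list if the cell is an obstacle or cannot reach the goal.
    static List<String> findPath(int[][] dist, int row, int col) {
        List<String> moves = new ArrayList<>();
        int d = dist[row][col];
        if (d == OBSTACLE || d == Integer.MAX_VALUE) return moves;
        while (dist[row][col] != 0) {
            int next = dist[row][col] - 1;
            if (holds(dist, row - 1, col, next))      { moves.add("MoveUp");    row--; }
            else if (holds(dist, row + 1, col, next)) { moves.add("MoveDown");  row++; }
            else if (holds(dist, row, col - 1, next)) { moves.add("MoveLeft");  col--; }
            else                                      { moves.add("MoveRight"); col++; }
        }
        return moves;
    }
}
```

To use it, fill an int[][] with Integer.MAX_VALUE, set obstacle cells to -1, call dfsPlus(dist, goalRow, goalCol, 0) and then findPath(dist, startRow, startCol). The recursion in dfsPlus can get deep on large open grids, which is one reason BFS is usually preferred there.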
I am trying to write a MinMax program in Java for a Connect Four game, but the program should also be applicable to other games. However, I have run into a problem that I have not been able to get past for a few days: the values for the nodes are not set properly. I am sharing the piece of code that is responsible for generating the tree.
Maybe you will notice where I made a mistake.
If anyone could help me with this, I will be very happy.
public Node generateTree(Board board, int depth) {
Node rootNode = new Node(board);
generateSubtree(rootNode, depth);
minMax(rootNode, depth);
return rootNode;
}
private void generateSubtree(Node subRootNode, int depth) {
Board board = subRootNode.getBoard();
if (depth == 0) {
subRootNode.setValue(board.evaluateBoard());
return;
}
for (Move move : board.generateMoves()) {
Board tempBoard = board.makeMove(move);
Node tempNode = new Node(tempBoard);
subRootNode.addChild(tempNode);
generateSubtree(tempNode, depth - 1);
}
}
public void minMax(Node rootNode, int depth) {
maxMove(rootNode, depth);
}
public int maxMove(Node node, int depth) {
if (depth == 0) {
return node.getValue();
}
int bestValue = Integer.MIN_VALUE;
for (Node childNode : node.getChildren()) {
int tempValue = minMove(childNode, depth - 1);
childNode.setValue(tempValue);
if (tempValue > bestValue) {
bestValue = tempValue;
}
}
return bestValue;
}
public int minMove(Node node, int depth) {
if (depth == 0) {
return node.getValue();
}
int bestValue = Integer.MAX_VALUE;
for (Node childNode : node.getChildren()) {
int tempValue = maxMove(childNode, depth - 1);
childNode.setValue(tempValue);
if (tempValue < bestValue) {
bestValue = tempValue;
}
}
return bestValue;
}
The Board class is the representation of the board state.
The Move class holds the move to perform (integer [0-8] for tic-tac-toe, [0-6] for Connect Four).
The Node class holds the Move and a value for how good the given move is. It also holds all its children.
In the code I use this method like this:
Node newNode = minmax.generateTree(board, depth, board.getPlayer());
Move newMove = new TicTacToeMove(board.getPlayer(), newNode.getBestMove().getMove(), depth);
board = board.makeMove(newMove);
And when it's obvious that a given move is a losing (or winning) move, I do not receive this move.
Alright, you did make a couple of mistakes. About 3-4, depending on how you count ;) Took me a bit of debugging to figure it all out, but I finally got an answer for you :D
Mistake #1: All your parents always get twins (that poor mother)
This is only the case with the code you uploaded, not the code in your question, so maybe we count it as half a mistake?
Since your trees aren't that big yet and it won't destroy your algorithm, this was the least important one anyway. Still, it's something to watch out for.
In your uploaded code, you do this in your generateSubtree method:
Node tempNode = new Node(tempBoard, move, subRootNode);
subRootNode.addChild(tempNode);
As that constructor already adds the child to the subRootNode, the second line always adds it a second time.
Mistake #2: That darn depth
If you haven't reached your desired depth yet, but the game is already decided, you completely ignore that. So in your provided example that won't work, if - for example - you look at making move 7 instead of 3 (which would be the 'right' move) and then the opponent does move 3, you don't count it as -10 points because you haven't reached your depth yet. It still won't get any children, so even in your minmax, it will never realize it's a screwed up way to go.
Which is why every move is 'possible' in this scenario and you just get the first one returned.
In the previous moves, there was luckily always a way to reach a losing move with your opponents third move (aka move #5), which is why those were called correctly.
Alright, so how do we fix it?
private void generateSubtree(Node subRootNode, int depth, int player) {
Board board = subRootNode.getBoard();
List<Move> moveList = board.generateMoves();
if (depth == 0 || moveList.isEmpty()) {
subRootNode.setValue(board.evaluateBoard(player));
return;
}
for (Move move : moveList) {
Board tempBoard = board.makeMove(move);
Node tempNode = new Node(tempBoard, move, subRootNode);
generateSubtree(tempNode, depth - 1, player);
}
}
Just get the move list beforehand and then look if it's empty (your generateMoves() method of the Board class (thank god you provided that by the way ;)) already checks if the game is over, so if it is, there won't be any moves generated. Perfect time to check the score).
Mistake #3: That darn depth again
Didn't we just go over this?
Sadly, your Min Max algorithm itself has the same problem. It will only even look at your values if you have reached the desired depth. You need to change that.
However, this is a bit more complicated, since you don't have a nice little method that already checks if the game is finished for you.
You could check to see if your value was set, but here's the problem: It might be set to 0 and you need to take that into account as well (so you can't just do if (node.getValue() != 0)).
I just set the initial value of each node to -1 instead and did a check against -1. It's not... you know... pretty. But it works.
public class Node {
private Board board;
private Move move;
private Node parent;
private List<Node> children = new ArrayList<Node>();
private boolean isRootNode = false;
private int value = -1;
...
And this in the maxMove:
public int maxMove(Node node, int depth) {
if (depth == 0 || node.getValue() != -1) {
return node.getValue();
}
int bestValue = Integer.MIN_VALUE;
for (Node childNode : node.getChildren()) {
int tempValue = minMove(childNode, depth - 1);
childNode.setValue(tempValue);
if (tempValue > bestValue) {
bestValue = tempValue;
}
}
return bestValue;
}
It works the same for minMove of course.
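For completeness, a sketch of both halves side by side. It is not the code from the question's repository: GameNode is a hypothetical stand-in for Node, and it uses a boolean evaluated flag in place of the -1 sentinel, which avoids clashing with boards that legitimately score -1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Node class from the question.
class GameNode {
    final List<GameNode> children = new ArrayList<>();
    int value;
    boolean evaluated; // true once value was set by a board evaluation

    static GameNode leaf(int value) {
        GameNode n = new GameNode();
        n.value = value;
        n.evaluated = true;
        return n;
    }

    static GameNode inner(GameNode... kids) {
        GameNode n = new GameNode();
        for (GameNode k : kids) n.children.add(k);
        return n;
    }
}

class MiniMaxSketch {
    static int maxMove(GameNode node, int depth) {
        // Stop at the depth limit or at a node that was already scored (game over).
        if (depth == 0 || node.evaluated) return node.value;
        int best = Integer.MIN_VALUE;
        for (GameNode child : node.children) {
            int v = minMove(child, depth - 1);
            child.value = v;
            best = Math.max(best, v);
        }
        return best;
    }

    static int minMove(GameNode node, int depth) {
        if (depth == 0 || node.evaluated) return node.value;
        int best = Integer.MAX_VALUE;
        for (GameNode child : node.children) {
            int v = maxMove(child, depth - 1);
            child.value = v;
            best = Math.min(best, v);
        }
        return best;
    }
}
```

With a tree such as GameNode.inner(GameNode.leaf(-10), GameNode.inner(GameNode.leaf(5), GameNode.leaf(3))), the first child is a finished game one ply down, and maxMove(root, 2) still sees its -10 even though the depth limit has not been reached there.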
Mistake #4: The player is screwing with you
Once I changed all that, it took me a moment with the debugger to realize why it still wouldn't work.
This last mistake was not in the code you provided in the question btw. Shame on you! ;)
Turns out it was this wonderful piece of code in your TicTacToeBoard class:
@Override
public int getPlayer() {
// TODO Auto-generated method stub
return 0;
}
And since you called
MinMax minmax = new MinMax();
Node newNode = minmax.generateTree(board, (Integer) spinner.getValue(), board.getPlayer());
in your makeMove method of TicTacToeMainWindow, you would always start out with the wrong player.
As you can probably guess yourself, you just need to change it to:
public int getPlayer() {
return this.player;
}
And it should do the trick.
Also:
Just a couple of things I'd like to remark at this point:
Clean up your imports! Your TicTacToe actually still imports your ConnectFour classes! And for no reason.
Your board is rotated and mirrored in your board array. WHY? You know how annoying that is to debug? I mean, I guess you probably do :D Also, if you're having problems with your code and you need to debug, it's extremely helpful to override your board's toString() method, because that will give you a very nice and easy way to look at your board in the debugger. You can even use it to rotate the board again, so you don't have to look at it lying on its side ;)
While we're at the subject of the board... this is just me but... I always tried clicking on the painted surface first and then had to remember: Oh yeah, there were buttons :D I mean... why not just put the images on the buttons or implement a MouseListener so you can actually just click on the painted surface?
When providing code and/or example images, please take out your test outputs. I'm talking about the Player 1 won!s of course ;)
Please learn what a complete, verifiable and minimal example is for the next time you ask a question on StackOverflow. The one in your question wasn't complete or verifiable and the one you provided on github was... well... not complete (the images were missing), but complete enough. It was also verifiable, but it was NOT minimal. You will get answers a LOT sooner if you follow the guidelines.
I'm looking into depth-first search, and the examples I found are looking for a particular answer, let's say the number 10.
It goes through the tree discarding every node that isn't 10 and stop when it finds 10.
Is it possible to use depth first search or another algorithm to search every branch of the tree? I would like it to run a scenario and come up with a value and store that into a variable possibly named highestValue.
It would then search the next branch and get a value and store that into a variable possibly named Value. Then it would compare highestValue to Value and if (Value > highestValue) highestValue = Value.
It would repeat the process until it is finished running every possible scenario. Any ideas? I should mention I'm writing this in Java.
DFS is easiest if we want to visit every node in the graph. However, if we have a very large tree and want to be prepared to quit when we get too far from the original node, DFS can search thousands of ancestors of the node but never search all of the node's children.
Strictly speaking, it depends on how the data in your graph is organized. Source
Since you're still wondering how it might work, this piece of code might help you figure that out. This works for graphs, take a look. It DFSes every node, but stops when it reaches the node we want to find.
To get the highest value, just store the max value into an int variable, and continue searching and comparing each node's data to the current max inside the int variable.
public static boolean search(Graph g, Node start, Node end) {
LinkedList<Node> stack = new LinkedList<Node>();
for (Node u : g.getNodes()) {
u.state = State.Unvisited;
}
start.state = State.Visiting;
stack.add(start);
Node u;
while (!stack.isEmpty()) {
u = stack.removeFirst();
if (u != null) {
for ( Node v : u.getAdjacent() ) {
if (v.state == State.Unvisited) {
if (v == end) {
return true;
}
else {
v.state = State.Visiting;
stack.addFirst(v); // push on top of the stack; add() would append to the end and turn this into a BFS
}
}
}
u.state = State.Visited;
}
}
return false;
}
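For the highest-value variant described before the code, a minimal recursive sketch. It assumes a tree in which each node stores an int and a list of children; TreeNode and MaxSearch are hypothetical names, not classes from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node holding the value produced by one scenario.
class TreeNode {
    final int value;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(int value) { this.value = value; }

    TreeNode add(TreeNode child) {
        children.add(child);
        return this; // allows chaining while building a tree
    }
}

class MaxSearch {
    // Depth-first traversal that visits every node and keeps the largest value seen.
    static int highestValue(TreeNode node) {
        int highest = node.value;
        for (TreeNode child : node.children) {
            int value = highestValue(child); // best value in this branch
            if (value > highest) highest = value;
        }
        return highest;
    }

    public static void main(String[] args) {
        TreeNode root = new TreeNode(3)
                .add(new TreeNode(7).add(new TreeNode(10)))
                .add(new TreeNode(5));
        System.out.println(highestValue(root));
    }
}
```

Unlike the search above, there is no early return: every branch is explored, and the comparison against the running maximum happens as each recursive call comes back.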
I have been implementing an LLRB package that should be able to operate in either of the two modes, Bottom-Up 2-3 or Top-Down 2-3-4 described by Sedgewick (code - improved code, though dealing only with 2-3 trees here, thanks to RS for pointer).
Sedgewick provides a very clear description of tree operations for the 2-3 mode, although he spends a lot of time talking about the 2-3-4 mode. He also shows how a simple alteration of the order of color flipping during insertion can alter the behaviour of the tree (either split on the way down for 2-3-4 or split on the way up for 2-3):
private Node insert(Node h, Key key, Value value)
{
if (h == null)
return new Node(key, value);
// Include this for 2-3-4 trees
if (isRed(h.left) && isRed(h.right)) colorFlip(h);
int cmp = key.compareTo(h.key);
if (cmp == 0) h.val = value;
else if (cmp < 0) h.left = insert(h.left, key, value);
else h.right = insert(h.right, key, value);
if (isRed(h.right) && !isRed(h.left)) h = rotateLeft(h);
if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);
// Include this for 2-3 trees
if (isRed(h.left) && isRed(h.right)) colorFlip(h);
return h;
}
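The helpers that insert() relies on are not shown in the question. A sketch of them in the usual LLRB formulation, simplified to int keys and with illustrative class names (LlrbNode, LlrbHelpers) that are not from the package under discussion:

```java
class LlrbNode {
    static final boolean RED = true;
    static final boolean BLACK = false;

    int key;
    LlrbNode left, right;
    boolean color = RED; // new nodes are attached with a red link

    LlrbNode(int key) { this.key = key; }
}

class LlrbHelpers {
    // Null links count as black.
    static boolean isRed(LlrbNode x) {
        return x != null && x.color == LlrbNode.RED;
    }

    // Turns a right-leaning red link into a left-leaning one.
    static LlrbNode rotateLeft(LlrbNode h) {
        LlrbNode x = h.right;
        h.right = x.left;
        x.left = h;
        x.color = h.color;      // x takes over h's link colour
        h.color = LlrbNode.RED; // h now hangs off x by a red link
        return x;
    }

    // Mirror image of rotateLeft.
    static LlrbNode rotateRight(LlrbNode h) {
        LlrbNode x = h.left;
        h.left = x.right;
        x.right = h;
        x.color = h.color;
        h.color = LlrbNode.RED;
        return x;
    }

    // Inverts the colours of h and both children: splits a 4-node when h is
    // black with two red children, or merges one in the opposite situation.
    static void colorFlip(LlrbNode h) {
        h.color = !h.color;
        h.left.color = !h.left.color;
        h.right.color = !h.right.color;
    }
}
```

Whether colorFlip runs before the recursive call or after the rotations is exactly the switch between the 2-3-4 and 2-3 behaviour shown in insert().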
However, he glosses over deletion in 2-3-4 LLRBs with the following:
The code on the next page is a full implementation of delete() for LLRB 2-3 trees. It is based on the reverse of the approach used for insert in top-down 2-3-4 trees: we perform rotations and color flips on the way down the search path to ensure that the search does not end on a 2-node, so that we can just delete the node at the bottom. We use the method fixUp() to share the code for the color flip and rotations following the recursive calls in the insert() code. With fixUp(), we can leave right-leaning red links and unbalanced 4-nodes along the search path, secure that these conditions will be fixed on the way up the tree. (The approach is also effective for 2-3-4 trees, but requires an extra rotation when the right node off the search path is a 4-node.)
His delete() function:
private Node delete(Node h, Key key)
{
if (key.compareTo(h.key) < 0)
{
if (!isRed(h.left) && !isRed(h.left.left))
h = moveRedLeft(h);
h.left = delete(h.left, key);
}
else
{
if (isRed(h.left))
h = rotateRight(h);
if (key.compareTo(h.key) == 0 && (h.right == null))
return null;
if (!isRed(h.right) && !isRed(h.right.left))
h = moveRedRight(h);
if (key.compareTo(h.key) == 0)
{
h.val = get(h.right, min(h.right).key);
h.key = min(h.right).key;
h.right = deleteMin(h.right);
}
else h.right = delete(h.right, key);
}
return fixUp(h);
}
My implementation correctly maintains LLRB 2-3 invariants for all tree operations on 2-3 trees, but fails for a subclass of right-sided deletions on 2-3-4 trees (these failing deletions result in right leaning red nodes, but snowball to tree imbalance and finally null pointer dereferencing). From a survey of example code that discusses LLRB trees and includes options for construction of trees in either mode, it seems that none correctly implements the deletion from a 2-3-4 LLRB (i.e. none has the extra rotation alluded to, e.g. Sedgewick's java above and here).
I'm having trouble figuring out exactly what he means by "extra rotation when the right node off the search path is a 4-node"; presumably this is a rotate left, but where and when?
If I rotate left passing upwards past a 4-node equivalent (i.e. RR node) or a right leaning 3-node equivalent (BR node) either before calling fixUp() or at the end of the fixUp function I still get the same invariant contradiction.
Here are the tree states of the smallest failing examples I have found (generated by sequential insertion of elements from 0 to the respective maximum value).
The first pair of trees shows the transition from invariant-conforming state prior to deletion of element 15 to obviously broken state after.
The second is essentially the same as above, but with deletion of 16 of 0..16 (deletion of 15 results in the same topology). Note that the invariant contradiction manages to cross the root node.
The key is going to be understanding how to revert the violations generated during the walk down the tree to the target node. The following two trees show how the first tree above looks after a walk down the left and right respectively (without deletion and before walking back up with fixUp()).
After attempt to delete '-1' without fixUp:
After attempt to delete '16' without fixUp:
Trying rotate left on the walk back up when the node has only a red right child seems to be part of the solution, but it does not deal correctly with two red right children in a row, preceding this with a flipColor when both children are red seems to improve the situation further, but still leaves some invariants violated.
If I further check whether the right child of a right child is red when its sibling is black and rotate left if this is true I only fail once, but at this point I feel like I'm needing a new theory rather than a new epicycle.
Any ideas?
For reference, my implementation is available here (No, it's not Java).
Follow-up:
Part of the reason I was interested in this was to confirm the claim by many that 2-3 LLRB trees are more efficient than 2-3-4 LLRB trees. My benchmarking has confirmed this for insertion and deletion (2-3 are about 9% faster), but I find that retrieval is very slightly faster for 2-3-4 trees.
The following times are representative and consistent across runs:
BU23:
BenchmarkInsert 1000000 1546 ns/op
BenchmarkDelete 1000000 1974 ns/op
BenchmarkGet 5000000 770 ns/op
TD234:
BenchmarkInsert 1000000 1689 ns/op
BenchmarkDelete 1000000 2133 ns/op
BenchmarkGet 5000000 753 ns/op
First column is bench name, second is number of operations, third is result. Benchmark on i5M 2.27.
I have had a look at branch lengths for 2-3 tree and 2-3-4 trees and there is little in that to explain the retrieval difference (mean distance from root to node and S.D. of 1000 trees each with 10000 random inserts):
Means:
         TD234       BU23
leafs    12.88940    12.84681
all      11.79274    11.79163
StdDev:
         TD234       BU23
leafs    1.222458    1.257344
all      1.874335    1.885204
Updated and verified
Of key importance to testing this is that the implementation doesn't support deleting a nonexistent or previously deleted node! I spent way too long trying to figure out why my working solution was "broken". This can be fixed by doing a preliminary search for the key and returning false if it's not in the tree at all, and that solution was employed in the linked code at the bottom.
It doesn't appear that Sedgewick wrote a publicly available deletion routine for 2-3-4 trees. His results specifically deal with 2-3 trees (he only makes cursory mention of 2-3-4 trees in that their average path length (and thus search cost), as well as that of other red-black trees, is indistinguishable from the 2-3 case). Nobody else seems to have one that is easily found, either, so here's what I found after debugging the problem:
To begin, take Sedgewick's code and fix the out of date bits. In the slides here (pg 31) you can see that his code still uses the old representation of 4 nodes where it was done by having two left reds in a row, rather than balance. The first bit to write a 2-3-4 deletion routine, then, is to fix this so that we can do a sanity check which will help us verify our fixes later:
private boolean is234(Node x)
{
if (x == null)
return true;
// Note the TD234 check is here because we also want this method to verify 2-3 trees
if (isRed(x.right) && !(species == TD234 && isRed(x.left)))
return false; // a red right link is only legal as half of a balanced 4-node in TD234 mode
return is234(x.left) && is234(x.right);
}
Once we have this, we know a couple things. One, from the paper we see that 4 nodes should not be broken on the way up when using a 2-3-4 tree. Two, there's a special case for a right 4-node on the search path. There's a third special case that isn't mentioned, and that is when you are going back up a tree, you may end up where you have h.right.left be red, which would leave you invalid with just a rotate left. This is the mirror of the case described for insert on page 4 of the paper.
The rotation fix for a 4-node you need is as follows:
private Node moveRedLeft(Node h)
{ // Assuming that h is red and both h.left and h.left.left
// are black, make h.left or one of its children red.
colorFlip(h);
if (isRed(h.right.left))
{
h.right = rotateRight(h.right);
h = rotateLeft(h);
colorFlip(h);
if (isRed(h.right.right) )
h.right = rotateLeft(h.right);
}
return h;
}
And this removes the splitting on 2-3-4, as well as adds the fix for the third special case:
private Node fixUp(Node h)
{
if (isRed(h.right))
{
if (species == TD234 && isRed(h.right.left))
h.right = rotateRight(h.right);
h = rotateLeft(h);
}
if (isRed(h.left) && isRed(h.left.left))
h = rotateRight(h);
if (species == BU23 && isRed(h.left) && isRed(h.right))
colorFlip(h);
return setN(h);
}
Finally, we need to test this and make sure it works. The tests don't have to be the most efficient, but as I found while debugging this, they have to actually respect the expected tree behavior (i.e. not insert/delete duplicate data)! I did this with a test helper method. The commented lines were there for when I was debugging: I'd break and check the tree for obvious imbalance. I've tried this method with 100000 nodes, and it performed flawlessly:
public static boolean Test()
{
return Test(System.nanoTime());
}
public static boolean Test(long seed)
{
StdOut.println("Seeding test with: " + seed);
Random r = new Random(seed);
RedBlackBST<Integer, Integer> llrb = new RedBlackBST<Integer,Integer>(TD234);
ArrayList<Integer> treeValues = new ArrayList<Integer>();
for (int i = 0; i < 1000; i++)
{
int val = r.nextInt();
if (!treeValues.contains(val))
{
treeValues.add(val);
llrb.put(val, val);
}
else
i--;
}
for (int i = 0; i < treeValues.size(); i++)
{
llrb.delete(treeValues.get(i));
if (!llrb.check())
{
return false;
}
// StdDraw.clear(Color.GRAY);
// llrb.draw(.95, .0025, .008);
}
return true;
}
The complete source can be found here.
My Huffman tree which I had asked about earlier has another problem! Here is the code:
package huffman;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.PriorityQueue;
import java.util.Scanner;
public class Huffman {
public ArrayList<Frequency> fileReader(String file)
{
ArrayList<Frequency> al = new ArrayList<Frequency>();
Scanner s;
try {
s = new Scanner(new FileReader(file)).useDelimiter("");
while (s.hasNext())
{
boolean found = false;
int i = 0;
String temp = s.next();
while(!found)
{
if(al.size() == i && !found)
{
found = true;
al.add(new Frequency(temp, 1));
}
else if(temp.equals(al.get(i).getString()))
{
int tempNum = al.get(i).getFreq() + 1;
al.get(i).setFreq(tempNum);
found = true;
}
i++;
}
}
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return al;
}
public Frequency buildTree(ArrayList<Frequency> al)
{
Frequency r = al.get(1);
PriorityQueue<Frequency> pq = new PriorityQueue<Frequency>();
for(int i = 0; i < al.size(); i++)
{
pq.add(al.get(i));
}
/*while(pq.size() > 0)
{
System.out.println(pq.remove().getString());
}*/
for(int i = 0; i < al.size() - 1; i++)
{
Frequency p = pq.remove();
Frequency q = pq.remove();
int temp = p.getFreq() + q.getFreq();
r = new Frequency(null, temp);
r.left = p;
r.right = q;
pq.add(r); // put in the correct place in the priority queue
}
pq.remove(); // leave the priority queue empty
return(r); // this is the root of the tree built
}
public void inOrder(Frequency top)
{
if(top == null)
{
return;
}
else
{
inOrder(top.left);
System.out.print(top.getString() +", ");
inOrder(top.right);
return;
}
}
public void printFreq(ArrayList<Frequency> al)
{
for(int i = 0; i < al.size(); i++)
{
System.out.println(al.get(i).getString() + "; " + al.get(i).getFreq());
}
}
}
What I need now is a method that searches the tree to find the binary code (011001 etc.) for a specific character. What is the best way to do this? I thought I might do a normal search through the tree as if it were an AVL tree, going right if the value is bigger or left if it's smaller.
But the nodes don't hold ints, doubles, etc.; they hold objects that contain a character as a string, or null to signify that the node is not a leaf but an internal node. The other option would be an in-order traversal to find the leaf I'm looking for, but then how would I keep track of how many times I went right or left to reach the character?
package huffman;
public class Frequency implements Comparable {
private String s;
private int n;
public Frequency left;
public Frequency right;
Frequency(String s, int n)
{
this.s = s;
this.n = n;
}
public String getString()
{
return s;
}
public int getFreq()
{
return n;
}
public void setFreq(int n)
{
this.n = n;
}
@Override
public int compareTo(Object arg0) {
Frequency other = (Frequency)arg0;
return n < other.n ? -1 : (n == other.n ? 0 : 1);
}
}
What I'm trying to do is find the binary code that leads to each character. So if I were trying to encode aabbbcccc, how would I create a string holding the binary code for a, where going left is 0 and going right is 1?
What has me confused is that you can't determine where anything is: the tree is obviously unbalanced, and there is no way to tell whether a character is to the right or left of where you are. So you have to search through the whole tree, and if you reach a node that isn't the one you are looking for, you have to backtrack to an earlier node to get to the other leaves.
Traverse the Huffman tree nodes to build a map like {'a': "1001", 'b': "10001"} etc. You can use this map to look up the binary code for a specific character.
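A minimal sketch of that traversal, in Java to match the question. The nested Node class is a hypothetical stand-in for the asker's Frequency class (a leaf holds a symbol, an internal node holds null), and HuffmanCodeTable and build are names made up for illustration. It assumes left is 0 and right is 1, as in the question.

```java
import java.util.Map;
import java.util.TreeMap;

class HuffmanCodeTable {
    // Hypothetical stand-in for the question's Frequency class: a leaf holds a
    // symbol, an internal node holds null and has two children.
    static class Node {
        final String symbol;
        final Node left, right;
        Node(String symbol) { this(symbol, null, null); }
        Node(Node left, Node right) { this(null, left, right); }
        private Node(String symbol, Node left, Node right) {
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }
        boolean isLeaf() { return left == null && right == null; }
    }

    // Visits every node once; the path taken so far is the code of the leaf we reach.
    static Map<String, String> build(Node root) {
        Map<String, String> codes = new TreeMap<>();
        if (root != null) collect(root, "", codes);
        return codes;
    }

    private static void collect(Node node, String path, Map<String, String> codes) {
        if (node.isLeaf()) {
            // Assumption: a tree with a single symbol gets the one-bit code "0".
            codes.put(node.symbol, path.isEmpty() ? "0" : path);
            return;
        }
        if (node.left != null) collect(node.left, path + "0", codes);   // left = 0
        if (node.right != null) collect(node.right, path + "1", codes); // right = 1
    }

    public static void main(String[] args) {
        // Tree for "aabbbcccc": a(2) and b(3) are merged first, then c(4) joins them.
        Node root = new Node(new Node("c"), new Node(new Node("a"), new Node("b")));
        Map<String, String> codes = build(root);
        StringBuilder bits = new StringBuilder();
        for (char ch : "aabbbcccc".toCharArray()) {
            bits.append(codes.get(String.valueOf(ch)));
        }
        System.out.println(codes + " -> " + bits);
    }
}
```

Building the table once costs a single pass over the tree, after which encoding each character is a plain map lookup, so no backtracking search is needed per character.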
If you need to do the reverse, just handle it as a state machine:
state = huffman_root
for each bit
    state = state.children[bit]
    if (state.type == 'leaf')
        output(state.data);
        state = huffman_root
Honestly, I didn't look into your code. It ought to be pretty obvious what to do with the fancy tree.
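For the decoding direction, the state machine could look roughly like this in Java. As before, Node is a hypothetical stand-in for the question's Frequency class and HuffmanDecoder is an invented name. The sketch assumes the bits arrive as a String of '0' and '1' characters, that '0' means left, and that the tree has at least two symbols.

```java
class HuffmanDecoder {
    // Hypothetical stand-in for the question's Frequency class.
    static class Node {
        final String symbol;
        final Node left, right;
        Node(String symbol) { this.symbol = symbol; this.left = null; this.right = null; }
        Node(Node left, Node right) { this.symbol = null; this.left = left; this.right = right; }
        boolean isLeaf() { return left == null && right == null; }
    }

    // Follows one branch per bit; on reaching a leaf, emits its symbol and restarts at the root.
    static String decode(Node root, String bits) {
        StringBuilder out = new StringBuilder();
        Node state = root;
        for (int i = 0; i < bits.length(); i++) {
            char bit = bits.charAt(i);
            if (bit != '0' && bit != '1') {
                throw new IllegalArgumentException("not a bit at index " + i + ": " + bit);
            }
            state = (bit == '0') ? state.left : state.right;
            if (state == null) {
                throw new IllegalArgumentException("no branch for bit at index " + i);
            }
            if (state.isLeaf()) {
                out.append(state.symbol);
                state = root;
            }
        }
        if (state != root) {
            throw new IllegalArgumentException("input ends in the middle of a code");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Same tree as "aabbbcccc" would produce: c = 0, a = 10, b = 11.
        Node root = new Node(new Node("c"), new Node(new Node("a"), new Node("b")));
        System.out.println(decode(root, "10101111110000"));
    }
}
```

Note that the leaf check happens after the move, so the symbol for the final code is emitted as well.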
Remember, if you have 1001, you will never have a 10010 or 10011, because no Huffman code is a prefix of another. So your basic method looks like this (in pseudocode):
if(input == thisNode.key) return thisNode.value
if(input.endsWith(1)) return search(thisNode.left)
else return search(thisNode.right)
I didn't read your program to figure out how to integrate it, but that's a key element of Huffman encoding in a nutshell.
Try something like this - you're trying to find token. So if you wanted to find the String for "10010", you'd do search(root,"10010")
String search(Frequency top, String token) {
return search(top,token,0);
}
// depending on your tree, you may have to switch top.left and top.right
String search(Frequency top, String token, int depth) {
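if(top == null) return "NOT FOUND"; // ran off the tree before using every bit
// all bits consumed: only a leaf (non-null string) is a valid result
if(token.length() == depth) return top.getString() != null ? top.getString() : "NOT FOUND";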
if(token.charAt(depth) == '0') return search(top.left,token,depth+1);
else return search(top.right,token,depth+1);
}
I considered two options when I was having a go at a Huffman encoding tree.
Option 1: use a pointer-based binary tree. I coded most of this and then felt that, to trace up the tree from the leaf to find an encoding, I needed parent pointers. Otherwise, as mentioned in this post, you have to search the tree, which does not give you the encoding straight away. The disadvantage of the pointer-based tree is that I need 3 pointers for every node in the tree, which I thought was too much. The code to follow the pointers is simple, but more complicated than in option 2.
Option 2: use an array-based tree to represent the encoding tree that you will use at run time to encode and decode. If you want the encoding of a character, you find the character in the array. Pretty straightforward: I use a lookup table, so I land right on the leaf. Then I trace up to the root, which is at index 1 in the array, using (current_index / 2) to get the parent. If the child index is even (2 * parent) it is a left child; otherwise (2 * parent + 1) it is a right child.
Option 2 was pretty easy to code up, and although the array can have empty spaces, I thought it performed better than a pointer-based tree. Besides, identifying the root and the leaves is now a matter of indices rather than object type. ;) This will also be very useful if you have to send your tree!
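A rough sketch of option 2, assuming the usual heap layout (slot 0 unused, root at index 1, children of i at 2*i and 2*i + 1). The class name ArrayHuffmanTree, the String[] representation, and the convention that only leaf slots are non-null are all assumptions made for this example, not details from the answer.

```java
import java.util.HashMap;
import java.util.Map;

class ArrayHuffmanTree {
    // Lookup table from symbol to the array index of its leaf.
    // Assumption: only leaves hold a symbol; internal and empty slots are null.
    private final Map<String, Integer> leafIndex = new HashMap<>();

    ArrayHuffmanTree(String[] slots) {
        for (int i = 1; i < slots.length; i++) { // slot 0 is unused
            if (slots[i] != null) leafIndex.put(slots[i], i);
        }
    }

    // Climbs from the leaf to the root (index 1), recording one bit per step.
    String encode(String symbol) {
        Integer index = leafIndex.get(symbol);
        if (index == null) {
            throw new IllegalArgumentException("unknown symbol: " + symbol);
        }
        StringBuilder bits = new StringBuilder();
        for (int i = index; i > 1; i /= 2) {
            // An even index is a left child (2 * parent), an odd one a right child.
            bits.append(i % 2 == 0 ? '0' : '1');
        }
        // Bits were collected leaf-to-root, so reverse them to get root-to-leaf order.
        return bits.reverse().toString();
    }

    public static void main(String[] args) {
        // Root at 1, c at 2, internal node at 3, a at 6, b at 7.
        String[] slots = {null, null, "c", null, null, null, "a", "b"};
        ArrayHuffmanTree tree = new ArrayHuffmanTree(slots);
        System.out.println(tree.encode("a") + " " + tree.encode("b") + " " + tree.encode("c"));
    }
}
```

The trade-off is the one the answer mentions: no parent pointers are stored, but a deep, unbalanced tree leaves many array slots empty.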
Also, you don't search (root, 10110) while decoding the Huffman code. You just walk the tree along the encoded bitstream, going left or right based on each bit, and when you reach a leaf, you output the character.
Hope this was helpful.
Harisankar Krishna Swamy (example)
I guess your homework is either done or very late by now, but maybe this will help someone else.
It's actually pretty simple. You create a tree where 0 goes right and 1 goes left. Reading the stream will navigate you through the tree. When you hit a leaf, you found a letter and start over from the beginning. Like glowcoder said, you will never have a letter on a non-leaf node. The tree also covers every possible sequence of bits. So navigating in this way always works no matter the encoded input.
I had an assignment to write a Huffman encoder/decoder just like you a while ago, and I wrote a blog post with the code in Java and a longer explanation: http://www.byteauthor.com/2010/09/huffman-coding-in-java/
PS. Most of the explanation is about serializing the Huffman tree with the fewest possible bits, but the encoding/decoding algorithms are pretty straightforward in the sources.
Here's a Scala implementation: http://blog.flotsam.nl/2011/10/huffman-coding-done-in-scala.html