I'm new to this site, so hopefully you guys don't mind helping a nub.
Anyway, I've been asked to write code to find the shortest cost of a graph tour on a particular graph, whose details are read in from file. The graph is shown below:
http://img339.imageshack.us/img339/8907/graphr.jpg
This is for an Artificial Intelligence class, so I'm expected to use a decent enough search method (brute force has been allowed, but not for full marks).
I've been reading, and I think that what I'm looking for is an A* search with a constant (e.g. zero) heuristic, which is equivalent to a uniform cost search. I'm having trouble wrapping my head around how to apply this in Java.
Basically, here's what I have:
Vertex class -
ArrayList<Edge> adjacencies;
String name;
int costToThis;
Edge class -
final Vertex target;
public final int weight;
Now at the moment, I'm struggling to work out how to apply the uniform cost notion to my desired goal path. Basically I have to start on a particular node, visit all other nodes, and end on that same node, with the lowest cost.
As I understand it, I could use a PriorityQueue to store all of my travelled paths, but I can't wrap my head around how I show the goal state as the starting node with all other nodes visited.
Here's what I have so far, which is pretty far off the mark:
public static void visitNode(Vertex vertex) {
    ArrayList<Edge> firstEdges = vertex.getAdjacencies();
    for (Edge e : firstEdges) {
        e.target.costToThis = e.weight + vertex.costToThis;
        queue.add(e.target);
    }
    Vertex next = queue.remove();
    visitNode(next);
}
Initially this takes the starting node, then recursively visits the first node in the PriorityQueue (the path with the next lowest cost).
My problem is basically, how do I stop my program from following a path specified in the queue if that path is at the goal state? The queue currently stores Vertex objects, but in my mind this isn't going to work as I can't store whether other vertices have been visited inside a Vertex object.
Help is much appreciated!
Josh
EDIT: I should mention that nodes previously visited may be visited again. In the case I provided this isn't beneficial, but there may be a case where revisiting a node already visited in order to reach another node would lead to a shorter path (I think). So I can't just prune based on nodes already visited (this was my first thought too).
Two comments:
1) When you set costToThis of a vertex, you overwrite the existing value, and this affects all paths in the queue, since the vertex is shared by many paths. I would not store costToThis as part of Vertex. Instead, I would define a Path class that contains the total cost of the path plus a list of the nodes composing it.
2) I am not sure if I understood your problem with the goal state correctly. However, the way I would add partial paths to the queue is as follows: if the path has length < N-1, a return to any visited node is illegal. When the length is N-1, the only option is returning to the starting node. You can add a visitedSet to your Path class (as a HashSet), so that you can check efficiently whether a given node has been visited or not.
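For illustration, here is a minimal sketch of such a Path class, built on the Vertex and Edge classes you already have (all names here are just suggestions, not a definitive implementation):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// One entry in the priority queue: a whole path, not a single Vertex.
class Path implements Comparable<Path> {
    final List<Vertex> vertices;   // nodes in visiting order
    final Set<Vertex> visitedSet;  // for O(1) "already visited?" checks
    final int totalCost;           // sum of edge weights along the path

    Path(Vertex start) {
        vertices = new ArrayList<>();
        visitedSet = new HashSet<>();
        vertices.add(start);
        visitedSet.add(start);
        totalCost = 0;
    }

    private Path(List<Vertex> vertices, Set<Vertex> visitedSet, int totalCost) {
        this.vertices = vertices;
        this.visitedSet = visitedSet;
        this.totalCost = totalCost;
    }

    // Returns a new Path extended by one edge; this Path (and every other
    // path already sitting in the queue) is left untouched.
    Path extend(Edge e) {
        List<Vertex> vs = new ArrayList<>(vertices);
        Set<Vertex> seen = new HashSet<>(visitedSet);
        vs.add(e.target);
        seen.add(e.target);
        return new Path(vs, seen, totalCost + e.weight);
    }

    @Override
    public int compareTo(Path other) { // lets PriorityQueue<Path> pop the cheapest path first
        return Integer.compare(totalCost, other.totalCost);
    }
}

With this, the goal test becomes: the path has N+1 vertices and its last vertex equals its first.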
I hope this helps...
I've got a dataset consisting of nodes and edges.
The nodes represent people and the edges represent their relations, each of which has a cost that has been calculated using Euclidean distance.
Now I wish to match these nodes together through their respective edges, where there is only one constraint:
Any node can only be matched with a single other node.
Now from this we know that I'm working in a general graph, where every node could theoretically be matched with any node in the dataset, as long as there is an edge between them.
What I wish to do is find the solution with the maximum number of matches and the overall minimum cost. For example:

Nodes:
- Node A
- Node B
- Node C
- Node D

Edges:
- Edge 1: Node A -> Node B, cost 0.5
- Edge 2: Node B -> Node C, cost 1
- Edge 3: Node C -> Node D, cost 0.5
- Edge 4: Node D -> Node A, cost 1
The solution to this problem, would be the following:
Assign Edge 1 and Edge 3, as that is the maximum number of matches (in this case there are obviously only two possible solutions, but there could be tons of branching edges to other nodes).
Edge 1 and Edge 3 are assigned because that is the solution with the maximum number of matches and the minimum overall cost (1).
I've looked into quite a few algorithms, including Hungarian, Blossom and minimum-cost flow, but I'm uncertain which is best for this case. Also, there seems to be an awful lot of material on solving these kinds of problems in bipartite graphs, which isn't really the case in this matter.
So I ask you:
Which algorithm would be best in this scenario to return (a) the maximum number of matches and (b) the lowest overall cost?
Do you know of any good material (maybe some easy-to-understand pseudocode) for your recommended algorithm? I'm not the strongest in mathematical notation.
For (a), the most suitable algorithm (there are theoretically faster ones, but they're more difficult to understand) would be Edmonds' Blossom algorithm. Unfortunately it is quite complicated, but I'll try to explain the basis as best I can.
The basic idea is to take a matching, and continually improve it (increase the number of matched nodes) by making some local changes. The key concept is an alternating path: a path from an unmatched node to another unmatched node, with the property that the edges alternate between being in the matching, and being outside it.
If you have an alternating path, then you can increase the size of the matching by one by flipping the state (whether or not they are in the matching) of the edges in the alternating path.
If there exists an alternating path, then the matching is not maximum (since the path gives you a way to increase the size of the matching) and conversely, you can show that if there is no alternating path, then the matching is maximum. So, to find a maximum matching, all you need to be able to do is find an alternating path.
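To make the "flipping" concrete, here is a tiny sketch (Edge stands for whatever edge type you use, assumed to have sensible equals/hashCode; the matching is kept as a set of edges and the alternating path is given as its list of edges):

import java.util.List;
import java.util.Set;

// Toggling every edge of an augmenting (alternating) path turns a matching of
// size m into a matching of size m + 1, because the path starts and ends with
// unmatched edges and strictly alternates in between.
static void augment(Set<Edge> matching, List<Edge> alternatingPath) {
    for (Edge e : alternatingPath) {
        if (matching.contains(e)) {
            matching.remove(e); // edge was in the matching -> take it out
        } else {
            matching.add(e);    // edge was outside the matching -> put it in
        }
    }
}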
In bipartite graphs this is very easy to do (it can be done with DFS). In general graphs it is more complicated, and this is where Edmonds' Blossom algorithm comes in. Roughly speaking:
Build a new graph, where there is an edge between two vertices u and v if you can get from u to v by first traversing an edge that is in the matching, and then traversing an edge that isn't.
In this graph, try to find a path from an unmatched vertex to a matched vertex that has an unmatched neighbor (that is, a neighbor in the original graph).
Each edge in the path you find corresponds to two edges of the original graph (namely an edge in the matching and one not in the matching), so the path translates to an alternating walk in the original graph, but this is not necessarily an alternating path (the distinction being that a path uses each vertex at most once, while a walk may use a vertex multiple times).
If the walk is a path, you have an alternating path and are done.
If not, then the walk uses some vertex more than once. You can remove the part of the walk between the two visits to this vertex, and you obtain a new graph (with part of the vertices removed). In this new graph you have to do the whole search again, and if you find an alternating path in the new graph you can "lift" it to an alternating path for the original graph.
Going into the details of this (crucial) last step would be a bit too much for a Stack Overflow answer, but you can find more details on Wikipedia, and perhaps having this high-level overview helps you understand the more mathematical articles.
Implementing this from scratch will be quite challenging.
For the weighted version (with the Euclidean distance), there is an even more complicated variant of Edmonds' Algorithm that can handle weights. Kolmogorov offers a C++ implementation and accompanying paper. This can also be used for the unweighted case, so using this implementation might be a good idea (even if it is not in java, there should be some way to interface with it).
Since your weights are based on Euclidean distances there might be a specialized algorithm for that case, but the more general version I mentioned above would also work, and an implementation is available for it.
I am making a program for uniform cost search, but in my code, when I put nodes into the priority queue, the node gets overridden... I don't know what the problem is.
For example, if node A is already present in the queue with value 10, and I put node A in again with value 20, then the value of the previous node A also changes.
Can anyone help?
while (!queue.isEmpty()) {
    Node temp = queue.remove();
    System.out.println(temp.city_name + " " + temp.getPath_cost());
    path.add(temp);
    if (temp == destination) {
        break;
    }
    System.out.println(temp.link.length);
    for (int i = 0; i < temp.link.length; i++) {
        a = temp.link[i].cost;
        b = temp.link[i].getParent().getPath_cost();
        temp.link[i].getNode().setPath_cost(a + b);
        System.out.print(temp.link[i].getNode().city_name + ": ");
        System.out.println(temp.link[i].getNode().getPath_cost());
        queue.add(temp.link[i].getNode());
    }
}
It looks like your node holds the path cost directly, so there is only ever one reference to a node. Vertices in a graph (nodes) don't need to know about their path cost for search; instead, the search typically creates a wrapper for each node at the time it is evaluated, and that wrapper stores search-specific information like the path cost. Basically, you query the node for its neighbors (the link property in your implementation) and create a new instance of the wrapper for each neighbor.
All that being said, if you are only looking for the shortest path, then you can safely just set the path cost to the minimum of its existing path cost and the currently evaluated path cost. You would start by setting the path cost of each node to positive infinity.
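A minimal sketch of what such a wrapper could look like (the names SearchEntry and newQueue are made up; Node is your existing class):

import java.util.Comparator;
import java.util.PriorityQueue;

// Search-specific wrapper: the graph's Node objects stay untouched, and each
// queue entry carries the cost of the particular path it represents.
class SearchEntry {
    final Node node;          // the underlying graph node
    final int pathCost;       // cost of the path this entry represents
    final SearchEntry parent; // lets you reconstruct the path once the goal is reached

    SearchEntry(Node node, int pathCost, SearchEntry parent) {
        this.node = node;
        this.pathCost = pathCost;
        this.parent = parent;
    }

    static PriorityQueue<SearchEntry> newQueue() {
        // Order entries by their own cost, never by anything stored on Node,
        // so the same Node can appear in the queue twice with different costs.
        return new PriorityQueue<>(Comparator.comparingInt((SearchEntry e) -> e.pathCost));
    }
}

When expanding an entry, you would create a fresh SearchEntry per neighbor, e.g. queue.add(new SearchEntry(neighbor, entry.pathCost + edgeCost, entry)).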
When solving the Chinese postman problem (route inspection problem), how can we find the pairings (between odd vertices) such that the sum of the weights is minimized?
This is the most crucial step in the algorithm that solves the Chinese Postman Problem for a non-Eulerian graph. Though it is easy to do on paper, I am facing difficulty implementing it in Java.
I was thinking about ways to find all possible pairs, but if one runs a first loop over all the odd vertices and a nested loop over all the other possible partners, this will only give one pair; to find all the other pairs you would need another two loops, and so on. This is rather strange, as one would be 'looping over loops' in a crude sense. Is there a better way to resolve this problem?
I have read about the Edmonds-Johnson algorithm, but I don't understand the motivation behind constructing a bipartite graph. And I have also read Chinese Postman Problem: finding best connections between odd-degree nodes, but the author does not explain how to implement a brute-force algorithm.
Also, the following question: How should I generate the partitions / pairs for the Chinese Postman problem? has been asked previously by a user of Stack Overflow, but the reply to the post gives a Python implementation of the code. I am not familiar with Python and I would request any community member to rewrite the code in Java or, if possible, explain the algorithm.
Thank You.
Economical recursion
These tuples normally are called edges, aren't they?
You need a recursion.
0. Create the main stack of edge lists.
1. Take all edges into a current edge list. Empty the found-edge stack.
2. Take the next edge from the current edge list and push it onto the found-edge stack.
3. Create the next edge list from the current edge list. Push the current edge list onto the main stack. Make the next edge list current.
4. Remove from the current edge list the current edge and every edge adjacent to it.
5. If the current edge list is not empty, loop to 2.
6. Remember the current state of the found-edge stack - it is the next result set of edges that you need.
7. Pop the found-edge stack into the current edge. Pop the main stack into the current edge list. If the stacks are empty, return. Repeat until the current edge has a next edge after it.
8. Loop to 2.
As a result, you have all possible sets of edges and you never need to check "have I already seen the same set in a different order?"
It's actually fairly simple when you wrap your head around it. I'm just sharing some code here in the hope it will help the next person!
The function below returns all the valid odd vertex combinations that one then needs to check for the shortest one.
private static ObjectArrayList<ObjectArrayList<IntArrayList>> getOddVertexCombinations(
        IntArrayList oddVertices, ObjectArrayList<IntArrayList> buffer) {
    ObjectArrayList<ObjectArrayList<IntArrayList>> toReturn = new ObjectArrayList<>();
    if (oddVertices.isEmpty()) {
        // Every odd vertex has been paired: the buffer holds one complete combination.
        toReturn.add(buffer.clone());
    } else {
        // Pair the first remaining vertex with each of the others in turn, then recurse.
        int first = oddVertices.removeInt(0);
        for (int c = 0; c < oddVertices.size(); c++) {
            int second = oddVertices.removeInt(c);
            buffer.add(new IntArrayList(new int[]{first, second}));
            toReturn.addAll(getOddVertexCombinations(oddVertices, buffer));
            buffer.pop();               // undo the pair before trying the next partner
            oddVertices.add(c, second); // restore the partner
        }
        oddVertices.add(0, first);      // restore the first vertex
    }
    return toReturn;
}
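As a usage sketch: once the combinations are generated, the Chinese Postman step is to keep the one whose pairs have the smallest summed shortest-path cost. PairCost below is a hypothetical stand-in for however you look up those distances (e.g. precomputed Dijkstra or Floyd-Warshall results):

import it.unimi.dsi.fastutil.ints.IntArrayList;
import it.unimi.dsi.fastutil.objects.ObjectArrayList;

// Hypothetical cost source for a pair of odd vertices.
interface PairCost {
    double cost(int u, int v);
}

// Returns the combination with the smallest total pairing cost.
static ObjectArrayList<IntArrayList> cheapestPairing(
        ObjectArrayList<ObjectArrayList<IntArrayList>> combinations, PairCost costs) {
    ObjectArrayList<IntArrayList> best = null;
    double bestTotal = Double.POSITIVE_INFINITY;
    for (ObjectArrayList<IntArrayList> combination : combinations) {
        double total = 0;
        for (IntArrayList pair : combination) {
            total += costs.cost(pair.getInt(0), pair.getInt(1));
        }
        if (total < bestTotal) {
            bestTotal = total;
            best = combination;
        }
    }
    return best;
}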
This question covers a software algorithm, which is on topic per On topic.
I am working on an interview question from Amazon Software Question,
specifically "Given a set of points (x,y) and an integer "n", return n number of points which are close to the origin"
Here is the sample high-level pseudocode answer to this question, from Sample Answer:
Step 1: Design a class called point which has three fields - int x, int y, int distance
Step 2: For all the points given, find the distance between them and origin
Step 3: Store the values in a binary tree
Step 4: Heap sort
Step 5: print the first n values from the binary tree
I agree with steps 1 and 2 because it makes sense, in terms of object-oriented design (encapsulation), to have one software bundle of data, Point, encapsulate the fields x, y and distance.
Can someone explain the design decisions from 3 to 5?
Here's how I would do steps 3 to 5:
Step 3: Store all the points in an array
Step 4: Sort the array with respect to distance (I would use some built-in sort here, like Arrays.sort)
Step 5: With the array sorted in ascending order, I print off the first n values
Why did the author of that response use a more complicated data structure, a binary tree, rather than something simpler like the array I used? I know what a binary tree is - a hierarchical data structure of nodes with two pointers. In his algorithm, would you have to use a BST?
First, I would not say that having Point(x, y, distance) is good design or encapsulation. distance is not really part of a point; it can be computed from x and y. In terms of design, I would certainly have a function, i.e. a static method on Point or a helper class Points:
double distance(Point a, Point b)
Then, for the specific question, I actually agree with your solution: put the data in an array, sort the array and then extract the first N.
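A rough sketch of that approach (assuming Point is the class from step 1 with accessible int fields x and y; squared distance is enough for ordering, so no square root is needed):

import java.util.Arrays;
import java.util.Comparator;

// Sort a copy of the input by squared distance to the origin and keep the first n.
static Point[] nClosestToOrigin(Point[] points, int n) {
    Point[] copy = Arrays.copyOf(points, points.length);
    Arrays.sort(copy, Comparator.comparingLong(
            (Point p) -> (long) p.x * p.x + (long) p.y * p.y));
    return Arrays.copyOf(copy, Math.min(n, copy.length));
}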
What the example may be hinting at is that heapsort often uses a binary tree structure laid out inside the array being sorted, as explained here:
The heap is often placed in an array with the layout of a complete binary tree.
Of course, if the distance to the origin is not stored in the Point for performance reasons, it would have to be stored alongside the corresponding Point object in the array, or together with any information that allows retrieving the Point from its sorted distance (a reference or an index), e.g.
List<Pair<Long, Point>> distancesToOrigin = new ArrayList<>();
to be sorted with a Comparator<Pair<Long, Point>>
It is not necessary to use a BST. However, it is good practice to use a BST when you need a structure that keeps itself sorted. I do not see the need to both use a BST and heapsort it (somehow). You could just use a BST and retrieve the first n points. You could also use an array, sort it, and use the first n points.
If you want to sort an array of type Point, you could have Point implement the Comparable interface and override its compareTo method.
You never have to choose any particular data structure, but by determining the needs you have, you can easily determine the optimal structure.
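For instance, a minimal sketch of the Comparable approach (assuming the Point design from the sample answer, with the distance precomputed when the point is created and stored as a double here for precision):

// Point ordered by its precomputed distance to the origin, so that
// Arrays.sort or Collections.sort puts the closest points first.
class Point implements Comparable<Point> {
    final int x;
    final int y;
    final double distance;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
        this.distance = Math.hypot(x, y); // distance to the origin, computed once
    }

    @Override
    public int compareTo(Point other) {
        return Double.compare(this.distance, other.distance);
    }
}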
The approach described in this post is more complex than needed for such a question. As you noted, simple sorting by distance will suffice. However, to help explain your confusion about what your sample answer's author was trying to get at, maybe consider the k nearest neighbors problem, which can be solved with a k-d tree, a structure that applies space partitioning to a k-dimensional dataset. For 2-dimensional space, that is indeed a binary tree. This tree is inherently sorted and doesn't need any "heap sorting."
It should be noted that building the k-d tree will take O(n log n), and is only worth the cost if you need to do repeated nearest neighbor searches on the structure. If you only need to perform one search to find k nearest neighbors from the origin, it can be done with a naive O(n) search.
How to build a k-d tree, straight from Wiki:
One adds a new point to a k-d tree in the same way as one adds an element to any other search tree. First, traverse the tree, starting from the root and moving to either the left or the right child depending on whether the point to be inserted is on the "left" or "right" side of the splitting plane. Once you get to the node under which the child should be located, add the new point as either the left or right child of the leaf node, again depending on which side of the node's splitting plane contains the new node.
Adding points in this manner can cause the tree to become unbalanced, leading to decreased tree performance. The rate of tree performance degradation is dependent upon the spatial distribution of tree points being added, and the number of points added in relation to the tree size. If a tree becomes too unbalanced, it may need to be re-balanced to restore the performance of queries that rely on the tree balancing, such as nearest neighbour searching.
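A compact sketch of that insertion for the 2-dimensional case (the splitting axis simply alternates with the depth; all names are illustrative):

// Node of a 2-d tree; depth 0 splits on x, depth 1 on y, depth 2 on x again, ...
class KdNode {
    final double[] point; // {x, y}
    KdNode left, right;

    KdNode(double[] point) {
        this.point = point;
    }
}

// Insert a point and return the (possibly new) subtree root.
static KdNode insert(KdNode root, double[] point, int depth) {
    if (root == null) {
        return new KdNode(point);
    }
    int axis = depth % 2;
    if (point[axis] < root.point[axis]) {
        root.left = insert(root.left, point, depth + 1);   // "left" side of the splitting line
    } else {
        root.right = insert(root.right, point, depth + 1); // "right" side of the splitting line
    }
    return root;
}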
Once you have built the tree, you can find the k nearest neighbors to some point (the origin in your case) in O(k log n) time.
Straight from Wiki:
Searching for a nearest neighbour in a k-d tree proceeds as follows:
Starting with the root node, the algorithm moves down the tree recursively, in the same way that it would if the search point were being inserted (i.e. it goes left or right depending on whether the point is lesser than or greater than the current node in the split dimension).
Once the algorithm reaches a leaf node, it saves that node point as the "current best"
The algorithm unwinds the recursion of the tree, performing the following steps at each node:
If the current node is closer than the current best, then it becomes the current best.
The algorithm checks whether there could be any points on the other side of the splitting plane that are closer to the search point than the current best. In concept, this is done by intersecting the splitting hyperplane with a hypersphere around the search point that has a radius equal to the current nearest distance. Since the hyperplanes are all axis-aligned this is implemented as a simple comparison to see whether the difference between the splitting coordinate of the search point and current node is lesser than the distance (overall coordinates) from the search point to the current best.
If the hypersphere crosses the plane, there could be nearer points on the other side of the plane, so the algorithm must move down the other branch of the tree from the current node looking for closer points, following the same recursive process as the entire search.
If the hypersphere doesn't intersect the splitting plane, then the algorithm continues walking up the tree, and the entire branch on the other side of that node is eliminated.
When the algorithm finishes this process for the root node, then the search is complete.
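Here is a hedged sketch of that search, reusing the KdNode class from the insertion sketch above (squared distances are compared, which avoids the square root and keeps the plane-crossing test a simple diff * diff comparison):

// Returns the tree node closest to target, or null for an empty tree.
static KdNode nearest(KdNode node, double[] target, int depth, KdNode best) {
    if (node == null) {
        return best;
    }
    if (best == null || sqDist(node.point, target) < sqDist(best.point, target)) {
        best = node; // this node becomes the "current best"
    }
    int axis = depth % 2;
    double diff = target[axis] - node.point[axis];
    KdNode nearSide = diff < 0 ? node.left : node.right; // side containing the target
    KdNode farSide  = diff < 0 ? node.right : node.left; // other side of the splitting line
    best = nearest(nearSide, target, depth + 1, best);
    // Only descend into the far side if the splitting line is closer to the target
    // than the current best (the "hypersphere crosses the plane" test).
    if (diff * diff < sqDist(best.point, target)) {
        best = nearest(farSide, target, depth + 1, best);
    }
    return best;
}

static double sqDist(double[] a, double[] b) {
    double dx = a[0] - b[0];
    double dy = a[1] - b[1];
    return dx * dx + dy * dy;
}

A full k-nearest-neighbours version would keep a bounded max-heap of the k best candidates instead of a single best node, but the pruning logic stays the same.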
This is a pretty tricky algorithm that I would hate to need to describe as an interview question! Fortunately the general case here is more complex than is needed, as you pointed out in your post. But I believe this approach may be close to what your (wrong) sample answer was trying to describe.
Given two nodes A and B from a directed JUNG graph, I want to determine whether there is more than one path from A to B (not necessarily a shortest path).
I can think of two approaches only, both very time-consuming.
1. Retrieve all paths connecting the two nodes (question Finding all paths in JUNG?) and check if there is more than one.
2. Retrieve the shortest path using the class DijkstraShortestPath, then break this path and search for the shortest path again. If there is still one, it means there were multiple paths. Note that this also requires cloning the graph, since I do not want to alter the original graph.
How can I do this smarter (i.e. faster)?
I found a solution myself.
My problem has the additional constraint that I only want to check whether there is more than one path for two nodes that are directly connected by an edge. This means that by simply computing the shortest path you will always get this single edge as the path.
So, my question can be reformulated as:
Is there another path connecting the two nodes of an edge, aside from the edge itself?
The solution is to use a weighted shortest path. If we assign a very high weight to our edge of interest, and weight 1 to all the others, then if the minimal distance is lower than our high weight, the answer is YES, otherwise NO.
Here is the code:
public static boolean areThereMultiplePaths(final Edge edge, DirectedGraph<Entity, Edge> graph) {
    // Give the edge of interest a prohibitively high weight and every other edge weight 1.
    Transformer<Edge, Integer> transformer = new Transformer<Edge, Integer>() {
        public Integer transform(Edge otherEdge) {
            if (otherEdge.equals(edge))
                return Integer.MAX_VALUE;
            else
                return 1;
        }
    };
    DijkstraShortestPath<Entity, Edge> algorithm =
            new DijkstraShortestPath<Entity, Edge>(graph, transformer);
    Double distance = (Double) algorithm.getDistance(edge.getStartNode(), edge.getEndNode());
    // A route cheaper than the penalised edge means some other path exists.
    return distance < Integer.MAX_VALUE;
}