The A star algorithm is known to be complete. However, all the implementations that I have found searching the web seem to only return only the first (optimal) solution.
For example, this implementation:
A star algoritthm implementation
Since the algorithm always expands the node with the minimum f value, and the implementations seem to stop when the first node is a solution, how would one adapt the aforementioned code so as to output all (or the first n) paths that lead to a goal, without taking into account duplicate actions (that is, paths that contain the same action over and over again)?
For all paths, it probably makes a lot more sense to use breath first search. Alternatively, you can try Dijkstra's algorithm if you want to find the top n shortest paths.
It's complete which means it will find a solution if one exists, but the algorithm specifically only returns one path. A breadth-first search will find all non-cyclical paths between two nodes, however: http://en.wikipedia.org/wiki/Breadth-first_search
Update - Here is the k-shortest paths algorithm which will return a list of n (or in this case, k) shortest paths in order of shortest to longest. http://code.google.com/p/k-shortest-paths/
Related
I've got a dataset consisting of nodes and edges.
The nodes respresent people and the edges represent their relations, which each has a cost that's been calculated using euclidean distance.
Now I wish to match these nodes together through their respective edges, where there is only one constraint:
Any node can only be matched with a single other node.
Now from this we know that I'm working in a general graph, where every node could theoretically be matched with any node in the dataset, as long as there is an edge between them.
What I wish to do, is find the solution with the maximum matches and the overall minimum cost.
Node A
Node B
Node C
Node D
- Edge 1:
Start: End Cost
Node A Node B 0.5
- Edge 2:
Start: End Cost
Node B Node C 1
- Edge 3:
Start: End Cost
Node C Node D 0.5
- Edge 2:
Start: End Cost
Node D Node A 1
The solution to this problem, would be the following:
Assign Edge 1 and Edge 3, as that is the maximum amount of matches ( in this case, there's obviously only 2 solutions, but there could be tons of branching edges to other nodes)
Edge 1 and Edge 3 is assigned, because it's the solution with maximum amount of matches and the minimum overall cost (1)
I've looked into quite a few algorithms including Hungarian, Blossom, Minimal-cost flow, but I'm uncertain which is the best for this case. Also there seems so be an awful lot of material to solving these kinds of problems in bipartial graph's, which isn't really the case in this matter.
So I ask you:
Which algorithm would be the best in this scenario to return the (a) maximum amount of matches and (b) with the lowest overall cost.
Do you know of any good material (maybe some easy-to-understand pseudocode), for your recomended algorithm? I'm not the strongest in mathematical notation.
For (a), the most suitable algorithm (there are theoretically faster ones, but they're more difficult to understand) would be Edmonds' Blossom algorithm. Unfortunately it is quite complicated, but I'll try to explain the basis as best I can.
The basic idea is to take a matching, and continually improve it (increase the number of matched nodes) by making some local changes. The key concept is an alternating path: a path from an unmatched node to another unmatched node, with the property that the edges alternate between being in the matching, and being outside it.
If you have an alternating path, then you can increase the size of the matching by one by flipping the state (whether or not they are in the matching) of the edges in the alternating path.
If there exists an alternating path, then the matching is not maximum (since the path gives you a way to increase the size of the matching) and conversely, you can show that if there is no alternating path, then the matching is maximum. So, to find a maximum matching, all you need to be able to do is find an alternating path.
In bipartite graphs, this is very easy to do (it can be done with DFS). In general graphs this is more complicated, and this is were Edmonds' Blossom algorithm comes in. Roughly speaking:
Build a new graph, where there is an edge between two vertices if you can get from u to v by first traversing an edge that is in the matching, and then traversing and edge that isn't.
In this graph, try to find a path from an unmatched vertex to a matched vertex that has an unmatched neighbor (that is, a neighbor in the original graph).
Each edge in the path you find corresponds to two edges of the original graph (namely an edge in the matching and one not in the matching), so the path translates to an alternating walk in the new graph, but this is not necessarily an alternating path (the distinction between path and walk is that a path only uses each vertex once, but a walk can use each vertex multiple times).
If the walk is a path, you have an alternating path and are done.
If not, then the walk uses some vertex more than once. You can remove the part of the walk between the two visits to this vertex, and you obtain a new graph (with part of the vertices removed). In this new graph you have to do the whole search again, and if you find an alternating path in the new graph you can "lift" it to an alternating path for the original graph.
Going into the details of this (crucial) last step would be a bit too much for a stackoverflow answer, but you can find more details on Wikipedia and perhaps having this high-level overview helps you understand the more mathematical articles.
Implementing this from scratch will be quite challenging.
For the weighted version (with the Euclidean distance), there is an even more complicated variant of Edmonds' Algorithm that can handle weights. Kolmogorov offers a C++ implementation and accompanying paper. This can also be used for the unweighted case, so using this implementation might be a good idea (even if it is not in java, there should be some way to interface with it).
Since your weights are based on Euclidean distances there might be a specialized algorithm for that case, but the more general version I mentioned above would also work and and implementation is available for it.
I implemented a C# Dijkstra with Fibonacci Heap using based on SO question and this version in Java, which is very clean, concise and well documented.
I modified the DirectedGraph to make it an undirected graph.
However, I have 2 questions regarding the search algorithm itself:
The current method has 2 parameters (graph & source). If I add a third parameter (target), what are the changes necessary in the search algorithm itself so that it only searches from source to target instead of all-pairs shortest paths?
The function returns a list of distances. What do I need to change to make it return the shortest path?
I have to make a Java program which finds all repeating sub-strings of length n in a given String. The input is string is extremely long and a brute-force approach takes too much time.
I alread tried:
Presently I am finding each sub-string separately and checking for repetitions of that sub-string using the KMP alogrithm. This too is taking too much time.
What is a more efficient approach for this problem?
1) You should look at using a suffix tree data structure.
Suffix Tree
This data structure can be built in O(N * log N) time
(I think even in O(N) time using Ukkonen's algorithm)
where N is the size/length of the input string.
It then allows for solving many (otherwise) difficult
tasks in O(M) time where M is the size/length of the pattern.
So even though I didn't try your particular problem, I am pretty sure that
if you use a suffix tree and a smart formulation of your problem, then the
problem can be solved by using a suffix tree (in reasonable O time).
2) A very good book on these (and related) subjects is this one:
Algorithms on Strings, Trees and Sequences
It's not really easy to read though unless you're well-trained in algorithms.
But OK, reading such things is the only way to get well-trained ;)
3) I suggest you have a quick look at this algorithm too.
Aho-Corasick Algorithm
Even though, I am not sure but... this one might be somewhat
off-topic with respect to your particular problem.
I am going to take #peter.petrov's suggestion and enhance it by explaining how can one actually use a suffix tree to solve the problem:
1. Create a suffix tree from the string, let it be `T`.
2. Find all nodes of depth `n` in the tree, let that set of nodes be `S`. This can be done using DFS, for example.
3. For each node `n` in `S`, do the following:
3.1. Do a DFS, and count the number of terminals `n` leads to. Let this number be `count`
3.2. If `count>1`, yield the substring that is related to `n` (the path from root to `n`), and `count`
Note that this algorithm treats any substring of length n and add it to the set S, and from there it search for how many times this was actually a substring by counting the number of terminals this substring leads to.
This means that the complexity of the problem is O(Creation + Traversal) - meaning, you first create the tree and then you traverse it (easy to see you don't traverse in steps 2-3 each node in the tree more than once). Since the traversal is obviously "faster" than creation of the tree - it leaves you with O(Creation), which as was pointed by #perer.petrov is O(|S|) or O(|S|log|S|) depending on your algorithm of choice.
I know this is exponential. I already implemented a method to find the shortest path using Dijkstra's algorithm. Is it possible to modify the method to find the longest path instead? If I make all the weights negative, shouldn't this work. All the weights on my current graph are positive. Also there should be no repeating paths.
I know the Bellman Ford algorithm works with negative weights, but im hoping I can just modify my existing shortest path method.
If the graph is undirected, then the longest path has infinite length, because you can visit an edge forward and backward as many times as you want. Therefore you should put some more conditions, like : a node can only be visited once, or the graph must be directed.
Making all weights negative and running Dijkstra will make infinite loop. It is in fact equivalent to what I just explained above.
For more information, I invite you to read about these :
http://en.wikipedia.org/wiki/Topological_sorting
http://en.wikipedia.org/wiki/Travelling_salesman_problem
Good luck !
I need to find a shortest path through an undirected graph whose nodes are real (positive and negative) weighted. These weights are like resources which you can gain or loose by entering the node.
The total cost (resource sum) of the path isn't very important, but it must be more than 0, and length has to be the shortest possible.
For example consider a graph like so:
A-start node; D-end node
A(+10)--B( 0 )--C(-5 )
\ | /
\ | /
D(-5 )--E(-5 )--F(+10)
The shortest path would be A-E-F-E-D
Dijkstra's algorithm alone doesn't do the trick, because it can't handle negative values. So, I thought about a few solutions:
First one uses Dijkstra's algorithm to calculate the length of a shortest path from each node to the exit node, not considering the weights. This can be used like some sort of heuristics value like in A*. I'm not sure if this solution could work, and also it's very costly. I also thought about implement Floyd–Warshall's algorithm, but I'm not sure how.
Another solution was to calculate the shortest path with Dijkstra's algorithm not considering the weights, and if after calculating the path's resource sum it's less than zero, go through each node to find a neighbouring node which could quickly increase the resource sum, and add it to the path(several times if needed). This solution won't work if there is a node that could be enough to increase the resource sum, but farther away from the calculated shortest path.
For example:
A- start node; E- end node
A(+10)--B(-5 )--C(+40)
\
D(-5 )--E(-5 )
Could You help me solve this problem?
EDIT: If when calculating the shortest path, you reach a point where the sum of the resources is equal to zero, that path is not valid, since you can't go on if there's no more petrol.
Edit: I didn't read the question well enough; the problem is more advanced than a regular single-source shortest path problem. I'm leaving this post up for now just to give you another algorithm that you might find useful.
The Bellman-Ford algorithm solves the single-source shortest-path problem, even in the presence of edges with negative weight. However, it does not handle negative cycles (a circular path in the graph whose weight sum is negative). If your graph contains negative cycles, you are probably in trouble, because I believe that that makes the problem NP-complete (because it corresponds to the longest simple path problem).
This doesn't seem like an elegant solution, but given the ability to create cyclic paths I don't see a way around it. But I would just solve it iteratively. Using the second example - Start with a point at A, give it A's value. Move one 'turn' - now I have two points, one at B with a value of 5, and one at D also with a value of 5. Move again - now I have 4 points to track. C: 45, A: 15, A: 15, and E: 0. It might be that the one at E can oscillate and become valid so we can't toss it out yet. Move and accumulate, etc. The first time you reach the end node with a positive value you are done (though there may be additional equivalent paths that come in on the same turn)
Obviously problematic in that the number of points to track will rise pretty quickly, and I assume your actual graph is much more complex than the example.
I would do it similarly to what Mikeb suggested: do a breadth-first search over the graph of possible states, i.e. (position, fuel-left)-pairs.
Using your example graph:
Octagons: Ran out of fuel
Boxes: Child nodes omitted for space reasons
Searching this graph breadth-first is guaranteed to give you the shortest route that actually reaches the goal if such a route exists. If it does not, you will have to give up after a while (after x nodes searched, or maybe when you reach a node with a score greater than the absolute value of all negative scores combined), as the graph can contain infinite loops.
You have to make sure not to abort immediately on finding the goal if you want to find the cheapest path (fuel wise) too, because you might find more than one path of the same length, but with different costs.
Try adding the absolute value of the minimun weight (in this case 5) to all weights. That will avoid negative ciclic paths
Current shortest path algorithms requires calculate shortest path to every node because it goes combinating solutions on some nodes that will help adjusting shortest path in other nodes. No way to make it only for one node.
Good luck