I'd like to find an implementation of an approximate algorithm for the Minimum Feedback Arc Set problem in Java, but I haven't found anything so far. Does anyone have something in mind?
It appears that the simplest approximate algorithm one can implement (though with no minimality guarantees) is the one from this paper:
A fast and effective heuristic for the feedback arc set problem, by P. Eades, X. Lin, W.F. Smyth.
It is very easy to implement and runs quite fast on large graphs (I tried it on a graph with 2.5 million edges and around 100 thousand nodes, and it broke all cycles in less than a minute).
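For reference, here is a minimal Java sketch of that greedy heuristic. The class and method names are my own, and this is a simple quadratic-time version for clarity; the paper describes a linear-time implementation using bucket structures. The input adjacency sets are consumed, so pass a copy if you still need the graph afterwards.

```java
import java.util.*;

/** Sketch of the Eades-Lin-Smyth greedy heuristic: build a vertex
 *  ordering s1 ++ s2; the arcs pointing "backwards" in that ordering
 *  form the feedback arc set. Assumes a simple digraph with vertices
 *  0..n-1 and no self-loops. */
public class FeedbackArcSet {

    public static List<int[]> feedbackArcs(List<Set<Integer>> out) {
        int n = out.size();
        List<int[]> arcs = new ArrayList<>();            // remember all arcs up front
        List<Set<Integer>> in = new ArrayList<>();
        for (int v = 0; v < n; v++) in.add(new HashSet<>());
        for (int u = 0; u < n; u++)
            for (int v : out.get(u)) { in.get(v).add(u); arcs.add(new int[]{u, v}); }

        Deque<Integer> s1 = new ArrayDeque<>(), s2 = new ArrayDeque<>();
        boolean[] removed = new boolean[n];
        int remaining = n;
        while (remaining > 0) {
            boolean changed = true;
            while (changed) {                            // peel off sinks and sources
                changed = false;
                for (int v = 0; v < n; v++) {
                    if (removed[v]) continue;
                    if (out.get(v).isEmpty()) {          // sink: prepend to s2
                        delete(v, out, in, removed); s2.addFirst(v); remaining--; changed = true;
                    } else if (in.get(v).isEmpty()) {    // source: append to s1
                        delete(v, out, in, removed); s1.addLast(v); remaining--; changed = true;
                    }
                }
            }
            if (remaining > 0) {                         // pick vertex maximizing outdeg - indeg
                int best = -1, bestDelta = Integer.MIN_VALUE;
                for (int v = 0; v < n; v++)
                    if (!removed[v] && out.get(v).size() - in.get(v).size() > bestDelta) {
                        bestDelta = out.get(v).size() - in.get(v).size(); best = v;
                    }
                delete(best, out, in, removed); s1.addLast(best); remaining--;
            }
        }
        int[] pos = new int[n]; int i = 0;               // position in the ordering s1 ++ s2
        for (int v : s1) pos[v] = i++;
        for (int v : s2) pos[v] = i++;
        List<int[]> feedback = new ArrayList<>();        // backward arcs break all cycles
        for (int[] a : arcs) if (pos[a[0]] > pos[a[1]]) feedback.add(a);
        return feedback;
    }

    private static void delete(int v, List<Set<Integer>> out,
                               List<Set<Integer>> in, boolean[] removed) {
        removed[v] = true;
        for (int w : out.get(v)) in.get(w).remove(v);
        for (int w : in.get(v)) out.get(w).remove(v);
        out.get(v).clear(); in.get(v).clear();
    }
}
```

The ordering s1 ++ s2 is a linear arrangement of the vertices; deleting the arcs that point backwards in it leaves an acyclic graph.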
I need some advice on an approach I may need to take to solve a gaming problem, a puzzle (NxN) consisting of positive numbers stored in a two-dimensional array. For simplicity, I'll list a simple example:
2 1 2 2
1 3 2 1
1 0 2 1
3 1 2 0
So the starting point is (0,0) => 2 and the goal location is (3,3) => 0.
The number in the array location tells you how far to move: (0,0) => 2 can move to either (0,2) or (2,0), and so on (allowed moves: left, right, up, or down).
So you end up with a solution like this, for example: (0,0)=>(0,2)=>(2,2)=>(2,0)=>(3,0)=>(3,3).
So my question is: what sort of algorithm should I be looking into, and have any of you done something similar to this?
You have plenty of solutions here:
A* algorithm
Dijkstra
Depth-first
Breadth-first
The first two will give you an optimal solution if one exists. A* is typically faster than Dijkstra if the heuristic is well chosen. Breadth-first will also give you an optimal solution. Depth-first may give you non-optimal solutions in this problem.
The main difference between A* and Dijkstra is that A* uses a heuristic, namely a function that tries to estimate whether one move is better than another.
The main difference between depth-first and breadth-first is the order in which they explore the space of solutions. Breadth-first will start by looking for all solutions of length 1 then all solutions of length 2, etc, while depth-first will fully explore an entire path until it either cannot go any further or finds a solution.
A* and Dijkstra are typically implemented in imperative style and are probably more sophisticated than the other two, especially A*. Breadth-first is also naturally expressed in imperative style. Depth-first is generally expressed recursively, which can be a problem if your solutions can exceed a length of several thousands moves (depending on the size of your stack, you will generally only be able to make 7-10k recursive calls before you get a StackOverflowError).
To sum up:
A* is generally the most efficient of the algorithms listed above
A* is the most difficult to implement
Dijkstra is a special case of A* (A* with a zero heuristic); it is simpler to implement but potentially less efficient
Breadth-first is straightforward to implement and is resilient to long solutions
Depth-first is straightforward to implement but it is limited by the length of the longest possible path if it is implemented recursively
All these algorithms except depth-first guarantee an optimal solution (for A*, provided the heuristic is admissible)
Code example:
I found this Scala implementation of A* in one of my repositories. Might help.
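If you'd rather stay in Java, here is a minimal breadth-first sketch for the jump puzzle above. The class and method names are my own, and the grid is assumed to be a square int[][] with the goal at the bottom-right corner.

```java
import java.util.*;

public class JumpPuzzle {

    /** Returns a shortest path of (row, col) cells from (0,0) to the
     *  bottom-right corner, or null if the goal is unreachable. */
    public static List<int[]> solve(int[][] grid) {
        int n = grid.length, goal = n * n - 1;
        int[] prev = new int[n * n];                     // predecessor of each cell
        Arrays.fill(prev, -1);
        boolean[] seen = new boolean[n * n];
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(0); seen[0] = true;
        while (!queue.isEmpty()) {
            int cur = queue.poll();
            if (cur == goal) return buildPath(prev, cur, n);
            int r = cur / n, c = cur % n, d = grid[r][c];
            if (d == 0) continue;                        // dead end: no move possible
            int[][] moves = {{r - d, c}, {r + d, c}, {r, c - d}, {r, c + d}};
            for (int[] m : moves) {
                if (m[0] < 0 || m[0] >= n || m[1] < 0 || m[1] >= n) continue;
                int next = m[0] * n + m[1];
                if (!seen[next]) { seen[next] = true; prev[next] = cur; queue.add(next); }
            }
        }
        return null;                                     // goal unreachable
    }

    private static List<int[]> buildPath(int[] prev, int cur, int n) {
        LinkedList<int[]> path = new LinkedList<>();
        for (int v = cur; v != -1; v = prev[v])
            path.addFirst(new int[]{v / n, v % n});
        return path;
    }
}
```

Because all moves have equal cost, the first time the goal is dequeued the reconstructed path has minimal length.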
I have some grid search algorithms (Best-First, Breadth-First, Depth-First) implemented here in Object Pascal (Delphi) that you could easily adapt to Java if this were a classic grid search:
https://github.com/Zoomicon/GridSearchDemo/tree/master/Object%20Pascal/algorithms
You can try the GridSearchDemo application here to see how these algorithms behave when searching a grid with a start point, a target point, and obstacles in various grid cells (you can set them):
https://github.com/Zoomicon/GridSearchDemo/releases
In general, I prefer the A* algorithm, which is an example of a Best-First algorithm (https://en.wikipedia.org/wiki/Best-first_search)
In your case this is not really a grid but a graph, since you seem to have jump links to other cells (or at least that is how you explain the numbers in your question, although you first describe them as "how far" to move).
I have written a program in Java to solve this problem. It uses the A* algorithm with the Manhattan and Hamming heuristic functions. Which one to use is up to you, but Manhattan distance usually performs better.
Here is my code in Java: 8-puzzle
By the way, it's not an easy approach, and many puzzle instances simply can't be solved.
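For illustration, a minimal sketch of the Manhattan heuristic (assuming board[r][c] holds tiles 1..8 with 0 as the blank, and the goal state is 1 2 3 / 4 5 6 / 7 8 0):

```java
/** Manhattan-distance heuristic for the 8-puzzle: for every tile,
 *  add how many rows plus columns it is away from its goal cell. */
static int manhattan(int[][] board) {
    int h = 0;
    for (int r = 0; r < 3; r++) {
        for (int c = 0; c < 3; c++) {
            int tile = board[r][c];
            if (tile == 0) continue;                  // the blank does not count
            int goalR = (tile - 1) / 3, goalC = (tile - 1) % 3;
            h += Math.abs(r - goalR) + Math.abs(c - goalC);
        }
    }
    return h;
}
```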
I'm trying to compare multiple algorithms used to smooth GPS data, and I'm wondering what the standard way to compare the results should be, to see which one provides better smoothing.
I was thinking of a machine learning approach: create a car model based on a classifier and check which tracks show better behaviour.
For those of you who have more experience with this stuff: is this a good approach? Are there other ways to do it?
Generally, there is no universally valid way of comparing two datasets, since it depends entirely on the applied/required quality criterion.
For your approach
I was thinking of a machine learning approach: create a car model based on a classifier and check which tracks show better behaviour.
this means that you will need to define your term "better behaviour" mathematically.
One possible quality criterion for your application is as follows (it consists of two parts that express opposing quality aspects):
First part (deviation from raw data): Compute the RMSE (root mean squared error) between the smoothed data and the raw data. This gives you a measure for the deviation of your smoothed track from the given raw coordinates. The error (RMSE) increases if you smooth more, and decreases if you smooth less.
Second part (track smoothness): Compute the mean absolute lateral acceleration that the car would experience along the track (the second derivative of position). This decreases if you smooth more and increases if you smooth less, i.e., it behaves contrary to the RMSE. A code sketch of both measures follows the evaluation steps below.
Result evaluation:
(1) Find a sequence of your data where you know that the underlying GPS track is a straight line or where the tracked object is not moving. Note that for those tracks, the (lateral) acceleration is zero by definition(!).
For these, compute the RMSE and the mean absolute lateral acceleration.
The RMSE of approaches that have (almost) zero acceleration results purely from measurement inaccuracies!
(2) Plot the results in a coordinate system with the RMSE on the x axis and the mean acceleration on the y axis.
(3) Pick all approaches that have an RMSE similar to what you found in step (1).
(4) From those approaches, pick the one(s) with the smallest acceleration. They give you the smoothest track, with an error explained by measurement inaccuracies!
(5) You're done :)
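A minimal sketch of the two measures, assuming both tracks are given as equally spaced (x, y) samples in metres (a hypothetical double[i][0] = x, double[i][1] = y layout), and using the total acceleration magnitude from a second finite difference as a simple proxy for the lateral component:

```java
/** RMSE between a raw and a smoothed track of the same length. */
static double rmse(double[][] raw, double[][] smooth) {
    double sum = 0;
    for (int i = 0; i < raw.length; i++) {
        double dx = raw[i][0] - smooth[i][0];
        double dy = raw[i][1] - smooth[i][1];
        sum += dx * dx + dy * dy;
    }
    return Math.sqrt(sum / raw.length);
}

/** Mean absolute acceleration via the second finite difference;
 *  dt is the (assumed constant) sampling interval in seconds. */
static double meanAbsAcceleration(double[][] track, double dt) {
    double sum = 0;
    int count = 0;
    for (int i = 1; i < track.length - 1; i++) {
        double ax = (track[i - 1][0] - 2 * track[i][0] + track[i + 1][0]) / (dt * dt);
        double ay = (track[i - 1][1] - 2 * track[i][1] + track[i + 1][1]) / (dt * dt);
        sum += Math.hypot(ax, ay);
        count++;
    }
    return count == 0 ? 0 : sum / count;
}
```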
I have no experience in this topic, but I have a few things in mind that may help you.
You know it is a car. Since the data is generated by a car, you can define a set of properties of a car; for example, if a car is moving faster than 50 km/h, then the angle of a corner should be at least 110 degrees. I am absolutely guessing at the values, but with a little research I am sure you will be able to define such properties. The next thing you can do is test how well each approximation fits the car properties and choose the best one.
Raw data. I assume you are testing all methods on a part of a given road. You can generate a reference GPS track, i.e. a track that best fits the movement of the car; Google Maps may help you generate such a track, or some GPS device with higher accuracy. Then you measure the distance between each approximation and your generated track, and the one with the minimum distance wins.
I think you can easily match the coordinates after the address conversion, because an address has a street, an area, and a city, so you can easily match within different radii.
Try this link
Take a look at this paper that discusses comparing machine learning algorithms:
"Choosing between two learning algorithms
based on calibrated tests" available at:
http://www.cs.waikato.ac.nz/ml/publications/2003/bouckaert-calibrated-tests.pdf
Also check out this paper:
"Bayesian Comparison of Machine Learning Algorithms on Single and
Multiple Datasets" available at:
http://www.jmlr.org/proceedings/papers/v22/lacoste12/lacoste12.pdf
Note: from the question, it appears you are looking for the best way to compare the results, not for additional machine learning algorithms that may implement this feature.
Machine learning is not a well-suited approach for this task; you would have to define what good smoothing is...
In principle, your task cannot be solved by an algorithm that gives a general answer, because every smoothing destroys the original data by some amount and adds invented positions, and the different systems/humans that use the smoothed data react differently to those changes.
The question is: What do you want to achieve with smoothing?
Why do you need smoothing? (Have you forgotten to implement or enable a stand-still filter? Without one, GPS introduces jumping locations while the vehicle is standing still.)
The GPS chip already performs (the best possible?) real-time smoothing using a built-in Kalman filter; on the one hand it has more information than a post-processing smoothing algorithm, on the other hand it has less.
So next you have to ask yourself: are you comparing post-processing smoothing algorithms or real-time ones? (Probably post-processing.) Comparing a real-time smoothing algorithm with a post-processing one is not fair.
Again: what do you expect from smoothed data? That it looks somewhat nice, but unrealistic, like photoshopped models in TV advertisements?
What is good smoothing: closeness to the real vehicle position, which nobody ever knows, or a curve with low acceleration?
I would prefer a smoothing algorithm that produces the curve closest to the real (usually unknown) vehicle trajectory.
Or you might just think it should somehow look beautiful: in that case, overlay the curves in different colors, display them on a satellite image map, and let a team of humans (experts who at least own and drive a car) decide what looks good and realistic.
We humans have the best multi purpose pattern matching algorithm built in.
Again, why smooth? For display in a map, to please the humans looking at it?
Or to feed the smoothed tracks to other algorithms that have problems with the original data?
For pleasing humans, I have given an answer above.
To please other algorithms:
What do they need? More accurate positions, or better course/direction values between points?
Which attributes do you want to smooth: only the latitude/longitude coordinates, or also the speed and course values?
I have a lot of professional experience with GPS tracks, and I recommend simply removing every location recorded under 7 km/h and keeping the rest as it is. In most cases there is no need for further smoothing.
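A minimal sketch of that filter, assuming a hypothetical Fix type that carries the speed reported by the receiver (Java 16+ for the record syntax):

```java
import java.util.*;

/** Hypothetical GPS fix; many receivers report a speed directly. */
record Fix(double lat, double lon, double speedKmh) {}

class StandStillFilter {
    /** Drops every fix recorded below 7 km/h and keeps the rest unchanged. */
    static List<Fix> filter(List<Fix> track) {
        List<Fix> result = new ArrayList<>();
        for (Fix f : track)
            if (f.speedKmh() >= 7.0)
                result.add(f);
        return result;
    }
}
```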
Otherwise it gets expensive:
A possible solution:
1) You arrange a 2000€ reference GPS receiver delivered with a magnetic vehicle roof antenna (e.g. a receiver from the company Hemisphere) and use that as the reference.
2) You use a consumer GPS device of the kind usually used for your task (a smartphone, etc.).
With both mounted inside the car, drive some test tracks: in good conditions (highways), but more tracks in very bad ones (strong curves combined with big houses left and right), and through tunnels, a straight one and a curved one, if you have them.
3) Apply the smoothing algorithms to the consumer GPS tracks.
4) Compare the smoothed tracks to the reference track by matching pairs of positions, and finally calculate the RMSE (root mean squared error).
Difficulties
Matching two positions: hopefully the timestamps can be matched exactly, which is usually not the case (offsets of 0.5 s are possible).
Think about what you do when there is a GPS outage.
Consider first displaying a raw track and identifying what kind of unsmoothed data is not suitable or nice looking. (Perhaps post the pictures here later.)
What about using the good old Kalman filter?
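For completeness, a minimal one-dimensional Kalman sketch (a constant-position model applied per coordinate; the process noise q and measurement noise r are tuning assumptions, not values from any particular receiver):

```java
/** Scalar Kalman filter over a series of measurements z. */
static double[] kalman1d(double[] z, double q, double r) {
    double x = z[0], p = 1.0;            // state estimate and its variance
    double[] out = new double[z.length];
    out[0] = x;
    for (int k = 1; k < z.length; k++) {
        p += q;                          // predict: variance grows by process noise
        double gain = p / (p + r);       // update: Kalman gain
        x += gain * (z[k] - x);          // blend prediction with measurement
        p *= (1 - gain);
        out[k] = x;
    }
    return out;
}
```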
Is there an algorithm that can tell you which points to connect to form triangles, given a set of points? None of the connecting lines can intersect, but triangles can be inside other triangles.
Given a general set of points in R^d, the Delaunay triangulation is often an optimal choice for tessellation.
Specifically, the Delaunay triangulation tessellates the convex hull of the point set into a set of non-overlapping elements, ensuring that the radius of the largest circumsphere is minimised. This means the triangulation is optimal in terms of its "compactness"; in other words, it generates elements with good aspect ratios.
Efficient algorithms for constructing Delaunay triangulations are not trivial, but there are a number of good libraries out there. I can recommend Triangle, CGAL, or Qhull (for high-dimensional problems); JDT is apparently an implementation in Java, though I've never used it.
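The predicate at the heart of most Delaunay construction algorithms is the in-circle test; here is a minimal sketch (plain floating point; production code needs the robust or exact arithmetic those libraries provide):

```java
/** Returns true if point d lies inside the circumcircle of the
 *  counter-clockwise triangle (a, b, c). Points are {x, y} pairs. */
static boolean inCircle(double[] a, double[] b, double[] c, double[] d) {
    double ax = a[0] - d[0], ay = a[1] - d[1];
    double bx = b[0] - d[0], by = b[1] - d[1];
    double cx = c[0] - d[0], cy = c[1] - d[1];
    // Sign of the "lifted" 3x3 determinant used in Delaunay algorithms.
    double det = (ax * ax + ay * ay) * (bx * cy - cx * by)
               - (bx * bx + by * by) * (ax * cy - cx * ay)
               + (cx * cx + cy * cy) * (ax * by - bx * ay);
    return det > 0;   // positive means "inside" for CCW triangles
}
```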
I am not sure it is exactly what you are looking for, but it may be of some help: Graph Theory
I am also attempting to solve this problem. This is a link to the GitHub branch of someone who works on this for the game Ingress, which is why I'm interested in the solution. To my knowledge, however, the optimal solution is found through brute force (I may be wrong on this), and the code maximizes and minimizes other factors as well. It also does things such as taking an E6 latitude/longitude and projecting it onto a gnomonic projection to determine the shortest routes, but I think this can be ignored when going through the code. I don't think your solution is in this code, but it might be a good jumping-off point for you, me, and anyone else looking into this problem.
I've created two clustering algorithms, k-means and divisive; maybe later I'll add agglomerative as well. I have to analyze how well they perform on high-dimensional data, and for that I have to calculate the average/total distance to the cluster centers. In the case of k-means it's easy, since I have the centroid, but how do I find the center in the divisive/agglomerative algorithms?
While I'm here: I've currently implemented the Euclidean, Manhattan, and Pearson distances; are there any other distance measures I could use?
Thanks in advance!
You may want to get this book:
Encyclopedia of Distances, by Michel Deza and Elena Deza, 590 pages,
which covers many of the alternate distance functions you could use.
Probably a few hundred different distances ...
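As one concrete example beyond the three you already have, cosine distance is a common choice for high-dimensional data; a minimal sketch, assuming non-zero vectors of equal length:

```java
/** Cosine distance: 1 - cos(angle between a and b).
 *  0 for vectors pointing the same way, 1 for orthogonal
 *  vectors, 2 for opposite ones. Assumes non-zero vectors. */
static double cosineDistance(double[] a, double[] b) {
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return 1.0 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```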
However, you will also need to look into your evaluation method: if it is centroid-based, it will be biased towards k-means, so the comparison is likely unfair.
Furthermore, if you use artificial data, make sure you do not unfairly favor one method over another because the method correlates with the way you generate your data (e.g. if you generate Gaussian clusters, it favors methods such as k-means).
The goal of my work is to analyze these clustering algorithms when they have to create clusters from high-dimensional data. It is hard to evaluate them, and it's very unlikely that the result will be completely fair, so I'm going to use the average accumulated distance between records in one cluster and the minimal distance between two records from different clusters.
Regarding how to find the center of a cluster in hierarchical clustering algorithms: use the same formula as in k-means, the one used to recalculate the centroid after each iteration, as in the sketch below.
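A minimal sketch of that centroid computation, assuming all points are double[] of equal dimension:

```java
import java.util.*;

class Centroid {
    /** Per-dimension mean of the cluster's points, exactly as in k-means. */
    static double[] of(List<double[]> cluster) {
        int dim = cluster.get(0).length;
        double[] center = new double[dim];
        for (double[] p : cluster)
            for (int d = 0; d < dim; d++)
                center[d] += p[d];
        for (int d = 0; d < dim; d++)
            center[d] /= cluster.size();
        return center;
    }
}
```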
I have about a hundred points that I want to approximate with a Bezier curve, but if there are more than 25 points (or thereabouts), computing the binomial coefficients via factorials causes numeric overflow.
Is there a way of approximating that many points in a Bezier-like way (a smooth curve that does not pass through all the points, except the first and last)?
Or do I need to choose another approximation algorithm with the same effect?
I'm using the default Swing drawing tools.
P.S. English is not my native language, so I've probably used the wrong math terms somewhere.
Do you want to get one Bezier curve that best fits all 100 points? If that is the case, Jim Herold has a very detailed explanation of how to do it. A further optimisation could be to reduce the number of points using the Douglas-Peucker algorithm.
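One way to sidestep the factorial overflow entirely is to evaluate the curve with de Casteljau's algorithm, which uses only repeated linear interpolation and never forms binomial coefficients. A minimal sketch, with control points as {x, y} arrays:

```java
/** De Casteljau evaluation of a Bezier curve at parameter t in [0, 1].
 *  Works for any number of control points; no factorials involved. */
static double[] bezierPoint(double[][] control, double t) {
    int n = control.length;
    double[] x = new double[n], y = new double[n];
    for (int i = 0; i < n; i++) { x[i] = control[i][0]; y[i] = control[i][1]; }
    for (int level = n - 1; level > 0; level--)       // repeatedly lerp adjacent points
        for (int i = 0; i < level; i++) {
            x[i] = (1 - t) * x[i] + t * x[i + 1];
            y[i] = (1 - t) * y[i] + t * y[i + 1];
        }
    return new double[]{x[0], y[0]};
}
```

Evaluating this for t = 0, 1/k, 2/k, ..., 1 gives a polyline you can draw with the standard Swing tools.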