I'm trying to compare multiple algorithms that are used to smooth GPS data. I'm wondering what should be the standard way to compare the results to see which one provides better smoothing.
I was thinking on a machine learning approach. To crate a car model based on a classifier and check on which tracks provides better behaviour.
For the guys who have more experience on this stuff, is this a good approach? Are there other ways to do this?
Generally, there is no universally valid way for comparing two datasets, since it completely depends on the applied/required quality criterion.
For your appoach
I was thinking on a machine learning approach. To crate a car model
based on a classifier and check on which tracks provides better
behaviour.
this means that you will need to define your term "better behavior" mathematically.
One possible quality criterion for your application is as follows (it consists of two parts that express opposing quality aspects):
First part (deviation from raw data): Compute the RMSE (root mean squared error) between the smoothed data and the raw data. This gives you a measure for the deviation of your smoothed track from the given raw coordinates. This means, that the error (RMSE) increases, if you are smoothing more. And it decreases if you are smoothing less.
Second part (track smoothness): Compute the mean absolute lateral acceleration that the car will experience along the track (second deviation). This will decrease if you are smoothing more, and it will increase if you are smoothing less. I.e., it behaves in contrary to the RMSE.
Result evaluation:
(1) Find a sequence of your data where you know that the underlying GPS track is a straight line or where the tracked object is not moving. Note, that for those tracks, the (lateral) acceleration is zero by definition(!).
For these, compute RMSE and mean absolute lateral acceleration.
The RMSE of appoaches that have (almost) zero acceleration results from measurement inaccuracies!
(2) Plot the results in a coordinate system with the RMSE on the x axis and the mean acceleration on the y axis.
(3) Pick all approaches that have an RMSE similar to what you found in step (1).
(4) From those approaches, pick the one(s) with the smallest acceleration. Those give you the smoothest track with an error explained through measurement inaccuracies!
(5) You're done :)
I have no experience on this topic but I have few things in mind that may help you.
You know it is a car. You know that the data is generated from a car so you can define a set of properties of a car. For example if a car is moving with speed above 50km than the angle of the corner should be at least 110 degrees. I am absolutely guessing with the values but if you do a little research i am sure you will be able to define such properties. Next thing you can do is to test how each approximation fits the car properties and choose the best one.
Raw data. I assume you are testing all methods on a part of given road. You can generate a "raw gps track" - a track that best fits the movement of a car. Google maps may help you to generate such track os some gps devise with higher accuracy. Than you measure the distance between each approximation and your generated track - the one with the min distance wins.
i think you easily match the coordinates after the address conversion.
because address have street,area and city. so you can easily match the different radius.
let try this link
Take a look at this paper that discusses comparing machine learning algorithms:
"Choosing between two learning algorithms
based on calibrated tests" available at:
http://www.cs.waikato.ac.nz/ml/publications/2003/bouckaert-calibrated-tests.pdf
Also check out this paper:
"Bayesian Comparison of Machine Learning Algorithms on Single and
Multiple Datasets" available at:
http://www.jmlr.org/proceedings/papers/v22/lacoste12/lacoste12.pdf
Note: It is noted from the question that you are looking into the best way to compare the results for machine learning algorithms and are not looking for additional machine learning algorithms that may implement this feature.
Machine Learning is not an well suited approach for that task, you would have to define what is good smoothing...
Principially your task cannot be solved by an algorithm that gives an general answer because every smoothing destroy the original data by some amount and adds invented positions, and different systems/humans that use the smoothed data react differently on that changed data.
The question is: What do you want to achieve with smoothing?
Why do you need smoothing? (have you forgotten to implement or enable a stand still filter that eliminates movement while the vehicle is standing still, which in GPS introduces jumping location during stand still?)
The GPS chip has already built in a (best possible?) real time smoothing using a Kalman filter, having on the one side more information than a post processed smotthing algo, on the other side it has less.
So next you have to ask yourself: do you compare post processing smooting algos or real time algos? (probably post processing) Comparing a real time smoothing algorithm with a post process smoothing algorithm is not fair.
Again: What do you expect from smoothed data: That they look somewhat fine, but unrealistic like photoshopped models for tv-advertisments?
What is good smoothing? near to real vehicle postion which nobody ever knows, or a curve whith low acceleration?
I would prefer an smoothing algorithm that produces the curve most near to the real (usually unknown) vehicle trajectory.
Or you might just think it should somehow look beautifull: In that case overlay the curves with different colors, display it on a satelitte image map, and let a team of humans (experts at least owning and driving an own car) decide what looks good and realistic.
We humans have the best multi purpose pattern matching algorithm built in.
Again why smooth?: for display in a map to please humans that look at that map?
or to use the smoothed tracks to feed other algorithms that have problems with the original data?
To please humans I have given an answer above.
To please other algorithms:
What they need? nearer positions? or better course value / direction between points.
What attributes do you want to smooth: only the latitude, longitude coordinates, or also the speed value, and course value?
I have much professional experience with GPS tracks, and recommend, to just remove every location under 7km/h and keep the rest as it is. In most cases there is no need for further smoothing.
Otherwise it gets expensive:
A possible solution:
1) You arrange a 2000€ Reference GPS receiver delivered with a magnetic vehicle roof antenna (E.g Company hemisphere 2000 GPS receiver) and use that as reference
2) You use a comnsumer GPS usually used for your task (smartphone, etc.)
Both mounted inside the car: drive some test tracks, in good conditions (highways) but more tracks at very bad: strong curves combined with big houses left and right. And through tunnel, a struight and a curved one, if you have one.
3) apply the smoothing algoritms to the consumer GPS tracks
4) compare the smoothed to the reference track, by matching two positions and finally calulate the (RMSE Root mean squared error)
Difficulties
matching two positions: Hopefully the time can be exactly matched which is usually not the case (0,5s offset possible).
Think what do you do when having an GPS outage.
Consider first to display a raw track and identify what kind of unsmoothed data is not suitable/ nice looking. (Probably later posting the pics here)
what about using the good old Kalman Filter!
Related
In my naive beginning Android mind I thought the way to do this would be to loop through each of the objects checking if proximity falls within X range and if so, include the object. This is being done with Google Maps and GeoPoints.
That said, I know this is probably the slowest way possibly. I did a search for Android Proxmity algorithm's and did not get much really. What I am looking for is best options with regard to this the more efficiently.
Are there any libraries I have not been able to find?
If not, should I load these Location objects into SQL then go from there or keep them in a JSONArray?
Once I establish my best datastructure, what is he best method to find all Locations located with X miles of user?
I am not asking for cut and paste code, rather the best method to this efficiently. Then, I can stumble through the code :)
My first gut feeling is to group the Locations by regions but I'm not exactly sure how to do this.
I could potentially have tens of thousands of datapoints.
Any help in simply heading in the right direction is greatly appreciated.
As a side note, I reach this juncture after discovering that a remote API I had been using was.. well.. just PLAIN WRONG and ommiting datapoints from my proximity search. I also realized that if just placed on the datapoints on the phone, then I could allow the user to run the App without internet connection, and only GPS and this would be a HUGE plus. So, with all setbacks come opportunnities!
The answer depends on the representation of the GeoPoints: If these are not sorted you need to scan all of them (this is done in linear time, sorting wrt. distance or clustering will be more expensive). Use Location.distanceTo(Location) or Location.distanceBetween(float, float, float, float, float[]) to calculate the distances.
If the GeoPoints were sorted wrt. distance to your position this task can be done much more efficiently, but since the supplier does not know your position, I assume that this cannot be done.
If the GeoPoints are clustered, i.e. if you have a set of clusters with some center and a radius select each cluster where the distance from your position to the cluster's center is within the limit plus the radius. For these clusters you need to check each GeoPoint contained in the cluster (some of them are possibly farther away from your position than the limit allows). Alternatively you might accept the error and include all points of the cluster (if the radius is relatively small I would recommend this).
I need to get the nearest city out of a selected set of cities.
Our company has a list of subsidiaries (some 100 in my country). We get around 3000 requests a day. This requests should be assigned to the subsidiaries (by geographical distance).
Is there an API to do this?
The best would be a (java) GoogleMaps API or similar webservice.
Best Regards,
Christian.
What I would do is to construct a Voronoi diagram of your subsidiaries, based on geographical distance and store that diagram in the form that can be used in your code. Then, look for the containing cell for each request and that will tell you which subsidiary is the closest one.
If you really want to make it precise, you could use OSM's road network to construct the diagram based on the driving distance, not simply geographical one.
Get the coordinates from Nominatim, it should be straight forward to make requests from a java application.
Calculate the great circle distance for every city to every city. I have to admit that the result might by an 300 by 300? array. However it may contain only integer. Hold it in memory for future requests.
Find the entry with the lowest number in the row or column.
Bit of an old question and maybe too late an answer for you.
A good approximation for speed concerns where absolute precision is not of the essence is to draw a rectangle around a point (the one you need to find the closest subsidiaries here). That rectangle would natively have NE and SW coordinate boundaries (or NW/SE).
To find the closest subsidiaries, ones need to find all of which the NE coordinate is "less" than that of the rectangle and "more" than that of the SW boundary.
I am quoting "more" and "less" because they might mean differently based of where on earth you are.
I wrote https://github.com/grumlimited/geocalc for my own need a few years back. Take a look at the section about names "BoundingArea".
Please consider this scenario:
An app knows which of a few routes the phone is on, thanks to GPS. That means it knows the only two directions that the device will be traveling in.
Am I right in thinking that the best way to determine which direction the phone is moving (it will almost certainly not be pointing the right way, so compass is not an option) is to poll the GPS until it starts moving, and find the direction the Co-Ords are moving in?
How regularly, and for how long, do you suggest the polling polls/lasts for?
Thanks in advance!
This is a science in itself, read up on kalman filtering. Basically, the difference between the last two points given by the GPS is the direction you are moving. Then errors come into the equation and you need to start learning about good ways to filter the data and get better results.
Attempt at explaining kalman filtering:
A kalman filter uses a Model to predict new values for the predicted thing. It makes an assumption like "stuff usually moves in the direction of their speed. so if it was here a second ago, it will be there now". It will then use this model to predict the next point and when it actually can measure the next point, it will use that data to update the model and assess the accuracy of the prediction. Then it will start giving you predictions based on a combination of real data and predictions which are weighted according to it's measurements of the prediction accuracy. So if the model is normally very accurate but there is suddenly a jump in the data, it will assume that it is a fluke and it will not let it affect the value too much. If the data is very jumpy, it will trust the data more and the model less.
As I have mentioned in previous questions I am writing a maze solving application to help me learn about more theoretical CS subjects, after some trouble I've got a Genetic Algorithm working that can evolve a set of rules (handled by boolean values) in order to find a good solution through a maze.
That being said, the GA alone is okay, but I'd like to beef it up with a Neural Network, even though I have no real working knowledge of Neural Networks (no formal theoretical CS education). After doing a bit of reading on the subject I found that a Neural Network could be used to train a genome in order to improve results. Let's say I have a genome (group of genes), such as
1 0 0 1 0 1 0 1 0 1 1 1 0 0...
How could I use a Neural Network (I'm assuming MLP?) to train and improve my genome?
In addition to this as I know nothing about Neural Networks I've been looking into implementing some form of Reinforcement Learning, using my maze matrix (2 dimensional array), although I'm a bit stuck on what the following algorithm wants from me:
(from http://people.revoledu.com/kardi/tutorial/ReinforcementLearning/Q-Learning-Algorithm.htm)
1. Set parameter , and environment reward matrix R
2. Initialize matrix Q as zero matrix
3. For each episode:
* Select random initial state
* Do while not reach goal state
o Select one among all possible actions for the current state
o Using this possible action, consider to go to the next state
o Get maximum Q value of this next state based on all possible actions
o Compute
o Set the next state as the current state
End Do
End For
The big problem for me is implementing a reward matrix R and what a Q matrix exactly is, and getting the Q value. I use a multi-dimensional array for my maze and enum states for every move. How would this be used in a Q-Learning algorithm?
If someone could help out by explaining what I would need to do to implement the following, preferably in Java although C# would be nice too, possibly with some source code examples it'd be appreciated.
As noted in some comments, your question indeed involves a large set of background knowledge and topics that hardly can be eloquently covered on stackoverflow. However, what we can try here is suggest approaches to get around your problem.
First of all: what does your GA do? I see a set of binary values; what are they? I see them as either:
bad: a sequence of 'turn right' and 'turn left' instructions. Why is this bad? Because you're basically doing a random, brute-force attempt at solving your problem. You're not evolving a genotype: you're refining random guesses.
better: every gene (location in the genome) represents a feature that will be expressed in the phenotype. There should not be a 1-to-1 mapping between genome and phenotype!
Let me give you an example: in our brain there are 10^13ish neurons. But we have only around 10^9 genes (yes, it's not an exact value, bear with me for a second). What does this tell us? That our genotype does not encode every neuron. Our genome encodes the proteins that will then go and make the components of our body.
Hence, evolution works on the genotype directly by selecting features of the phenotype. If I were to have 6 fingers on each hand and if that would made me a better programmer, making me have more kids because I'm more successful in life, well, my genotype would then be selected by evolution because it contains the capability to give me a more fit body (yes, there is a pun there, given the average geekiness-to-reproducibily ratio of most people around here).
Now, think about your GA: what is that you are trying to accomplish? Are you sure that evolving rules would help? In other words -- how would you perform in a maze? What is the most successful thing that can help you: having a different body, or having a memory of the right path to get out? Perhaps you might want to reconsider your genotype and have it encode memorization abilities. Maybe encode in the genotype how much data can be stored, and how fast can your agents access it -- then measure fitness in terms of how fast they get out of the maze.
Another (weaker) approach could be to encode the rules that your agent uses to decide where to go. The take-home message is, encode features that, once expressed, can be selected by fitness.
Now, to the neural network issue. One thing to remember is that NNs are filters. They receive an input. perform operations on it, and return an output. What is this output? Maybe you just need to discriminate a true/false condition; for example, once you feed a maze map to a NN, it can tell you if you can get out from the maze or not. How would you do such a thing? You will need to encode the data properly.
This is the key point about NNs: your input data must be encoded properly. Usually people normalize it, maybe scale it, perhaps you can apply a sigma function to it to avoid values that are too large or too small; those are details that deal with error measures and performance. What you need to understand now is what a NN is, and what you cannot use it for.
To your problem now. You mentioned you want to use NNs as well: what about,
using a neural network to guide the agent, and
using a genetic algorithm to evolve the neural network parameters?
Rephrased like so:
let's suppose you have a robot: your NN is controlling the left and right wheel, and as input it receives the distance of the next wall and how much it has traveled so far (it's just an example)
you start by generating a random genotype
make the genotype into a phenotype: the first gene is the network sensitivity; the second gene encodes the learning ratio; the third gene.. so on and so forth
now that you have a neural network, run the simulation
see how it performs
generate a second random genotype, evolve second NN
see how this second individual performs
get the best individual, then either mutate its genotype or recombinate it with the loser
repeat
there is an excellent reading on the matter here: Inman Harvey Microbial GA.
I hope I did you some insight on such issues. NNs and GA are no silver bullet to solve all problems. In some they can do very much, in others they are just the wrong tool. It's (still!) up to us to get the best one, and to do so we must understand them well.
Have fun in it! It's great to know such things, makes everyday life a bit more entertaining :)
There is probably no 'maze gene' to find,
genetic algorithms are trying to setup a vector of properties and a 'filtering system' to decide by some kind of 'surival of the fittest' algorithm to find out which set of properties would do the best job.
The easiest way to find a way out of a maze is to move always left (or right) along a wall.
The Q-Algorithm seems to have a problem with local maxima this was workaround as I remember by kicking (adding random values to the matrix) if the results didn't improve.
EDIT: As mentioned above a backtracking algorithm suits this task better than GA or NN.
How to combine both algorithm is described here NeuroGen descibes how GA is used for training a NN.
Try using the free open source NerounDotNet C# library for your Neural networks instead of implementing it.
For Reinforcement Learning library, I am currently looking for one, especially for Dot NET framework..
My goal is to recognize simple gestures from accelerometers mounted on a sun spot. A gesture could be as simple as rotating the device or moving the device in several different motions. The device currently only has accelerometers but we are considering adding gyroscopes if it would make it easier/more accurate.
Does anyone have recommendations for how to do this? Any available libraries in Java? Sample projects you recommend I check out? Papers you recommend?
The sun spot is a Java platform to help you make quick prototypes of systems. It is programmed using Java and can relay commands back to a base station attached to a computer. If I need to explain how the hardware works more leave a comment.
The accelerometers will be registering a constant acceleration due to gravity, plus any acceleration the device is subjected to by the user, plus noise.
You will need to low pass filter the samples to get rid of as much irrelevant noise as you can. The worst of the noise will generally be higher frequency than any possible human-induced acceleration.
Realise that when the device is not being accelerated by the user, the only force is due to gravity, and therefore you can deduce its attitude in space. Moreover, when the total acceleration varies greatly from 1g, it must be due to the user accelerating the device; by subtracting last known estimate of gravity, you can roughly estimate in what direction and by how much the user is accelerating the device, and so obtain data you can begin to match against a list of known gestures.
With a single three-axis accelerometer you can detect the current pitch and roll, and also acceleration of the device in a straight line. Integrating acceleration minus gravity will give you an estimate of current velocity, but the estimate will rapidly drift away from reality due to noise; you will have to make assumptions about the user's behaviour before / between / during gestures, and guide them through your UI, to provide points where the device is not being accelerated and you can reset your estimates and reliably estimate the direction of gravity. Integrating again to find position is unlikely to provide usable results over any useful length of time at all.
If you have two three-axis accelerometers some distance apart, or one and some gyros, you can also detect rotation of the device (by comparing the acceleration vectors, or from the gyros directly); integrating angular momentum over a couple of seconds will give you an estimate of current yaw relative to that when you started integrating, but again this will drift out of true rapidly.
Since no one seems to have mentioned existing libraries, as requested by OP, here goes:
http://www.wiigee.org/
Meant for use with the Wiimote, wiigee is an open-source Java based implementation for pattern matching based on accelerometer readings. It accomplishes this using Hidden Markov Models[1].
It was apparently used to great effect by a company, Thorn Technologies, and they've mentioned their experience here : http://www.thorntech.com/2013/07/mobile-device-3d-accelerometer-based-gesture-recognition/
Alternatively, you could consider FastDTW (https://code.google.com/p/fastdtw/). It's less accurate than regular DTW[2], but also computationally less expensive, which is a big deal when it comes to embedded systems or mobile devices.
[1] https://en.wikipedia.org/wiki/Hidden_Markov_model
[2] https://en.wikipedia.org/wiki/Dynamic_time_warping
EDIT: The OP has mentioned in one of the comments that he completed his project, with 90% accuracy in the field and a sub-millisecond compute time, using a variant of $1 Recognizer. He also mentions that rotation was not a criteria in his project.
What hasn't been mentioned yet is the actual gesture recognition. This is the hard part. After you have cleaned up your data (low pass filtered, normalized, etc) you still have most of the work to do.
Have a look at Hidden Markov Models. This seems to be the most popular approach, but using them isn't trivial. There is usually a preprocessing step. First doing STFT and clustering the resultant vector into a dictionary, then feeding that into a HMM. Have a look at jahmm in google code for a java lib.
Adding to moonshadow's point about having to reset your baseline for gravity and rotation...
Unless the device is expected to have stable moments of rest (where the only force acting on it is gravity) to reset its measurement baseline, your system will eventually develop an equivalent of vertigo.