Java: Calculate distance between a large number of locations and performance

I'm creating an application that will tell a user how far away a large number of points are from their current position.
Each point has a longitude and latitude.
I've read over this article
http://www.movable-type.co.uk/scripts/latlong.html
and seen this post
Calculate distance in meters when you know longitude and latitude in java
There are a number of calculations (50-200) that need to be carried out.
If speed is more important than the accuracy of these calculations, which one is best?

This is O(n).
Don't worry about performance unless every single calculation takes too long (which it doesn't).

As Imre said, this is O(n), or linear: no matter how the values differ or how many times you run it, each iteration of the algorithm takes roughly the same amount of time. However, I disagree to this extent: the Spherical Law of Cosines involves fewer variables and fewer calculations in the algorithm, meaning fewer resources are used. Hence I would choose that one, because the only thing that will then affect speed is the computing resources available. (Note: the difference will be barely noticeable unless you are on a really old/slow machine.)
Verdict based on opinion: Spherical Law of Cosines
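For reference, here is a minimal Java sketch of the spherical law of cosines formula from the linked movable-type page (the 6371 km mean Earth radius and the method name are my own choices):

// Spherical law of cosines. Inputs are in degrees, result is in kilometres.
public static double sphericalLawOfCosines(double lat1, double lon1,
                                           double lat2, double lon2) {
    final double R = 6371.0; // mean Earth radius in kilometres (assumed)
    double phi1 = Math.toRadians(lat1);
    double phi2 = Math.toRadians(lat2);
    double deltaLambda = Math.toRadians(lon2 - lon1);
    double cosAngle = Math.sin(phi1) * Math.sin(phi2)
            + Math.cos(phi1) * Math.cos(phi2) * Math.cos(deltaLambda);
    // Clamp to [-1, 1] so rounding errors cannot push acos out of range.
    cosAngle = Math.max(-1.0, Math.min(1.0, cosAngle));
    return R * Math.acos(cosAngle); // distance in kilometres
}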

The two links that you posted use the same spherical geometry formula to calculate the distances, so I would not expect there to be a significant difference between their running speed. Also, they are not really computationally expensive, so I would not expect it to be a problem, even on the scale of a few hundred iterations, if you are running on modern hardware.

Actual performance benefits of distance squared vs distance

When calculating the distance between two 3D points in Java, I can compute the distance, or the distance squared between them, avoiding a call to Math.sqrt.
Natively, I've read that sqrt is only a quarter of the speed of multiplication which makes the inconvenience of using the distance squared not worthwhile.
In Java, what is the absolute performance difference between multiplication and calculating a square root?
I initially wanted to add this as a comment, but it started to get too big, so here goes:
Try it yourself. Make a loop with 10,000 iterations where you simply calculate a*a + b*b, and another separate loop where you calculate Math.sqrt(a*a + b*b). Time it and you'll know. Calculating a square root is an iterative process in itself, where the computed square root converges towards the true square root of the given number until it is sufficiently close (as soon as the difference between iterations is less than some really small value). There are multiple algorithms out there besides the one the Math library uses, and their speed depends on the input and on how the algorithm is designed. Stick with Math.sqrt(...) in my opinion; you can't go wrong and it's been tested by a LOT of people.
Although this can be done very fast for one square root, there's a definite observable time difference.
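If you want to time it as suggested, a rough sketch (the iteration count, the use of System.nanoTime and the class name are my own; a proper microbenchmark would use a harness such as JMH):

public class SqrtTiming {
    public static void main(String[] args) {
        double a = 3.0, b = 4.0, sink = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < 10_000; i++) {
            sink += a * a + b * b;              // squared distance only
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < 10_000; i++) {
            sink += Math.sqrt(a * a + b * b);   // with the square root
        }
        long t2 = System.nanoTime();
        // Print sink so the JIT cannot eliminate the loops entirely.
        System.out.println("squared: " + (t1 - t0) + " ns, sqrt: "
                + (t2 - t1) + " ns (sink=" + sink + ")");
    }
}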
On a side note: I cannot think of a reason to calculate the square root more than once, usually at the end. If you want to know the distance between points, just use the squared value of that distance as a default and make comparisons/summations/subtractions or whatever you want based on that default.
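For example, a nearest-point search can compare squared distances and never take a root (Point here is a hypothetical class with public double x and y fields; java.util.List is assumed to be imported):

// Nearest-neighbour search comparing squared distances only.
static Point nearest(Point from, List<Point> candidates) {
    Point best = null;
    double bestDistSq = Double.POSITIVE_INFINITY;
    for (Point p : candidates) {
        double dx = p.x - from.x;
        double dy = p.y - from.y;
        double distSq = dx * dx + dy * dy; // no Math.sqrt needed for a comparison
        if (distSq < bestDistSq) {
            bestDistSq = distSq;
            best = p;
        }
    }
    return best;
}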
PS: Provide more code if you want a more "practical" answer

Using rounding to make floating point operations deterministic

I have implemented a lockstep model using a simulation that has to be deterministic. For exact positions I use floats. Now I face the problem that floats are not deterministic across every hardware / OS combination. The simulation depends on vector maths, scaling a few vectors each tick, but also on calculating exponential values.
Now I'm curious whether it would be enough to round the floats to 4 places after the decimal point to achieve determinism, since I only apply 5-10 operations to each float each tick.
First, using doubles would already decrease the approximation errors a bit. On top of that, rounding might well be sufficiently deterministic.
Also use strictfp, which does just what you intend to do.
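A minimal sketch of both suggestions, assuming rounding to 4 decimal places as in the question (the class and method names are mine):

// strictfp pins this class to portable IEEE 754 semantics; on Java 17+
// strict semantics are the default anyway, so it mainly matters on older JVMs.
strictfp class DeterministicVectorMath {

    // Snap a value to 4 decimal places, as proposed in the question.
    static double round4(double value) {
        return Math.round(value * 10_000.0) / 10_000.0;
    }

    // Example: scale a vector component, then round the result back onto the grid.
    static double scaleComponent(double component, double factor) {
        return round4(component * factor);
    }
}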

Velocity verlet algorithm not conserving energy

I was under the impression that the algorithm should conserve energy if the system being modelled does. I'm modelling the solar system, which should conserve energy. The program conserves angular momentum and does produce stable orbits, but the total energy (kinetic + gravitational potential) oscillates around some baseline. The oscillations are significant. Are there common reasons why this might happen?
The model assumes the planets are point masses in circular orbits (I've also tried elliptical orbits and the energy still oscillates) and uses Newtonian mechanics. I can't think what other features of the program might be affecting the outcome.
If it is just expected that the energy oscillates, what causes that?
Look up the Verlet-Störmer paper by Hairer et al. (Geometric numerical integration illustrated by the Störmer/Verlet method). There should be several sources online.
In short, a symplectic integrator preserves a Hamiltonian and thus energy, but it is a modified Hamiltonian. If the method is correctly initialized, the modification is a perturbation of order O(h²), where h is the step size. Incorrect initialization gives a perturbation of O(h), while the observed oscillation should still have an amplitude of O(h²).
Thus the observed pattern of an oscillating energy as computed by the physical formulas is completely normal and expected. An error would be observed if the energy were to (rapidly) deviate from this relatively stable pattern.
An easy, but slightly inefficient, way to get from the order-2 Verlet method to an order-4 symplectic integrator is to replace
Verlet(h)
by
Verlet4(h) {
    Verlet(b0*h);
    Verlet(-b1*h);
    Verlet(b0*h);
}
where b0 = 1/(2-2^(1/3)) = 1.35120719196… and b1 = 2*b0-1 = 1.70241438392…. This is called a "composition method".
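In Java-like form, assuming you already have a velocity Verlet step that advances your state in place by h (the State type and the verlet method below are placeholders for whatever your code uses):

// Fourth-order symplectic step built by composing three Verlet steps,
// using the coefficients given above.
static final double B0 = 1.0 / (2.0 - Math.cbrt(2.0)); // 1.35120719196...
static final double B1 = 2.0 * B0 - 1.0;               // 1.70241438392...

static void verlet4(State state, double h) {
    verlet(state, B0 * h);   // forward step
    verlet(state, -B1 * h);  // backward step
    verlet(state, B0 * h);   // forward step
}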
Merged from the comments:
For a full gravitational N-body problem, I don't think any numerical integrator will be symplectic. Velocity Verlet isn't symplectic even for a single point orbiting a centre (easy to check, since that case has a trivial analytical solution with g = v^2/R). So I suggest trying a higher-order integrator (such as Runge-Kutta), and if the energy deviations almost go away (meaning the calculations are generally correct), you can re-scale the combined kinetic energy to keep the total energy conserved explicitly. Specifically, you compute the updated Ekin_actual and Ekin_desired = Etotal_initial - Epotential, and scale all velocities by sqrt(Ekin_desired / Ekin_actual).
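A sketch of that re-scaling step (Body here is a hypothetical class with public mass, vx and vy fields; the initial total energy and current potential energy are assumed to come from your own bookkeeping, and java.util.List is assumed to be imported):

// Scale all velocities so that kinetic + potential matches the initial total energy.
static void rescaleKineticEnergy(List<Body> bodies,
                                 double totalInitialEnergy,
                                 double potentialEnergy) {
    double ekinActual = 0;
    for (Body b : bodies) {
        ekinActual += 0.5 * b.mass * (b.vx * b.vx + b.vy * b.vy);
    }
    double ekinDesired = totalInitialEnergy - potentialEnergy;
    if (ekinActual <= 0 || ekinDesired <= 0) {
        return; // nothing sensible to scale in this step
    }
    double factor = Math.sqrt(ekinDesired / ekinActual);
    for (Body b : bodies) {
        b.vx *= factor;
        b.vy *= factor;
    }
}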

How do I select statistically significant points from the set of points?

The server receives monitoring data for a process at a certain rate (12 points per minute) via an external source (web services, etc.). A process may run for a minute (or less), for an hour, or for a day, so at the end of a process I may have 5, 720, or 17280 data points. This data is gathered for more than 40 parameters and stored in the database for future display on the web. Imagine more than 1000 processes running and the amount of data that generates. I have to stick to an RDBMS (MySQL specifically). Therefore, I want to process the data and reduce its volume by selecting only statistically significant points before storing it in the database. The ultimate objective is to plot these data points on a graph where the Y-axis will be time and the X-axis will be represented by some parameter (part of the data point).
I do not want to miss any significant fluctuation or the overall shape of the data, but at the same time I cannot manage to plot all of the data points (in case the number is huge, > 100).
Please note that I am aware of basic statistical terms like mean, standard deviation, etc.
If this is a constant process, you could plot the mean (should be a flat line) and any points that exceeded a certain threshold. Three standard deviations might be a good threshold to start with, then see whether it gives you the information you need.
If it's not a constant process, you need to figure out how it should be varying with time and do a similar thing: plot the points that substantially vary from your expectation at that point in time.
That should give you a pretty clean graph while still communicating the important information.
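As a sketch of that filtering (the three-standard-deviation threshold is just the suggested starting point; the method name is mine and java.util imports are assumed):

// Keep only the points that lie further than k standard deviations from the mean;
// everything else can be represented by the flat mean line.
static List<Double> significantPoints(List<Double> values, double k) {
    double mean = 0;
    for (double v : values) {
        mean += v;
    }
    mean /= values.size();

    double variance = 0;
    for (double v : values) {
        variance += (v - mean) * (v - mean);
    }
    double stdDev = Math.sqrt(variance / values.size());

    List<Double> kept = new ArrayList<>();
    for (double v : values) {
        if (Math.abs(v - mean) > k * stdDev) {
            kept.add(v);
        }
    }
    return kept;
}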
If you expect your process to be noisy, then doing some smoothing through a spline can help you reduce noise and compress your data (since to draw a spline you need only a few points, where "few" is arbitrarily picked by you, depending on how much detail you want to get rid of).
However, if your process is not noisy, then outliers are very important, since they may represent errors or exceptional conditions. In this case, you are better off getting rid of the points that are close to the average (say less than 1 standard deviation), and keeping those that are far.
A little note: the term "statistically significant" describes a high enough level of certainty to discard the null hypothesis. I don't think it applies to your problem.

Fast multi-body gravity algorithm?

I am writing a program to simulate an n-body gravity system, whose precision is arbitrarily good depending on how small a "time" step I take between iterations. Right now it runs very quickly for up to 500 bodies, but beyond that it gets very slow, since on every iteration it has to run through an algorithm determining the force between each pair of bodies. There are n(n-1)/2 pairs, so this is O(n^2), and it's not surprising that it gets very bad very quickly. I guess the most costly operation is determining the distance between each pair, which takes a square root. So, in pseudo code, this is how my algorithm currently runs:
for (i = 1 to number of bodies - 1) {
    for (j = i + 1 to number of bodies) {
        (determine the force between the two bodies i and j,
         whose most costly operation is a square root)
    }
}
So, is there any way I can optimize this? Any fancy algorithms to reuse the distances computed in past iterations, with fast modification? Are there any lossy ways to reduce this problem? Perhaps by ignoring the interactions between objects whose x or y separation (it's in 2 dimensions) exceeds a certain amount, as determined by the product of their masses? Sorry if it sounds like I'm rambling, but is there anything I could do to make this faster? I would prefer to keep it arbitrarily precise, but if there are solutions that can reduce the complexity of this problem at the cost of a bit of precision, I'd be interested to hear them.
Thanks.
Take a look at this question. You can divide your objects into a grid, and use the fact that many faraway objects can be treated as a single object for a good approximation. The mass of a cell is equal to the sum of the masses of the objects it contains. The centre of mass of a cell can be treated as the centre of the cell itself, or more accurately the barycenter of the objects it contains. In the average case, I think this gives you O(n log n) performance, rather than O(n^2), because you still need to calculate the force of gravity on each of n objects, but each object only interacts individually with those nearby.
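A rough sketch of the cell-aggregation step, assuming a Body class with public x, y and mass fields and a fixed cell size (both are my own placeholders; java.util imports are assumed):

// Bucket bodies into grid cells; each cell stores its total mass and barycenter,
// so faraway cells can later be treated as a single body when summing forces.
static Map<Long, Cell> buildGrid(List<Body> bodies, double cellSize) {
    Map<Long, Cell> grid = new HashMap<>();
    for (Body b : bodies) {
        long cx = (long) Math.floor(b.x / cellSize);
        long cy = (long) Math.floor(b.y / cellSize);
        long key = (cx << 32) ^ (cy & 0xffffffffL); // pack the two cell indices
        Cell cell = grid.computeIfAbsent(key, k -> new Cell());
        cell.mass += b.mass;
        cell.weightedX += b.mass * b.x; // mass-weighted sums for the barycenter
        cell.weightedY += b.mass * b.y;
    }
    for (Cell c : grid.values()) {
        c.x = c.weightedX / c.mass;
        c.y = c.weightedY / c.mass;
    }
    return grid;
}

static class Cell {
    double mass, weightedX, weightedY, x, y;
}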
Assuming you're calculating the distance with r^2 = x^2 + y^2, and then calculating the force with F = G*m1*m2 / r^2, you don't need to perform a square root at all. If you do need the actual distance, you can use a fast inverse square root. You could also use fixed-point arithmetic.
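A sketch of both points (whether the classic fast inverse square root actually beats Math.sqrt on a modern JVM is something you would have to measure yourself):

// The force magnitude only needs r^2, so no square root is required:
static double forceMagnitude(double g, double m1, double m2, double dx, double dy) {
    double r2 = dx * dx + dy * dy;
    return g * m1 * m2 / r2;
}

// Classic fast inverse square root (float precision, one Newton-Raphson step),
// for when you do need 1/r, e.g. to split the force into x/y components.
static float fastInverseSqrt(float x) {
    float xhalf = 0.5f * x;
    int i = Float.floatToIntBits(x);
    i = 0x5f3759df - (i >> 1);
    float y = Float.intBitsToFloat(i);
    y = y * (1.5f - xhalf * y * y); // one refinement iteration
    return y;
}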
One good lossy approach would be to run a clustering algorithm to cluster the bodies together.
There are some clustering algorithms that are fairly fast, and the trick will be to not run the clustering algorithm every tick. Instead run it every C ticks (C>1).
Then for each cluster, calculate the forces between all bodies in the cluster, and then for each cluster calculate the forces between the clusters.
This will be lossy but I think it is a good approach.
You'll have to fiddle with:
which clustering algorithm to use: Some are faster, some are more accurate. Some are deterministic, some are not.
how often to run the clustering algorithm: running it less will be faster, running it more will be more accurate.
how small/large to make the clusters: most clustering algorithms allow you some input on the size of the clusters. The larger you allow the clusters to be, the faster but less accurate the output will be.
So it's going to be a game of speed vs accuracy, but at least this way you will be able to sacrifice a bit of accuracy for some speed gains - with your current approach there's nothing you can really tweak at all.
You may want to try a less precise version of the square root. You probably don't need full double precision. In particular, if the order of magnitude of your coordinate system is usually about the same, you can use a truncated Taylor series to estimate the square root quite quickly without giving up too much accuracy.
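For example, a first-order Taylor expansion of the square root around a reference value r0, chosen near the typical magnitude of your squared distances (r0 and the method name are my own placeholders; the accuracy degrades quickly as the input moves away from r0):

// sqrt(a) ~ sqrt(r0) + (a - r0) / (2 * sqrt(r0)) for a close to r0.
static double approxSqrtNear(double a, double r0) {
    double s0 = Math.sqrt(r0); // can be precomputed once if r0 is fixed
    return s0 + (a - r0) / (2.0 * s0);
}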
There is a very good approximation to the n-body problem that is much faster (O(n log n) vs O(n²) for the naive algorithm) called Barnes-Hut. Space is subdivided into a hierarchical grid, and when computing the force contribution of distant masses, several masses can be treated as one. There is an accuracy parameter that can be tweaked depending on how much accuracy you're willing to sacrifice for computation speed.
