Interpolating between Functions [duplicate] - java

Possible Duplicate:
Interpolation over an array (or two)
I have a set of CSV files that contain points of a 2D function; in other words, I have four CSV files, each the result of evaluating a function f(x, y) at a different y value. I need to interpolate between these data so that I can calculate f for an arbitrary x and y. The CSV files have varying lengths and x values. Does anyone know of a library or algorithm in Java for this task? Linear interpolation is OK, as is spline interpolation.
Thanks,
taktoa

OK, first of all I assume the "CSV" bit is irrelevant: let's assume you have read those files into memory and merged them together (they're values of the same function, right?). Now you have a single set of f(x, y) values for different (x, y) pairs and would like to interpolate between them. Fine so far?
If you stick to linear interpolation, there's still the question of how many points to take into account, which will depend on the level of noise in the measurements. In the simplest case one would use just the three nearest points to identify the plane they lie in and use that to find the value for the point in question. This option requires neither libraries nor algorithms, apart from vector addition, subtraction, cross product and dot product.
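As a rough illustration, here is a minimal Java sketch of that three-point approach (the method name and the {x, y, f(x, y)} array convention are mine, not from any library). The caller is responsible for picking the three nearest, non-collinear samples.

// Value of f at (x, y) from the plane through three samples p1, p2, p3,
// each given as {x, y, f(x, y)}. Assumes the samples are not collinear.
public static double planeInterpolate(double[] p1, double[] p2, double[] p3,
                                      double x, double y) {
    // Two edge vectors lying in the plane.
    double ux = p2[0] - p1[0], uy = p2[1] - p1[1], uz = p2[2] - p1[2];
    double vx = p3[0] - p1[0], vy = p3[1] - p1[1], vz = p3[2] - p1[2];
    // Normal to the plane = u x v.
    double nx = uy * vz - uz * vy;
    double ny = uz * vx - ux * vz;
    double nz = ux * vy - uy * vx;
    // Plane equation n . ((x,y,z) - p1) = 0, solved for z.
    return p1[2] - (nx * (x - p1[0]) + ny * (y - p1[1])) / nz;
}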
More sophisticated solutions would generally require some sort of fitting, e.g. (weighted) least squares.

The simplest approach is to find the closest points and use linear interpolation, e.g. choose the two or three closest points and interpolate between them.
Or you can take a weighted average based on distance. Or you can pick a close point and then look for points on the "other side" of it to improve the interpolation.
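A minimal sketch of the distance-weighted average in Java (the name and the 1/d² weight are illustrative choices, not from a specific library):

// Inverse-distance-weighted average of samples, each given as {x, y, f(x, y)};
// the 1/d^2 weight is one common choice, not the only one.
public static double idw(double[][] samples, double x, double y) {
    double weightSum = 0, valueSum = 0;
    for (double[] s : samples) {
        double dx = s[0] - x, dy = s[1] - y;
        double d2 = dx * dx + dy * dy;
        if (d2 == 0) return s[2];      // query coincides with a sample
        weightSum += 1 / d2;
        valueSum  += s[2] / d2;
    }
    return valueSum / weightSum;
}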

Lagrange interpolation would be simple and accurate.
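For the one-dimensional case (say, interpolating along x at a fixed y), the textbook Lagrange formula is only a few lines in Java:

// Textbook Lagrange interpolation: evaluates, at x, the unique polynomial of
// degree n-1 through the points (xs[i], ys[i]). Assumes all xs are distinct.
public static double lagrange(double[] xs, double[] ys, double x) {
    double result = 0;
    for (int i = 0; i < xs.length; i++) {
        double term = ys[i];
        for (int j = 0; j < xs.length; j++) {
            if (j != i) term *= (x - xs[j]) / (xs[i] - xs[j]);
        }
        result += term;
    }
    return result;
}

Keep the number of points small, though: high-degree Lagrange polynomials can oscillate badly between the samples.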

Related

Box containing as many given points as possible in a coordinate system

I have a question. It may be more of a mathematical question, but in a programming context.
I have n points on a coordinate system, placed randomly.
My problem is to find a box which has a diameter d and (theoretically) infinite length (like a thick line) and contains as many of the n points as possible.
Any idea?
I ultimately want to write a program in Java based on the answers, but an already-written one is welcome too ;)

Given millions of points, find points that lie on a line or within 0.2 mm of the line [closed]

Given a set of millions of points with (x, y) coordinates, I have to select the points which lie on the line [(x1,y1)-(x2,y2)] or are within a distance of 0.2 mm from the line.
One way to solve this:
1) plug every point into the equation of the line; if it satisfies the equation, the point lies on the line
2) calculate the perpendicular distance between the point and the line; if the distance is less than 0.2 mm, select the point
But for millions of points this will not be the best solution, so I am looking for an algorithm or technique to solve the above problem more efficiently.
Any help will be highly appreciated! Thank you.
Thanks for your replies; I have gone through all the suggestions you posted. I should have given you some background on the problem.
I am new to C++ programming, and the problem I am facing is as follows:
the input to the program is a comma-separated text file that contains millions or billions of x and y coordinates with a value at each point,
e.g. x1, y1, value1.
The program then draws thousands of lines ([x1,y1]-[x2,y2]),
and for each line I have to select the set of points that lie on the line or within 0 to 0.2 mm of it.
I tried the following methods:
I calculated the distance of each point from the line and selected the point if the distance was <= 0.2, and did this for thousands of lines... but it is not an efficient algorithm.
As a second method I plan to sort the coordinates and then draw parallel lines on both sides of the given line at a distance of 0.2 mm... but I don't know how to identify the points that lie between the parallel lines. Please also tell me whether this method is any good.
Some of you suggested using R-trees or the 2-variant method; since I am new to programming, please suggest an online tutorial for understanding and implementing them.
To compare all the points, pretty much all you can do is compare all the points.
You can split the task over multiple threads to get it processed faster, but I think you are underestimating the speed of computers.
Do the simple implementation first and see if it is fast enough before trying to complicate things.
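For reference, a minimal sketch of that simple implementation (in Java, matching the rest of this page; all names are mine). It compares squared quantities, so no square root is taken per point:

import java.util.ArrayList;
import java.util.List;

// Keeps every point whose perpendicular distance from the infinite line
// through (x1,y1)-(x2,y2) is at most eps, using only squared quantities.
public static List<double[]> nearLine(List<double[]> points,
                                      double x1, double y1,
                                      double x2, double y2, double eps) {
    double dx = x2 - x1, dy = y2 - y1;
    double limit = eps * eps * (dx * dx + dy * dy); // threshold, precomputed once
    List<double[]> selected = new ArrayList<>();
    for (double[] p : points) {
        // |cross((p - p1), direction)| = distance * |direction|,
        // so compare cross^2 against eps^2 * |direction|^2.
        double cross = (p[0] - x1) * dy - (p[1] - y1) * dx;
        if (cross * cross <= limit) selected.add(p);
    }
    return selected;
}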
How about rotating the plane so the line becomes an axis, say the x-axis? You don't need to apply the entire rotation matrix to every point either. You just need to check the y-coordinate of the rotated point, and in fact, you probably don't need to do the whole computation most of the time either, since you can do some simple tests to see if you'll be within the needed distance from the x-axis.
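A sketch of that idea in Java (names mine): only the rotated y-coordinate is ever computed, which is exactly the point's signed distance from the line.

// True when the point lies within eps of the line through (x1,y1)-(x2,y2).
public static boolean withinBand(double px, double py,
                                 double x1, double y1,
                                 double x2, double y2, double eps) {
    double dx = x2 - x1, dy = y2 - y1;
    // y' of (px,py) after rotating the line onto the x-axis about (x1,y1);
    // the full rotation matrix is never applied.
    double yRot = (dx * (py - y1) - dy * (px - x1)) / Math.hypot(dx, dy);
    return Math.abs(yRot) <= eps;
}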
An R-tree sounds like what you need.
This data structure lets you query your data set for the points that lie within a given box (in your case, a line with width 0.2).
The change you will need is the rotation of your points. I'm not sure the data structure will work if you merely treat the points as rotated, but even so your problem becomes much simpler: rotate your points, then construct the tree and query it.
Hope it helps.
To estimate the time needed to solve the problem above, consider the following Python code:

import numpy as np

x1, y1 = 1.1, 8.0
x2, y2 = 1.0, 3.2

# Preliminary computation to obtain the canonical (normalized) equation of
# the line (x1,y1)--(x2,y2); assumes the line is neither vertical nor horizontal.
d = (x2 - x1) ** -2.0 + (y1 - y2) ** -2.0

def g(p):
    # Signed distance between the line and the point p = (x, y);
    # e.g. g((3, 4)) computes the distance between the line and (3, 4).
    x, y = p
    return (x / (x2 - x1) + y / (y1 - y2)
            - (x1 / (x2 - x1) + y1 / (y1 - y2))) / np.sqrt(d)

points = np.random.rand(10 ** 6, 2)  # generate 1 million points
# True for every point whose distance from the line is at most 0.2
result = [abs(g(p)) <= 0.2 for p in points]

Computation time: approx. 15 sec (Athlon X4, 2 GHz).
You can also do some preliminary filtering of the points: consider the circle centred at ((x1+x2)/2, (y1+y2)/2) with R = length_of_the_line/2 + offset (offset = 0.2 mm).
If a point is inside the circle, compute the perpendicular distance between the point and the line, and select the point if the distance is at most 0.2 mm.
If the point is outside the circle, discard it.
This should save you some cycles, since checking whether a point is inside a circle is cheaper than computing the perpendicular distance for every point.
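A minimal Java sketch of that pre-filter (names mine), again comparing squares so no square root is needed:

// Cheap bounding-circle rejection before the exact distance test: the circle
// is centred on the segment midpoint with radius half the segment length plus
// the offset (0.2 mm), so no qualifying point can lie outside it.
public static boolean worthExactTest(double px, double py,
                                     double x1, double y1,
                                     double x2, double y2, double offset) {
    double cx = (x1 + x2) / 2, cy = (y1 + y2) / 2;
    double r = Math.hypot(x2 - x1, y2 - y1) / 2 + offset;
    double dx = px - cx, dy = py - cy;
    return dx * dx + dy * dy <= r * r;   // squared comparison, no sqrt
}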
Hope it helps.

Algorithm for clustering Tweets based on geo radius

I want to cluster tweets based on a specified geo-radius, e.g. 10 meters: if I specify a 10-meter radius, then all tweets within 10 meters of each other should end up in one cluster.
A simple algorithm would be to calculate the distance between each tweet and every other tweet, but that would be very computationally expensive. Are there better algorithms for this?
You can organize your tweets in a quadtree. This makes it quite easy to find nearby tweets without looking at all tweets and their locations.
The quadtree does not directly deliver the distance (because it partitions on the coordinates, Manhattan-style), but it gives you the nearby tweets, for which you can calculate the precise distance afterwards.
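A minimal point-quadtree sketch in Java (the names, the node capacity, and the use of planar coordinates are my assumptions; for latitude/longitude you would query with a generous radius and re-check with a precise geo distance, as noted above):

import java.util.ArrayList;
import java.util.List;

// Minimal point quadtree: insert points, then collect every point within
// radius r of a query location, pruning whole quadrants that are too far away.
final class QuadTree {
    private static final int CAPACITY = 8;        // max points per leaf (arbitrary)
    private final double cx, cy, hw, hh;          // node centre and half-extents
    private final List<double[]> pts = new ArrayList<>();
    private QuadTree[] kids;                      // null until subdivided

    QuadTree(double cx, double cy, double hw, double hh) {
        this.cx = cx; this.cy = cy; this.hw = hw; this.hh = hh;
    }

    void insert(double[] p) {
        if (kids != null) { child(p).insert(p); return; }
        pts.add(p);
        if (pts.size() > CAPACITY) {              // split leaf into four quadrants
            kids = new QuadTree[] {
                new QuadTree(cx - hw / 2, cy - hh / 2, hw / 2, hh / 2),
                new QuadTree(cx + hw / 2, cy - hh / 2, hw / 2, hh / 2),
                new QuadTree(cx - hw / 2, cy + hh / 2, hw / 2, hh / 2),
                new QuadTree(cx + hw / 2, cy + hh / 2, hw / 2, hh / 2)};
            for (double[] q : pts) child(q).insert(q);
            pts.clear();
        }
    }

    private QuadTree child(double[] p) {
        return kids[(p[0] >= cx ? 1 : 0) + (p[1] >= cy ? 2 : 0)];
    }

    void query(double qx, double qy, double r, List<double[]> out) {
        // Prune: skip this node if its box is farther than r from the query point.
        double dx = Math.max(Math.abs(qx - cx) - hw, 0);
        double dy = Math.max(Math.abs(qy - cy) - hh, 0);
        if (dx * dx + dy * dy > r * r) return;
        for (double[] p : pts) {                  // exact check for stored points
            double ex = p[0] - qx, ey = p[1] - qy;
            if (ex * ex + ey * ey <= r * r) out.add(p);
        }
        if (kids != null) for (QuadTree k : kids) k.query(qx, qy, r, out);
    }
}

Build the tree once over the bounding box of all tweets; each radius query then visits only the quadrants that can intersect the query circle instead of every point.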
If your problem is only the computation of distances, remember: you should never compute actual distances if you need them only for comparison. Use their squares instead.
Do not compare
sqrt((x1-x2)^2 + (y1-y2)^2) against 10;
compare instead
(x1-x2)^2 + (y1-y2)^2 against 100.
It takes greatly less time.
A further improvement is to compare coordinates before comparing squared distances: if abs(x1-x2) > 10, you don't need that pair any more. (This is the Manhattan-distance idea MrSmith is speaking about.)
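Both tricks combined in a small Java helper (names mine):

// Reject on a single coordinate first, then compare squared distances,
// so Math.sqrt is never called.
public static boolean within(double ax, double ay, double bx, double by, double r) {
    if (Math.abs(ax - bx) > r || Math.abs(ay - by) > r) return false; // cheap reject
    double dx = ax - bx, dy = ay - by;
    return dx * dx + dy * dy <= r * r;
}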
I don't know how you work with your points, but if the set is stable, you could make two arrays of them, each sorted by one of the coordinates. After that you only need to check the points that are close to the source point in both arrays.

How would I cluster an unordered list of locations? [duplicate]

Possible Duplicate:
Clustering Algorithm for Mapping Application
I have an unordered List of locations (containing their coordinates). I know to use the Haversine formula to calculate the distance between two points, but the clustering solutions I've looked at say I'd need to order the list first. What is the correct ordering for locations?
I want to cluster all locations which are within 1 metre of each other (i.e. put them into a single clusteredLocation object); is this feasible without sorting first?
Actually, none of the cluster-analysis algorithms I know requires the points to be ordered; that would somewhat defeat the whole purpose of cluster analysis. But maybe you are thinking more of a web-2.0 marker-clusterer kind of aggregation?
Have a look at k-means, single-link and DBSCAN, all well described on Wikipedia under the hub article Cluster analysis. None of these requires your points to be ordered.
Note that Haversine distance is not appropriate for k-means or average-linkage clustering, unless you find a smart way of computing the mean that minimizes variance. Do not use the arithmetic average if you have the -180/+180 wrap-around of longitude coordinates.
Single-linkage, complete-linkage, DBSCAN and OPTICS should all be fine.
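For reference, the Haversine distance mentioned above as a standard Java function (the mean Earth radius of 6371 km is the usual convention):

// Great-circle distance between two latitude/longitude points (in degrees),
// via the haversine formula; returns metres using a mean Earth radius of 6371 km.
public static double haversine(double lat1, double lon1, double lat2, double lon2) {
    double r = 6_371_000.0;
    double phi1 = Math.toRadians(lat1), phi2 = Math.toRadians(lat2);
    double dPhi = Math.toRadians(lat2 - lat1), dLam = Math.toRadians(lon2 - lon1);
    double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
             + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLam / 2) * Math.sin(dLam / 2);
    return 2 * r * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
}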

Keyword based nearest neighbour algorithm or library

I want to find a library or an algorithm (so I can write the code myself) for identifying the nearest k neighbours of a webpage, where a webpage is defined as a set of keywords. I have already done the part where I extract the keywords.
It doesn't have to be very good, just good enough.
Can anyone suggest a solution, or where to start? I have looked through lectures by Yury Lifshits in the past, but I am hoping to get something ready-made if possible.
Java libraries preferred.
As you said, you already have the keywords extracted from a page. I am assuming that you represent each document/page by a vector of words, something like a document term-frequency matrix.
I guess the nearest neighbour of a page is ideally a page with similar content, so you'd like to find documents where the relative frequency of each word is similar to the one you are searching for. So first normalize the document-term matrix with respect to each row, i.e. replace each occurrence count by its percentage occurrence.
Next you have to define a distance between two documents represented by these vectors. You can use the ordinary Euclidean distance or the Manhattan distance; however, for text documents the similarity measure that usually works best is cosine similarity. Use whatever distance or similarity function suits your problem (remember: for nearest neighbours you want to minimize distance but maximize similarity).
Once you have the vectors and your distance function in place, run the nearest-neighbour or k-nearest-neighbour algorithm.
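A minimal Java sketch of cosine similarity between two term-frequency vectors over a shared vocabulary (names mine; index i corresponds to term i):

// Cosine similarity between two term-frequency vectors over the same
// vocabulary; higher means more similar documents.
public static double cosineSimilarity(double[] a, double[] b) {
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (Math.sqrt(na) * Math.sqrt(nb)); // in [0, 1] for non-negative tf
}

The k nearest neighbours are then simply the k pages with the highest similarity to the query page.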
