I have code for calculating the Euclidean distance like this:
double distance = 0;
for (int j = 0; j < inputdimension; j++) {
    distance += Math.pow(vector1[j] - vector2.get(j), 2);
}
I have example data where the two arrays have different sizes, and I want to compute the Euclidean distance between them with this function. For example:
vector1[] = {0.5,0.1,0.3,1.5}
vector2[] = {1.4,3.2,3.4,0.1,7.8,0.2,8.3,8.4}
So far I've only found examples that calculate the Euclidean distance for arrays of the same size. So I came up with the idea of removing the extra elements so that both arrays are balanced, for example:
vector1[] = {0.5,0.1,0.3,1.5}
vector2[] = {1.4,3.2,3.4,0.1}
The problem is that I don't know whether this is right. Is there another way to balance this data?
The answer is simple: you cannot find the Euclidean distance between two points that have a different number of dimensions. If you drop the remaining elements like that, the answer will be unrealistic and inconsistent. However, if you must, you can pad the smaller array with zeros. The result will be better, but in my opinion it will still be unrealistic.
P.S. You need to call Math.sqrt(distance) after the for loop.
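A minimal sketch of the zero-padding workaround, including the final Math.sqrt. It assumes both inputs are plain double[] arrays (in the question vector2 is accessed with get(j), so it is probably a List); the class and method names are hypothetical. Coordinates missing from the shorter array are treated as 0.

```java
final class PaddedEuclidean {
    /**
     * Euclidean distance where the shorter vector is implicitly padded with zeros.
     * This is a workaround: vectors of different dimensionality are not really comparable.
     */
    static double distance(double[] a, double[] b) {
        int n = Math.max(a.length, b.length);
        double sumSquares = 0.0;
        for (int j = 0; j < n; j++) {
            double x = j < a.length ? a[j] : 0.0; // pad a with zeros
            double y = j < b.length ? b[j] : 0.0; // pad b with zeros
            double diff = x - y;
            sumSquares += diff * diff;            // same as Math.pow(diff, 2), but cheaper
        }
        return Math.sqrt(sumSquares);             // the square root from the P.S.
    }
}
```

For example, PaddedEuclidean.distance(new double[]{3}, new double[]{0, 4}) treats the first vector as {3, 0} and returns 5.0.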
First of all, I know what the Euclidean distance is and what it calculates between two vectors.
But my question is how to calculate the distance between two class objects, for example in Java or any other OOP language. I've read quite a lot about machine learning and have already written a classifier using libraries etc., but I want to know how the Euclidean distance is calculated when I have, for example, this object:
class Object{
String name;
Color color;
int price;
int anotherProperty;
double something;
List<AnotherObject> another;
}
What I already know (if I'm not wrong!) is that I have to convert this object into a vector/array representing its properties, or 'features' as they're called in machine learning.
But how do I do this? This is just the piece of the puzzle I need to understand more.
Do I have to collect all possible values for a property to convert it to a number and write it in the array/vector?
Example:
I guess the object above would be represented by a 6-dimensional array, or a smaller one depending on which features are needed for the calculation.
Let's say color, name, and price are the necessary features. Then the array/vector for the following data:
color: green (let's say an enum with 5 possible values, where green is the third one)
name: "foo" (I don't know how to convert this one; maybe by adding up the ASCII codes?)
price: 14 (just take the integer?)
would look like this?
[3,324,14]
And if I do this for every object of the same class, I can calculate the Euclidean distance. Am I right, did I misunderstand something, or is it completely wrong?
For each data type you need to choose an appropriate method of determining the distance. In many cases each data type may itself have to be treated as a vector.
For colour, for example, you could express the colour as an RGB value and then take the Euclidean distance (take the 3 differences, square them, sum them, and then take the square root). You might want to choose a different colour space than RGB (e.g., HSI). See here: Colour Difference.
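A minimal sketch of the RGB variant, assuming each colour is given as a hypothetical int[] {r, g, b} with channels in 0..255 (with java.awt.Color you would pass getRed(), getGreen() and getBlue()). It implements exactly the three steps described: differences, squares, sum, square root.

```java
final class ColourDistance {
    /** Euclidean distance between two colours given as {r, g, b}, each channel 0..255. */
    static double rgbDistance(int[] a, int[] b) {
        int dr = a[0] - b[0];
        int dg = a[1] - b[1];
        int db = a[2] - b[2];
        return Math.sqrt(dr * dr + dg * dg + db * db);
    }
}
```

Black {0, 0, 0} and white {255, 255, 255} are the farthest apart in this space, at sqrt(3 * 255^2), roughly 441.7.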
Comparing two strings is easier: a common method is the Levenshtein distance, and there is a method for it in the Apache Commons StringUtils class.
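If you would rather not add the Apache Commons dependency, the standard dynamic-programming algorithm is short. This is a self-contained sketch (class name hypothetical) that keeps only two rows of the DP table:

```java
final class Levenshtein {
    /** Minimum number of single-character insertions, deletions and substitutions turning s into t. */
    static int distance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;                        // "" -> first j chars of t: j insertions
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;                        // first i chars of s -> "": i deletions
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,   // insertion
                                            prev[j] + 1),      // deletion
                                   prev[j - 1] + cost);        // substitution or match
            }
            int[] tmp = prev;                   // the current row becomes the previous one
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }
}
```

The classic example, "kitten" to "sitting", gives 3 (two substitutions and one insertion).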
Numbers - just take the difference.
Every type will require some thought about the best way either to produce a distance directly or to calculate a numeric value that can then be subtracted to give a "distance".
Once you have a vector of all the "values" of all the fields for each object, you can calculate the Euclidean distance (square the differences, sum them, and take the square root of the sum).
In your case, if you have:
object 1: [3,324,14]
object 2: [5,123,10]
The Euclidean distance is:
sqrt( (3-5)^2 + (324-123)^2 + (14-10)^2 )
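Worked through with these example numbers:

$$
d = \sqrt{(3-5)^2 + (324-123)^2 + (14-10)^2} = \sqrt{4 + 40401 + 16} = \sqrt{40421} \approx 201.05
$$

Note that the name term alone contributes 40401 of the 40421, so with these raw encodings the name difference dominates the colour and price differences.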
But in the case of comparing strings, the Levenshtein algorithm gives you the distance directly without intermediate numbers for the fields.
Think about this problem as a statistics problem. Classify all the attributes into nominal, ordinal, and scale variables. Once you have done that, it is just a multi-dimensional distance vector problem.
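One common way to turn each kind of variable into vector dimensions (an assumption here; the answer does not spell it out): a nominal variable becomes a one-hot block (one 0/1 dimension per category), an ordinal variable becomes its rank, and a scale variable is used as-is. The Colour and Size enums below are hypothetical, with GREEN placed third as in the question.

```java
final class FeatureEncoding {
    enum Colour { RED, YELLOW, GREEN, BLUE, BLACK } // nominal: no natural order
    enum Size { SMALL, MEDIUM, LARGE }              // ordinal: ordered categories

    /** Encodes one object as {one-hot colour..., size rank, price}. */
    static double[] encode(Colour colour, Size size, double price) {
        int k = Colour.values().length;
        double[] v = new double[k + 2];
        v[colour.ordinal()] = 1.0; // one-hot: 1 in this colour's slot, 0 elsewhere
        v[k] = size.ordinal();     // ordinal: rank 0, 1, 2
        v[k + 1] = price;          // scale: the raw number
        return v;
    }
}
```

encode(GREEN, MEDIUM, 14) yields {0, 0, 1, 0, 0, 1, 14}. With one-hot encoding every pair of different colours is sqrt(2) apart, whereas encoding green as 3 would make green arbitrarily "closer" to some colours than to others.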
I am implementing my own raycaster for a game I'm making, and I have come across a very hard problem. I have a player (the black dot), and I need to find the intersection nearest to the player. In the image below, the arrow points to the intersection I need.
In other words, I need a function something like this:
// Each line takes 2 consecutive map points as its endpoints,
// so the map must have an even number of points
public Point getNearestIntersection(int playerX, int playerY, int lineDir, Point[] map) {
// whatever goes here
}
I am going to have to do this about 50 times every frame, with about 100 lines. I would like to get at least 40 fps if possible. Even if I split the work across threads, I still suspect it would cause a lot of lag.
The java.awt.Point class has a distance method (inherited from Point2D) that calculates the distance between two points. You could then loop over all intersection points to find the nearest one, something like this:
Point currentNearestIntersection = null;
double smallestDistance = Double.POSITIVE_INFINITY;
for (Point inter : intersections) {
    double distance = player.distance(inter);
    if (distance < smallestDistance) {
        currentNearestIntersection = inter;
        smallestDistance = distance;
    }
}
An axis/line intersection is really a matter of solving:
p(t)=p0+dp*t
q(u)=q0+(q1-q0)*u
p(t)=q(u)
t=? u=?
where:
p0 is your ray start point (vector)
dp is ray direction (vector)
q0,q1 are line endpoints (vectors)
p(t), q(u) are points on the axis and the line
t,u are line parameters (scalars)
This is a simple system of 2 linear equations in vector form: in N dimensions it gives N scalar equations for the 2 unknowns t and u, so choose the pair of equations that does not lead to division by zero. A result is valid if:
t >= 0 and 0.0 <= u <= 1.0
If you use a unit vector dp for the direction of your ray, then the t parameter from the axis/line intersection is directly the distance from the ray start point, so you can use it as-is.
If you need to speed up the intersection computation, see: brute force line/line intersection with area subdivision
And instead of remembering all intersections, always store only the one with the smallest non-negative t.
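A minimal 2D sketch of the above: solve p0 + dp*t = q0 + (q1-q0)*u with 2D cross products, accept t >= 0 and 0 <= u <= 1, and keep only the smallest t. It assumes the ray direction is an angle in radians (the question's int lineDir is not specified) and that segments are hypothetical double[] {x0, y0, x1, y1} entries rather than point pairs in a Point[] map.

```java
final class RayCaster {
    /** z component of the 2D cross product a x b. */
    private static double cross(double ax, double ay, double bx, double by) {
        return ax * by - ay * bx;
    }

    /**
     * Distance from (px, py) along the ray at angle (radians) to the nearest segment,
     * or Double.POSITIVE_INFINITY if no segment is hit. segments[i] = {x0, y0, x1, y1}.
     */
    static double nearestHit(double px, double py, double angle, double[][] segments) {
        double dx = Math.cos(angle), dy = Math.sin(angle); // unit dp, so t is a distance
        double best = Double.POSITIVE_INFINITY;
        for (double[] s : segments) {
            double ex = s[2] - s[0], ey = s[3] - s[1];     // q1 - q0
            double denom = cross(dx, dy, ex, ey);
            if (Math.abs(denom) < 1e-12) continue;        // parallel: no single intersection
            double wx = s[0] - px, wy = s[1] - py;         // q0 - p0
            double t = cross(wx, wy, ex, ey) / denom;      // parameter along the ray
            double u = cross(wx, wy, dx, dy) / denom;      // parameter along the segment
            if (t >= 0 && u >= 0 && u <= 1 && t < best) {
                best = t;                                  // keep only the smallest non-negative t
            }
        }
        return best;
    }
}
```

For the numbers in the question, 50 rays against 100 segments is 5,000 of these tests per frame, each only a handful of multiplications, and no intersection list is ever allocated.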
[Notes]
If some of your lines form a grid, you can compute this even faster by exploiting the DDA algorithm and use real line/line intersection only for the irregular rest. A nice example of this is the Wolfenstein pseudo-3D raycaster problem, like this one
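A minimal sketch of the grid DDA idea, assuming the grid-aligned walls are stored as a hypothetical boolean[x][y] array of unit cells and the ray direction is non-zero. Instead of intersecting every wall, the ray steps from cell boundary to cell boundary, always crossing whichever grid line (vertical or horizontal) is nearer.

```java
final class GridDda {
    /** Distance along the ray to the first wall cell, or -1 if it leaves the grid or exceeds maxSteps. */
    static double castRay(boolean[][] wall, double px, double py, double dx, double dy, int maxSteps) {
        int mapX = (int) Math.floor(px);
        int mapY = (int) Math.floor(py);
        double len = Math.hypot(dx, dy);
        dx /= len;                                       // unit direction, so t is a distance
        dy /= len;
        // ray length needed to cross one whole cell in x or in y
        double deltaX = (dx == 0) ? Double.POSITIVE_INFINITY : Math.abs(1 / dx);
        double deltaY = (dy == 0) ? Double.POSITIVE_INFINITY : Math.abs(1 / dy);
        int stepX, stepY;
        double sideX, sideY;                             // ray length to the next vertical / horizontal grid line
        if (dx < 0) { stepX = -1; sideX = (px - mapX) * deltaX; }
        else        { stepX = 1;  sideX = (mapX + 1 - px) * deltaX; }
        if (dy < 0) { stepY = -1; sideY = (py - mapY) * deltaY; }
        else        { stepY = 1;  sideY = (mapY + 1 - py) * deltaY; }
        for (int i = 0; i < maxSteps; i++) {
            double t;
            if (sideX < sideY) { t = sideX; sideX += deltaX; mapX += stepX; } // cross a vertical line
            else               { t = sideY; sideY += deltaY; mapY += stepY; } // cross a horizontal line
            if (mapX < 0 || mapY < 0 || mapX >= wall.length || mapY >= wall[0].length) return -1;
            if (wall[mapX][mapY]) return t;              // entered a wall cell
        }
        return -1;
    }
}
```

The cost per ray is proportional to the number of cells crossed, not to the number of walls, which is why this combines well with exact line/line tests for the few non-grid lines.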
I am trying to solve the following problem in Java (although it could be done in pretty much any other language):
I am given two arrays of integer values, xs and ys, representing data points on the x-axis. Their lengths might differ, though both are > 0, and they need not be sorted. What I want to calculate is the minimum distance measure between the two sets of data points. By that I mean: for each x, I find the closest y in ys and calculate a distance, for instance (x-y)^2. For example:
xs = [1,5]
ys = [10,4,2]
should return (1-2)^2 + (5-4)^2 + (5-10)^2
The distance measure itself is not important; it's the algorithm I'm interested in. I was thinking about sorting both arrays and somehow advancing indices in both of them to do better than brute force (for each element in xs, scan all elements in ys to find the minimum), which is O(len1 * len2).
This is my own problem I am working on, not a homework question. All your hints would be greatly appreciated.
I assume that HighPerformanceMark (first comment on your question) is right and that you actually take the larger array, find for each of its elements the closest element of the smaller array, and sum some f(dist) over those distances.
I would go with your approach:
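Sort both arrays
indexSmall = 0
sum = 0
for all elements e in bigArray {
    // advance the index as long as the next element is at least as close
    // (">=" rather than ">" so that duplicate values in smallArray are skipped)
    while (indexSmall + 1 < length(smallArray)
           && dist(e, smallArray[indexSmall]) >= dist(e, smallArray[indexSmall + 1])) {
        indexSmall++
    }
    sum += f(dist(e, smallArray[indexSmall]))
}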
This is O(max(len1,len2)*log(max(len1,len2))) for the sorting; the rest is linear in the length of the larger array, because indexSmall never moves backwards. Here dist(x,y) would be something like abs(x-y), and f(d) = d^2 or whatever you want.
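The pseudocode above as a runnable sketch, with dist(x, y) = |x - y| and f(d) = d^2 as suggested. The class name is hypothetical; differences are computed in long to avoid int overflow, and the inputs are cloned so the caller's arrays are not reordered.

```java
import java.util.Arrays;

final class NearestSum {
    /** For each element of the larger array, adds the squared distance to its closest element of the smaller one. */
    static long sumOfSquaredNearest(int[] xs, int[] ys) {
        int[] big = (xs.length >= ys.length ? xs : ys).clone();
        int[] small = (xs.length >= ys.length ? ys : xs).clone();
        Arrays.sort(big);
        Arrays.sort(small);
        int idx = 0;
        long sum = 0;
        for (int e : big) {
            // the best index only moves forward as e grows; ">=" skips over duplicates
            while (idx + 1 < small.length
                    && Math.abs((long) e - small[idx + 1]) <= Math.abs((long) e - small[idx])) {
                idx++;
            }
            long d = (long) e - small[idx];
            sum += d * d; // f(d) = d^2
        }
        return sum;
    }
}
```

With the question's data, sumOfSquaredNearest(new int[]{1, 5}, new int[]{10, 4, 2}) returns 27, i.e. (1-2)^2 + (5-4)^2 + (5-10)^2.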
Your proposed idea sounds good to me. You can sort the lists in O(n log n) time. Then you can make a single pass over the longer list using a sliding index into the other one to find the "pairs". As you progress through the longer list, you never have to backtrack in the other one. So the whole algorithm is O(n log n + n) = O(n log n).
Your approach is pretty good and has O(n1*log(n1)+n2*log(n2)) time complexity.
If the arrays have different lengths, another approach is to:
sort the shorter array;
traverse the longer array from start to finish, using binary search to locate the nearest item in the sorted short array.
This has O((n1+n2)*log(n1)) time complexity, where n1 is the length of the shorter array.
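A minimal sketch of the binary-search variant using java.util.Arrays.binarySearch, which returns -(insertion point) - 1 when the key is absent. Only the shorter array is sorted; names are hypothetical and the squared difference again stands in for the distance measure.

```java
import java.util.Arrays;

final class NearestByBinarySearch {
    /** Closest value to target in a sorted, non-empty array (ties go to the smaller value). */
    static int closest(int[] sorted, int target) {
        int i = Arrays.binarySearch(sorted, target);
        if (i >= 0) return sorted[i];                    // exact match
        int ins = -i - 1;                                // index of the first element > target
        if (ins == 0) return sorted[0];
        if (ins == sorted.length) return sorted[sorted.length - 1];
        int lo = sorted[ins - 1], hi = sorted[ins];      // the two neighbours of target
        return (long) target - lo <= (long) hi - target ? lo : hi;
    }

    static long sumOfSquaredNearest(int[] xs, int[] ys) {
        int[] shorter = (xs.length <= ys.length ? xs : ys).clone();
        int[] longer = xs.length <= ys.length ? ys : xs;
        Arrays.sort(shorter);                            // O(n1 log n1)
        long sum = 0;
        for (int e : longer) {                           // n2 lookups, O(log n1) each
            long d = (long) e - closest(shorter, e);
            sum += d * d;
        }
        return sum;
    }
}
```

This avoids sorting the longer array at all, which is where the O((n1+n2)*log(n1)) bound comes from.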
If I have an object with x and y properties, how can I tell which point in an array is closest without using the distance formula?
You can't get an accurate result without using some variant of the distance formula. But you can save a few cycles by not taking the square root after the fact; the comparison will remain valid.
r^2 = dx^2 + dy^2
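A minimal sketch of comparing r^2 instead of r, using java.awt.Point as a stand-in for "an object with properties x and y" (an assumption; the class name is hypothetical). Because the square root is monotonic, the point with the smallest r^2 is also the point with the smallest r.

```java
import java.awt.Point;
import java.util.List;

final class NearestPoint {
    /** Point in points closest to target, comparing squared distances only; null if points is empty. */
    static Point nearest(Point target, List<Point> points) {
        Point best = null;
        long bestSq = Long.MAX_VALUE;
        for (Point p : points) {
            long dx = (long) p.x - target.x;
            long dy = (long) p.y - target.y;
            long sq = dx * dx + dy * dy; // r^2 = dx^2 + dy^2, no sqrt needed
            if (sq < bestSq) {
                bestSq = sq;
                best = p;
            }
        }
        return best;
    }
}
```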
If you don't care about the exact distance, you could perhaps take the difference between the x and y coordinates of your source and destination points to provide you with some ordering.
// The following code does not return the exact closest point,
// but it ranks points by the sum of the x and y displacements
// (the Manhattan distance), which complies with your
// requirement not to use the distance formula.
Point destination = ...;
Point nearestPoint = points.get(0);
int bestCoefficient = Integer.MAX_VALUE;
for (Point p : points) {
    int closenessCoefficient = Math.abs(destination.x - p.x) + Math.abs(destination.y - p.y);
    if (closenessCoefficient < bestCoefficient) {
        bestCoefficient = closenessCoefficient;
        nearestPoint = p;
    }
}
return nearestPoint;
If you have to find exactly the closest neighbour, there is no way around evaluating the distance formula, at least for a few points. As already pointed out, you can avoid the expensive sqrt most of the time by simply comparing squared distances, r^2 = x^2 + y^2.
However, if you have a large number of points spread over a wide range of distances, you can first use an approximation like the ones shown here http://www.flipcode.com/archives/Fast_Approximate_Distance_Functions.shtml, and then evaluate the real distance formula only for the points that the approximation ranks closest. On architectures where multiplication is also expensive, this can make a big difference; on modern x86/x86-64 architectures it should not matter much, though.
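A minimal sketch of the two-pass idea. The linked article's formulas are not reproduced here; as one example of such an approximation, it uses the well-known "alpha max plus beta min" estimate max(|dx|, |dy|) + min(|dx|, |dy|)/2, which never underestimates the true distance d and overestimates it by at most a factor of sqrt(5)/2 (about 1.118). That bound is what makes the pruning in pass 2 safe. Names are hypothetical.

```java
import java.awt.Point;
import java.util.List;

final class ApproxNearest {
    // d <= approx <= (sqrt(5)/2) * d, rounded up slightly for safety
    private static final double MAX_OVERESTIMATE = 1.1181;

    /** Alpha-max-plus-beta-min estimate with alpha = 1, beta = 1/2: no squares, no sqrt. */
    static double approxDistance(long dx, long dy) {
        long ax = Math.abs(dx), ay = Math.abs(dy);
        return Math.max(ax, ay) + 0.5 * Math.min(ax, ay);
    }

    static Point nearest(Point target, List<Point> points) {
        // Pass 1: cheap estimate for every point.
        double minApprox = Double.POSITIVE_INFINITY;
        for (Point p : points) {
            minApprox = Math.min(minApprox, approxDistance((long) p.x - target.x, (long) p.y - target.y));
        }
        // Pass 2: exact squared distance only for points that could still be the nearest.
        double cutoff = minApprox * MAX_OVERESTIMATE;
        Point best = null;
        long bestSq = Long.MAX_VALUE;
        for (Point p : points) {
            long dx = (long) p.x - target.x, dy = (long) p.y - target.y;
            if (approxDistance(dx, dy) > cutoff) continue; // provably not the nearest
            long sq = dx * dx + dy * dy;
            if (sq < bestSq) {
                bestSq = sq;
                best = p;
            }
        }
        return best; // null if points is empty
    }
}
```

Why the cutoff is safe: the true nearest point q satisfies approx(q) <= 1.118 * d(q) <= 1.118 * d(m) <= 1.118 * approx(m) = 1.118 * minApprox, where m is the point with the smallest estimate.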
I'm currently using a framework that has a function called distance2D, with this description:
Calculate the Euclidean distance between two points (considering a point as a vector object). Disregards the Z component of the vectors and is thus a little faster.
And this is how I use it:
if(g.getCenterPointGlobal().distance2D(target.getCenterPointGlobal()) > 1)
System.out.println("Near");
I have no idea what a Euclidean distance is. I'm guessing it can be used to calculate how far apart 2 points are? I'm trying to compare the distance between 2 objects, and if they are within a certain range, I want to do something. How can I use this?
Euclidean distance is the distance between 2 points as if you were measuring it with a ruler. I don't know the dimensionality of your Euclidean space, but be careful: the function you are using only takes the first two dimensions (x, y) into account. If your space has 3 dimensions (x, y, z), it will use only x and y to calculate the distance, which may give a wrong result.
From what I understand, if you want to trigger some action when two points are within some range, you should write:
if(g.getCenterPointGlobal().distance2D(target.getCenterPointGlobal()) < RANGE)
System.out.println("Near");
The Euclidean distance is calculated by tracing a straight line between two points and measuring it as the hypotenuse of an imaginary right triangle formed by the two points and a complementary point. This measure is a scalar, so it's a good metric for your calculations.
Euclidean geometry is a coordinate system in which space is flat, not curved. You don't need to care about non-Euclidean geometry unless for example you're dealing with coordinates mapped onto a sphere, such as calculating the shortest travel distance between two places on Earth.
I imagine this function basically uses Pythagoras' theorem to calculate the distance between the two objects. However, as the description says, it disregards the Z component. In other words, it will only give a correct answer if both points have the same Z value (aka "depth").
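To illustrate what "disregards the Z component" means, here is a sketch of both formulas side by side. This is not the framework's code (its source is not shown); the class and method names are hypothetical.

```java
final class Distances {
    /** Full 3D Euclidean distance (Pythagoras in three dimensions). */
    static double distance3D(double x1, double y1, double z1, double x2, double y2, double z2) {
        double dx = x1 - x2, dy = y1 - y2, dz = z1 - z2;
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    /** The same formula with z ignored, as the distance2D description suggests. */
    static double distance2D(double x1, double y1, double x2, double y2) {
        double dx = x1 - x2, dy = y1 - y2;
        return Math.sqrt(dx * dx + dy * dy);
    }
}
```

For the points (0, 0, 0) and (3, 4, 12), distance2D reports 5 while the true 3D distance is 13, so two objects at very different depths can look "near".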
If you want to compare distances and save time, don't use the distance itself but its square: (x1-x2)^2 + (y1-y2)^2, without taking the sqrt. Comparisons will then work exactly as with Euclidean distances, but faster.
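Applied to the "is it within RANGE?" check from the earlier answer, a minimal sketch (hypothetical names, non-negative range assumed): square the threshold once instead of taking a square root per comparison.

```java
final class RangeCheck {
    /** True if (x1, y1) and (x2, y2) are strictly closer than range; no sqrt involved. */
    static boolean withinRange(double x1, double y1, double x2, double y2, double range) {
        double dx = x1 - x2, dy = y1 - y2;
        return dx * dx + dy * dy < range * range; // compare squared distance to squared range
    }
}
```

Since both sides are non-negative, dx^2 + dy^2 < range^2 holds exactly when sqrt(dx^2 + dy^2) < range.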