Nearest Neighbour using KDtree

I know how to construct a kd-tree, but I am having trouble finding the nearest neighbour with it. I have searched on Google and found descriptions of the algorithm, but no code, and I am having difficulty converting those descriptions into code because of language difficulties.
Can you please provide understandable code for nearest-neighbour search in Java?

Here is pseudocode that assumes the target point is not stored in the tree. (If it is, just add logic to ignore it):
nearest_point = NULL
nearest_distance = INFINITE
target_point = <set to the target point>
void nn_search(KD_NODE node) {
  FLOAT d = node->point.distance_to(target_point);
  if (d < nearest_distance) {
    nearest_distance = d;
    nearest_point = node->point;
  }
  BOX left_bb = node->left.bounding_box();
  BOX right_bb = node->right.bounding_box();
  if (left_bb.contains(target_point)) {
    search_children(node->left, node->right, right_bb);
  } else { // right_bb must contain target_point
    search_children(node->right, node->left, left_bb);
  }
}
void search_children(KD_NODE a, KD_NODE b, BOX b_bb) {
  nn_search(a);
  // This condition makes the search expected O(log n) time rather than O(n).
  // Skip searching the other child unless it might improve the answer.
  if (b_bb.contains_point_closer_than(target_point, nearest_distance)) {
    nn_search(b);
  }
}
After this has run, nearest_point contains the nearest point to the target. Note that it's simple to compute the bounding boxes as parameters of nn_search rather than storing them inside the nodes, which this code appears to do. In production code you'd want to do that to save the space of 4 floats per node. I've omitted the parameters for simplicity.
The predicate contains_point_closer_than returns true if there exists any point in the bounding box that's closer to the target than the given distance. Happily, it's enough to consider only one point in the box. E.g., if the current node splits the search space into left and right halves at X, then you only need to consider the point (X, Y_target) and its distance to the target. That distance is just abs(X - X_target)! I'll let you convince yourself of this without further discussion.
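Since the question asked for Java, here is a minimal sketch of the same idea for 2-D points. The class name KdTree and its methods are made up for this illustration, points are plain double[] {x, y} arrays, and the pruning test uses the abs(X - X_target) shortcut described above (squared, to avoid a square root) instead of explicit bounding boxes.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustrative 2-D kd-tree with nearest-neighbour search (hypothetical class, not a library API). */
final class KdTree {
    private static final class Node {
        final double[] point;   // {x, y}
        Node left, right;
        Node(double[] point) { this.point = point; }
    }

    private final Node root;
    private double[] best;      // nearest point found so far
    private double bestDistSq;  // squared distance from the target to 'best'

    KdTree(double[][] points) {
        // Work on a copy of the outer array so the caller's ordering is left alone.
        root = build(points.clone(), 0, points.length, 0);
    }

    // Split on the median, alternating between x (axis 0) and y (axis 1).
    private static Node build(double[][] pts, int from, int to, int depth) {
        if (from >= to) return null;
        final int axis = depth % 2;
        Arrays.sort(pts, from, to, Comparator.comparingDouble((double[] p) -> p[axis]));
        int mid = (from + to) >>> 1;
        Node node = new Node(pts[mid]);
        node.left = build(pts, from, mid, depth + 1);
        node.right = build(pts, mid + 1, to, depth + 1);
        return node;
    }

    /** Returns the stored point closest to (x, y), or null if the tree is empty. */
    double[] nearest(double x, double y) {
        best = null;
        bestDistSq = Double.POSITIVE_INFINITY;
        search(root, new double[] {x, y}, 0);
        return best;
    }

    private void search(Node node, double[] target, int depth) {
        if (node == null) return;
        double dx = node.point[0] - target[0];
        double dy = node.point[1] - target[1];
        double distSq = dx * dx + dy * dy;
        if (distSq < bestDistSq) {
            bestDistSq = distSq;
            best = node.point;
        }
        int axis = depth % 2;
        double diff = target[axis] - node.point[axis];
        Node near = diff < 0 ? node.left : node.right;
        Node far = diff < 0 ? node.right : node.left;
        search(near, target, depth + 1);
        // Visit the far side only if the splitting line is closer than the best match so far.
        if (diff * diff < bestDistSq) {
            search(far, target, depth + 1);
        }
    }
}
```

Usage would look like new KdTree(points).nearest(x, y). The search state lives in fields to mirror the global variables of the pseudocode, so this sketch is not safe for concurrent queries.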

I know of two Java kd-tree implementations that support kNN search, here and here. Their performance appears to be roughly equivalent.

Related

How to make my path-finding algorithm not go in reverse?

My path-finding method is given two objects containing an id, name, x/y coordinates, and a path stored in an ArrayList. The path data is the id of each object that it can directly connect to. The objective is to call my method repeatedly until it finds its goal using the shortest distance, and when it reaches the end it returns true.
The problem:
If the distance to the node that you came from is shorter than the distance to the other nodes in the current node's path, then it will cause an infinite loop, bouncing back and forth between the two nodes. I have struggled with this problem for several hours and could be overthinking it. Any advice or suggestions will be greatly appreciated!
The Algorithm:
while (!pathfound) {
current = findPath(current, end);
}
public static Place findPath(Place curPlace, Place endPlace) {
ArrayList<Integer> path = curPlace.path;
int id;
double lastdist = 999;
double distance;
Place bestPlace = null;
for (int i = 0; i < path.size(); i++) {
id = curPlace.path.get(i);
distance = distance(getPlace(id), curPlace)
+ distance(getPlace(id), endPlace);
if (distance < lastdist) {
bestPlace = getPlace(id);
}
lastdist = distance;
}
if (result.length() == 0) {
result += bestPlace.name;
} else {
result += ", " + bestPlace.name;
}
System.out.println("CURCITY: " + bestPlace.id);
System.out.println(result);
System.out.println(lastdist);
if (bestPlace == endPlace) {
pathfound = true;
}
return bestPlace;
}
You can ignore result, it is for keeping up with the nodes that are passed through. If there are any other details you would like to know, please ask.

If it is acceptable to modify Place you can add a boolean "visited" flag. Reset them all to false prior to running the algorithm; set to true when you visit and false when you leave (don't forget to unset them on the way out of the recursion - if you do this properly you can even avoid having to explicitly reset the flags before starting). Skip nodes where the flag is true.
A more short-sighted option is to pass the last visited Place as a parameter to the function, and skip that one. This will not prevent larger loops but may be entirely appropriate for your situation and is the simplest to implement.
Both of the above are O(1) with minimal overhead. If you cannot modify Place you could store a Set of visited places (remove them from the set on the way out of recursion), and skip places that are already in that set. Depending on your performance requirements, if you use a HashSet you will want to come up with an appropriate hashing function.
Along those lines, at the expense of more memory, if your ID numbers are unique and cover a reasonably sized finite range, a boolean[] indexed by ID number is a constant time alternative to a set here (it is essentially the "visited" flag option with the flags stored externally).
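As a rough sketch of the Set-based variant, assuming the graph is available as a map from place id to neighbour ids (the class VisitedSetSearch and its parameters are invented for this example and are not the asker's Place API):

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: depth-first path search that refuses to step onto places already on the route. */
final class VisitedSetSearch {
    /**
     * Searches from 'current' to 'goal' over an adjacency map (id -> neighbour ids).
     * On success, 'path' holds the ids from start to goal and the method returns true.
     */
    static boolean findPath(Map<Integer, List<Integer>> graph, int current, int goal,
                            Set<Integer> visited, List<Integer> path) {
        visited.add(current);   // mark on the way in
        path.add(current);
        if (current == goal) {
            return true;
        }
        for (int next : graph.getOrDefault(current, Collections.<Integer>emptyList())) {
            if (visited.contains(next)) {
                continue;       // already on the current route: skipping it prevents loops
            }
            if (findPath(graph, next, goal, visited, path)) {
                return true;
            }
        }
        path.remove(path.size() - 1);  // dead end: undo on the way out of the recursion
        visited.remove(current);
        return false;
    }
}
```

Call it with an empty HashSet and an empty ArrayList. This finds some loop-free path, not necessarily the shortest one; to keep the asker's preference for closer places, the neighbours could be sorted by distance before the loop.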

Using a recursive approach for a path-finding algorithm can be quite tricky, because you always need some kind of global information to evaluate which of two paths is more suitable. While following a single path, you can never be sure that it is the right one. Even if you always follow the nearest node, that doesn't have to be the right path. This is called a greedy best-first search strategy, and although it is not the best, it can be made to work; but you have to make sure to try other paths as well, because you can't pull it off by simply always sticking to the closest node.
If you want to write a path-finding algorithm, you will need to keep track of all the nodes that you have already explored exhaustively and therefore will never need to visit again. This can be done either explicitly, by storing the list of visited nodes in a structure of some kind, or you can be smarter about it and enforce this through a good policy for selecting the next node to visit.
In other words, if you keep track of the nodes to be visited along with the distance to each node (in a priority queue) and always visit the nearest not-yet-visited node, you will never revisit the same node, without having to enforce it explicitly. This is what the A* and Dijkstra algorithms do.
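A minimal sketch of that priority-queue policy, in the form of Dijkstra's algorithm. The class name DijkstraSketch and the weight-matrix representation (a negative entry meaning "no edge") are assumptions made for this example only:

```java
import java.util.Arrays;
import java.util.PriorityQueue;

/** Hypothetical example class: Dijkstra's algorithm over a small weight matrix. */
final class DijkstraSketch {
    /**
     * Returns the shortest distance from 'source' to every node.
     * weights[u][v] is the edge length, or a negative number if u and v are not connected.
     */
    static double[] shortestDistances(double[][] weights, int source) {
        int n = weights.length;
        double[] dist = new double[n];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        boolean[] done = new boolean[n];
        // Queue entries are {node, distance}; the entry with the smallest distance comes out first.
        PriorityQueue<double[]> queue = new PriorityQueue<>((a, b) -> Double.compare(a[1], b[1]));
        dist[source] = 0;
        queue.add(new double[] {source, 0});
        while (!queue.isEmpty()) {
            int u = (int) queue.poll()[0];
            if (done[u]) continue;   // stale entry for a node that is already finished
            done[u] = true;          // nearest unfinished node: its distance is final
            for (int v = 0; v < n; v++) {
                if (weights[u][v] < 0 || done[v]) continue;
                double candidate = dist[u] + weights[u][v];
                if (candidate < dist[v]) {
                    dist[v] = candidate;
                    queue.add(new double[] {v, candidate});
                }
            }
        }
        return dist;
    }
}
```

Unreachable nodes keep a distance of infinity. Recording the predecessor of each node whenever dist[v] improves would let you rebuild the actual path.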

Finding Rectangle which contains a Point

In Java SE 7, I'm trying to solve a problem where I have a series of Rectangles. Through some user interaction, I get a Point. What I need to do is find the (first) Rectangle which contains the Point (if any).
Currently, I'm doing this via the very naive solution of just storing the Rectangles in an ArrayList, and searching for the containing Rectangle by iterating over the list and using contains(). The problem is that, because this needs to be interactive for the user, this technique starts to be too slow for even a relatively small number of Rectangles (say, 200).
My current code looks something like this:
// Given rects is an ArrayList<Rectangle>, and p is a Point:
for(Rectangle r : rects)
{
if(r.contains(p))
{
return r;
}
}
return null;
Is there a more clever way to solve this problem (namely, in O(log n) instead of O(n), and/or with fewer calls to contains() by eliminating obviously bad candidates early)?

Yes, there is. Build two interval trees, one over the x-extents (x1 to x2) of the rectangles and one over the y-extents (y1 to y2). Then, when you have the coordinates of the point, perform an O(log n) search in each tree.
That'll tell you if there are possibly rectangles around the point of interest. You still need to check if there is a common rectangle given by the two trees.

Simple A star algorithm Tower Defense Path Trapped

So first of all, I'm in a 100-level college CS class that uses Java. Our assignment is to make a tower defense game, and I am having trouble with the pathing. From searching, I found that A* seems to be the best fit for this, but my pathing gets stuck when I put a U around the path. I'll show some beginner pseudocode, since I haven't taken a data structures class yet and my code looks pretty messy (working on that).
Assume that I will not be using diagonals.
while(Castle not reached){
new OpenList
if(up, down, left, right == passable && isn't previous node){
//Adds in alternating order to create a more diagonal like path
Openlist.add(passable nodes)
}
BestPath.add(FindLeastDistancetoEnd(OpenList));
CheckCastleReached(BestPath[Last Index]);
}
private node FindLeastDistancetoEnd(node n){
return first node with Calculated smallest (X + Y to EndPoint)
}
I've stripped A* down(too much, my problem most likely). So I'm adding parents to my nodes and calculating the correct parent though I don't believe this will solve my problem. Here's a visual of my issue.
X = impassable(Towers)
O = OpenList
b = ClosedList(BestPath)
C = Castle(EndPoint)
S = Start
OOOOXX
SbbbBX C
OOOOXX
Now the capital B is where my issue is. When the towers are placed in that configuration and my Nav Path is recalculated, it gets stuck. Nothing is put into the OpenList, since the previous node is ignored and the rest are impassable.
Writing it out now I suppose I could make B impassable and backtrack... Lol. Though I'm starting to do a lot of what my professor calls "hacking the code" where I keep adding patches to fix issues, because I don't want to erase my "baby" and start over. Although I am open to redoing it, looking at how messy and unorganized some of my code is bothers me, can't wait to take data structures.
Any advice would be appreciated.

Yes, data structures would help you a lot on this sort of problem. I'll try to explain how A* works and give some better Pseudocode afterwards.
A* is a Best-First search algorithm. This means that it's supposed to guess which options are best, and try to explore those first. This requires you to keep track of a list of options, typically called the "Front" (as in front-line). It doesn't keep track of a path found so far, like in your current algorithm. The algorithm works in two phases...
Phase 1
Basically, you start from the starting position S, and all the neighbouring positions (north, west, south and east) will be in the Front. The algorithm then finds the most promising of the options in the Front (let's call it P), and expands on that. The position P is removed from the Front, but all of its neighbours are added instead. Well, not all of its neighbours; only the neighbours that are actual options to go to. We can't go walking into a tower, and we wouldn't want to go back to a place we've seen before. From the new Front, the most promising option is chosen, and so on. When the most promising option is the goal C, the algorithm stops and enters phase 2.
Normally, the most promising option would be the one that is closest to the goal, as the crow flies (ignoring obstacles). So normally, it would always explore the one that is closest to the goal first. This causes the algorithm to walk towards the goal in a sort-of straight line. However, if that line is blocked by some obstacle, the positions of the obstacle should not be added to the Front. They are not viable options. So in the next round then, some other position in the Front would be selected as the best option, and the search continues from there. That is how it gets out of dead ends like the one in your example. Take a look at this illustration to get what I mean: https://upload.wikimedia.org/wikipedia/commons/5/5d/Astar_progress_animation.gif The Front is the hollow blue dots, and they mark dots where they've already been in a shade from red to green, and impassable places with thick blue dots.
In phase 2, we will need some extra information to help us find the shortest path back when we found the goal. For this, we store in every position the position we came from. If the algorithm works, the position we came from necessarily is closer to S than any other neighbour. Take a look at the pseudocode below if you don't get what I mean.
Phase 2
When the castle C is found, the next step is to find your way back to the start, gathering the path as you go. In phase 1, we stored the position we came from in every position that we explored. The task in phase 2 is thus very simple: follow the way back to the position we came from, every time, and keep track of these positions in a list. At the end, you'll have a list that forms a path from C to S. Then you simply need to reverse this list and you have your answer. (Note that choosing positions purely by straight-line distance to the goal finds a path around obstacles, but not necessarily the shortest one; full A* also adds the distance travelled so far when ranking the options.)
I'll give some pseudocode to explain it. There are plenty of real code examples (in Java too) on the internet. This pseudocode assumes you use a 2D array to represent the grid. An alternative would be to have Node objects, which is simpler to understand in Pseudocode but harder to program and I suspect you'd use a 2D array anyway.
//Phase 1
origins = new array[gridLength][gridWidth]; //Keeps track of 'where we came from'.
front = new Set(); //Empty set. You could use an array for this.
origins[S x][S y] = S; //Marks the start as explored, so it is never added to the front.
for(neighbour in (S's neighbours)) {
  if(neighbour is not a tower) {
    front.add(neighbour);
    origins[neighbour x][neighbour y] = S;
  }
}
while(true) { //This keeps on looping forever, unless it hits the "break" statement below.
  best = findBestOption(front); //If the front is ever empty here, no path exists and you should stop.
  front.remove(best);
  if(best == C) {
    break; //Stops the loop. Ends phase 1.
  }
  for(neighbour in (best's neighbours)) {
    if(neighbour is not a tower and origins[neighbour x][neighbour y] == null) { //Not a tower, and not a position that we explored before.
      front.add(neighbour);
      origins[neighbour x][neighbour y] = best;
    }
  }
}
//Phase 2
bestPath = new List(); //You should probably use Java's ArrayList class for this if you're allowed to do that. Otherwise select an array size that you know is large enough.
currentPosition = C; //Start at the endpoint.
bestPath.add(C);
while(currentPosition != S) { //Until we're back at the start.
  currentPosition = origins[currentPosition.x][currentPosition.y];
  bestPath.add(currentPosition);
}
bestPath.reverse();
And for the findBestOption method in that pseudocode:
findBestOption(front) {
  bestPosition = null;
  distanceOfBestPosition = Float.MAX_VALUE; //Some very high number to start with.
  for(position in front) {
    dx = position.x - C.x;
    dy = position.y - C.y;
    distance = Math.sqrt(dx * dx + dy * dy); //Euclidean distance (Pythagoras Theorem). This does the diagonal thing for you.
    if(distance < distanceOfBestPosition) {
      distanceOfBestPosition = distance;
      bestPosition = position;
    }
  }
  return bestPosition;
}
I hope this helps. Feel free to ask on!

Implement the A* algorithm properly. See: http://en.wikipedia.org/wiki/A%2A_search_algorithm
On every iteration, you need to:
sort the open nodes into heuristic order,
pick the best;
-- check if you have reached the goal, and potentially terminate if so;
mark it as 'closed' now, since it will be fully explored from.
explore all neighbors from it (by adding to the open nodes map/ or list, if not already closed).
Based on the ASCII diagram you posted, it's not absolutely clear that the height of the board is more than 3 & that there actually is a path around -- but let's assume there is.
The proper A* algorithm doesn't "get stuck": when the open list is empty, no path exists, and it terminates, returning null to signal that there is no path.
I suspect you may not be closing the open nodes (this should be done as you start processing them), or may not be processing all open nodes on every iteration.
Using a Map<GridPosition, AStarNode> will help performance when checking, for all those neighboring positions, whether they are in the open or closed sets/lists.
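To make those steps concrete, here is a compact sketch of A* on a 4-connected grid. Everything in it (the class name GridAStar, the boolean[][] tower grid, the row/column parameters) is an assumption for illustration and does not come from the asker's code. It uses a priority queue instead of re-sorting the open list, and plain arrays instead of a Map, which works because grid positions are small integers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical example class: A* on a grid without diagonal moves. */
final class GridAStar {
    /**
     * blocked[row][col] == true marks a tower. Returns the cells from start to goal
     * as {row, col} pairs, or an empty list if no path exists.
     */
    static List<int[]> findPath(boolean[][] blocked, int startRow, int startCol, int goalRow, int goalCol) {
        int rows = blocked.length, cols = blocked[0].length;
        int[][] cost = new int[rows][cols];      // g: number of steps from the start
        int[][] cameFrom = new int[rows][cols];  // parent cell, encoded as row * cols + col
        boolean[][] closed = new boolean[rows][cols];
        for (int[] row : cost) Arrays.fill(row, Integer.MAX_VALUE);
        // Open entries are {f, row, col}, ordered by f = g + Manhattan distance to the goal.
        PriorityQueue<int[]> open = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        cost[startRow][startCol] = 0;
        cameFrom[startRow][startCol] = -1;       // the start has no parent
        open.add(new int[] {heuristic(startRow, startCol, goalRow, goalCol), startRow, startCol});
        int[][] steps = {{-1, 0}, {1, 0}, {0, -1}, {0, 1}};
        while (!open.isEmpty()) {
            int[] current = open.poll();         // pick the best open node
            int r = current[1], c = current[2];
            if (closed[r][c]) continue;          // stale duplicate of a closed node
            closed[r][c] = true;                 // close it as we start processing it
            if (r == goalRow && c == goalCol) {
                return rebuild(cameFrom, cols, r, c);
            }
            for (int[] s : steps) {
                int nr = r + s[0], nc = c + s[1];
                if (nr < 0 || nc < 0 || nr >= rows || nc >= cols) continue;
                if (blocked[nr][nc] || closed[nr][nc]) continue;
                int g = cost[r][c] + 1;
                if (g < cost[nr][nc]) {          // found a cheaper way to reach this neighbour
                    cost[nr][nc] = g;
                    cameFrom[nr][nc] = r * cols + c;
                    open.add(new int[] {g + heuristic(nr, nc, goalRow, goalCol), nr, nc});
                }
            }
        }
        return new ArrayList<>();                // open list ran empty: no path
    }

    private static int heuristic(int r, int c, int goalRow, int goalCol) {
        return Math.abs(r - goalRow) + Math.abs(c - goalCol);
    }

    // Walk the parent links back from the goal, then reverse to get start-to-goal order.
    private static List<int[]> rebuild(int[][] cameFrom, int cols, int r, int c) {
        List<int[]> path = new ArrayList<>();
        int code = r * cols + c;
        while (code != -1) {
            int row = code / cols, col = code % cols;
            path.add(new int[] {row, col});
            code = cameFrom[row][col];
        }
        Collections.reverse(path);
        return path;
    }
}
```

With a U of towers in front of the start, as in the question, the search runs out of promising cells inside the U and then continues from the remaining open cells, so it walks around the obstacle as long as the board leaves room to do so.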

Java TSP Permute Points

I'm doing a project for school where we implement the Nearest Neighbor heuristic (which I have already done) and solve the Traveling Salesperson Problem with an exhaustive search (we then analyze the algorithms, their time complexity, etc.). Our teacher said to look around for code to use (or modify) for the exhaustive search part instead of programming the whole thing as in the Nearest Neighbor portion. I have looked around, but only found stuff that does not pertain to how we were instructed to do our program.
As opposed to the typical problem where you use integers, we are using points (x, y).
My goal would be to calculate the shortest permutation and be able to know what that permutation was. So I'm thinking of having an array of arrays (which contains the permutations).
If someone could help me out with the exhaustive search that would be nice.
Here are some excerpts from my code (member variables, function to calculate distance between two points, and where all the points are stored):
private int x;
private int y;
private boolean visited;
public double dist( point pt ){
int xdist = this.getX() - pt.getX();
int ydist = this.getY() - pt.getY();
double xsr = xdist*xdist;
double ysr = ydist*ydist;
return Math.sqrt( xsr + ysr );
}
point[] points = new point[n];
Any help is greatly appreciated.

A single TSP possible solution is essentially just an array of cities which represents the order in which to visit them, without the starting city.
So, presume n (the number of cities) = 5. Then a single possible solution is represented as an array of length 4. Now, how many ways can you order the cities [B, C, D, E]?
BCDE, BCED, BDCE, BDEC, ... That's 4! or 24 combinations. So for n cities you get (n-1)! combinations. For 10 cities that makes 362880 combinations. For 20 cities, that's about 10^17 combinations, and you'll run out of memory if you want to keep them all in memory.
An additional problem is that you'll need n nested for loops, but you can't write those loops out by hand, because their number depends on n. (You can't just start writing for() for() for() ....)
So, your implementation will probably need some sort of walker approach, where you have a single loop that ticks through all combinations, much like a digital clock with each digit representing 1 index in the array.
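One way to get the effect of n nested loops is recursion. The sketch below is an illustration under assumptions, not the asker's program: the class name TspBruteForce is invented, the points are passed as two int arrays rather than the asker's point class, and city 0 is fixed as the start. It keeps only the current ordering and the best one found so far, so memory use stays small even though the running time still grows as (n-1)!.

```java
/** Hypothetical example class: exhaustive TSP search that remembers only the best tour. */
final class TspBruteForce {
    private final double[][] dist;  // dist[i][j]: straight-line distance between city i and city j
    private int[] bestTour;
    private double bestLength = Double.POSITIVE_INFINITY;

    TspBruteForce(int[] xs, int[] ys) {
        int n = xs.length;
        dist = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                dist[i][j] = Math.hypot(xs[i] - xs[j], ys[i] - ys[j]);
            }
        }
    }

    /** Tries every ordering of cities 1..n-1 and returns the shortest round trip starting at city 0. */
    int[] solve() {
        int n = dist.length;
        int[] tour = new int[n];
        for (int i = 0; i < n; i++) tour[i] = i;
        permute(tour, 1, 0.0);
        return bestTour;
    }

    double bestLength() {
        return bestLength;
    }

    // Positions [0, k) are fixed; try each remaining city at position k by swapping it in.
    private void permute(int[] tour, int k, double lengthSoFar) {
        int n = tour.length;
        if (k == n) {
            double total = lengthSoFar + dist[tour[n - 1]][tour[0]];  // close the loop
            if (total < bestLength) {
                bestLength = total;
                bestTour = tour.clone();
            }
            return;
        }
        for (int i = k; i < n; i++) {
            swap(tour, k, i);
            permute(tour, k + 1, lengthSoFar + dist[tour[k - 1]][tour[k]]);
            swap(tour, k, i);  // restore the order before trying the next candidate
        }
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

After solve() returns, bestLength() gives the length of the returned tour. Each level of recursion plays the role of one of the nested loops.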

You don't need (extra) memory for generating all permutations/solutions for a given instance. You can just write them on the screen...
Take a look at this implementation https://github.com/stardog-union/pellet/blob/master/core/src/main/java/org/mindswap/pellet/utils/PermutationGenerator.java.
It generates at each call of getNext() a new solution.
public void PermGen() {
int[] tour;
PermutationGenerator x = new PermutationGenerator(N);
System.out.println(x.getTotal());
while (x.hasMore()) {
tour = x.getNext();
System.out.println(Arrays.toString(tour));
}
}
The Java code above prints all TSP instance solutions...
You can of course save them (in files, for example), but for anything other than small instances you'll need hundreds of terabytes to do it.

Closest Point on a Map

I am making a program where you can click on a map to see a "close-up view" of the area around it, such as on Google Maps.
When a user clicks on the map, it gets the X and Y coordinate of where they clicked.
Let's assume that I have an array of booleans of where these close-up view pictures are:
public static boolean[][] view_set=new boolean[Map.width][Map.height];
//The array of where pictures are. The map has a width of 3313, and a height of 3329.
The program searches through a folder, where images are named to where the X and Y coordinate of where it was taken on the map. The folder contains the following images (and more, but I'll only list five):
2377,1881.jpg, 2384,1980.jpg, 2389,1923.jpg, 2425,1860.jpg, 2475,1900.jpg
This means that:
view_set[2377][1881]=true;
view_set[2384][1980]=true;
view_set[2389][1923]=true;
view_set[2425][1860]=true;
view_set[2475][1900]=true;
If a user clicks at the X and Y of, for example, 2377,1882, then I need the program to figure out which image is closest (the answer in this case would be 2377,1881).
Any help would be appreciated,
Thanks.

Your boolean[][] is not a good datastructure for this problem, at least if it is not really dense (e.g. normally a point with close-up view is available in the surrounding 3×3 or maybe 5×5 square).
You want a 2-D-map with nearest-neighbor search. A useful data structure for this goal is the QuadTree. This is a tree of degree 4, used to represent spatial data. (I'm describing here the "Region QuadTree with point data".)
Basically, it divides a rectangle in four about equal size rectangles, and subdivides each of the rectangles further if there is more than one point in it.
So a node in your tree is one of these:
an empty leaf node (corresponding to a rectangle without points in it)
a leaf node containing exactly one point (corresponding to a rectangle with one point in it)
an inner node with four child nodes (corresponding to a rectangle with more than one point in it)
(In implementations, we can replace empty leaf nodes with a null-pointer in its parent.)
To find a point (or "the node a point would be in"), we start at the root node, look if our point is north/south/east/west of the dividing point, and go to the corresponding child node. We continue this until we arrive at some leaf node.
For adding a new point, we either wind up with an empty node - then we can put the new point here. If we end up at a node with already a point in it, create four child nodes (by splitting the rectangle) and add both points to the appropriate child node. (This might be the same, then repeat recursively.)
For the nearest-neighbor search, we descend in the same way. If we wind up at an empty node, we back up one level and look at the other child nodes of this parent (comparing each distance). If we reach a leaf node with one point in it, we measure the distance from our search point to this point. If it is smaller than the distance to the edges of the node, we are done. Otherwise we will have to look at the points in the neighboring nodes, too, and compare the results, taking the minimum. (We will have to look at at most four points, I think.)
For removal, after finding a point, we make its node empty. If the parent node now contains only one point, we replace it by a one-point leaf node.
The search and adding/removing are in O(depth) time complexity, where the maximum depth is limited by log((map length+width)/minimal distance of two points in your structure), and average depth is depending on the distribution of the points (e.g. the average distance to the next point), more or less.
Space needed is depending on number of points and average depth of the tree.
There are some variants of this data structure (for example splitting a node only when there are more than X points in it, or splitting not necessarily in the middle), to optimize the space usage and avoid too large depths of the tree.

Given the location the user clicked, you could search for the nearest image using a Dijkstra-style expanding search.
Basically you start searching in increasingly larger rectangles around the clicked location for images. Of course you only have to search the boundaries of these rectangles, since you've already searched the body. This algorithm should stop as soon as an image is found.
Pseudo code:
int size = 0
Point result = default
while(result == default)
  result = searchRectangleBoundary(size++, pointClicked)
function Point searchRectangleBoundary(int size, Point centre)
{
  point p = {centre.X - size, centre.Y - size} // top-left corner of the square
  int side = 2 * size // the square spans centre - size to centre + size
  for i in 0 to and including side
  {
    if(view_set[p.X + i][p.Y]) return { p.X + i, p.Y}
    if(view_set[p.X][p.Y + i]) return { p.X, p.Y + i}
    if(view_set[p.X + i][p.Y + side]) return { p.X + i, p.Y + side}
    if(view_set[p.X + side][p.Y + i]) return { p.X + side, p.Y + i}
  }
  return default
}
Do note that I've left out range checking for brevity.
There is a slight problem, although depending on the application it might not matter. The search doesn't use Euclidean distance but the Chebyshev (chessboard) metric, because it scans square rings. So it doesn't necessarily find the closest image, but an image at most the square root of 2 times as far away.
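For reference, a Java sketch of the same ring scan with the range checks filled in. The class name RingSearch and the method signature are made up for this example; it assumes the clicked position lies inside the array and that the array is indexed as [x][y], as in the question.

```java
/** Hypothetical example class: scans square rings of growing radius around a clicked cell. */
final class RingSearch {
    /** Returns the first set cell found as {x, y}, or null if the grid contains no set cell. */
    static int[] findNearby(boolean[][] viewSet, int cx, int cy) {
        int width = viewSet.length, height = viewSet[0].length;
        // Largest radius that can still touch the grid from (cx, cy).
        int maxRadius = Math.max(Math.max(cx, width - 1 - cx), Math.max(cy, height - 1 - cy));
        for (int r = 0; r <= maxRadius; r++) {
            for (int i = -r; i <= r; i++) {
                // One cell each on the top, bottom, left and right edge of the ring.
                int[][] candidates = {
                    {cx + i, cy - r}, {cx + i, cy + r}, {cx - r, cy + i}, {cx + r, cy + i}
                };
                for (int[] p : candidates) {
                    boolean inside = p[0] >= 0 && p[0] < width && p[1] >= 0 && p[1] < height;
                    if (inside && viewSet[p[0]][p[1]]) {
                        return p;
                    }
                }
            }
        }
        return null;
    }
}
```

To get the true Euclidean nearest image, one option is to keep scanning up to radius r times the square root of 2 after the first hit at radius r, and take the closest of everything found.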

Based on
your comment that states you have 350-500 points of interest,
your question that states you have a map width of 3313, and a height of 3329
my calculator which tells me that that represents ~11 million boolean values
...you're going about this the wrong way. @JBSnorro's answer is quite an elegant way of finding the needle (350 points) in the haystack (11 million points), but really, why create the haystack in the first place?
As per my comment on your question, why not just use a Pair<Integer,Integer> class to represent co-ordinates, store them in a set, and scan them? It's simpler, quicker, less memory consuming, and is way more scalable for larger maps (assuming the points of interest are sparse... which it seems is a sensible assumption given that they're points of interest).
..trust me, computing the Euclidean distance ~425 times beats wandering around an 11 million value boolean[][] looking for the 1 value in 25,950 that's of interest (esp. in a worst case analysis).
If you're really not thrilled with the idea of scanning ~425 values each time, then (i) you're more OCD than me (:P); (ii) you should check out nearest neighbour search algorithms.

I do not know if this is what you are asking for. If the user point is P1 {x1, y1} and you want to calculate its distance to P2 {x2, y2}, the distance is calculated using Pythagoras' theorem:
distance^2 = (x2-x1)^2 + (y2-y1)^2
If you only want to know the closest, you can avoid calculating the square root (the smaller the distance, the smaller the square too so it serves you the same).
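A small sketch of that comparison on squared distances, scanning a plain list of image coordinates. The class name NearestPoint and the two-array layout are assumptions made for the example:

```java
/** Hypothetical example class: linear scan for the closest point, comparing squared distances. */
final class NearestPoint {
    /** Returns the index of the point closest to (x, y), or -1 if there are no points. */
    static int nearestIndex(int[] xs, int[] ys, int x, int y) {
        int best = -1;
        long bestDistSq = Long.MAX_VALUE;
        for (int i = 0; i < xs.length; i++) {
            long dx = xs[i] - x;
            long dy = ys[i] - y;
            long distSq = dx * dx + dy * dy;  // no square root needed to compare distances
            if (distSq < bestDistSq) {
                bestDistSq = distSq;
                best = i;
            }
        }
        return best;
    }
}
```

With the five image coordinates from the question and a click at 2377,1882, this returns index 0, the image at 2377,1881.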
