I have an algorithmic and performance problem to solve in Java. I have a large collection of 2D points (say about 100,000 of them). I want to get the subset that lies in a given area around the search point SP(X_sp, Y_sp), i.e. the points P(x, y) that meet the criteria:
x is between X_sp - constValue and X_sp + constValue AND y is between Y_sp - constValue and Y_sp + constValue
To give you an idea of the number relations, constValue will be like 2, 5 or 10, and x, y will range between 0 and 1000. It's meant to be a webservice, so a possibility of searching around many different points at the same time must be taken into account.
As these are fixed points (they won't change due to calculations or anything), I thought that it would be optimal to keep one list of objects sorted by X and another one sorted by Y. Then I'll first get the points within the X range and, using references, get the set of these points from the other list (sorted by Y). Then I'll narrow this selection by Y and as a result get the points in the given area.
I don't know Java inside-out, so I'd like to ask you about the most optimized approach. Which objects should I use to store the sorted points so that searching for objects within a range is fast? Or maybe I have to implement my own algorithm for this task? Also, when it comes to storing the points in a database, are SQL queries sufficiently fast to deliver the results? Or maybe NoSQL databases are better for this?
I'm going to perform my own tests, but I'm looking for some starting candidates.
I'd probably use a TreeMap<Integer, TreeSet<Integer>>, where the key to the map is the x coordinate and for each x coordinate, you have a list of y coordinates. You can then use floorEntry and ceilingEntry to find the x coordinates that fall within your range. Then for each TreeSet<Integer> set that you get, you can use ceiling and floor to get the appropriate entries.
Of course, this only gives you the coordinates of the bounds of your box (the four corners). But TreeSet also has a subSet method that will give you a range of values. You will have to use this twice: once for the x coordinates that are within your bounds (you can get a navigable key set using the navigableKeySet method of the map), then, for each x coordinate, for the y coordinates that are within the bounds. So the code would be sort of like this:
List<Point> result = new ArrayList<>();
// Inclusive subSet bounds replace ceilingKey/floorKey, which return null when nothing is in range.
for (int px : points.navigableKeySet().subSet(x - c, true, x + c, true)) {
    TreeSet<Integer> yCoordinates = points.get(px);
    for (int py : yCoordinates.subSet(y - c, true, y + c, true)) {
        result.add(new Point(px, py));
    }
}
I haven't tested this out, so there are probably some bugs or something I've missed. Let me know and I'll correct the answer.
The floor and ceiling calls are log(n) I believe -- this is where you get the performance benefit because if you use a list, it would be O(n) to look that up.
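As a self-contained illustration of the TreeMap idea, here is a minimal sketch. The class name PointIndex and its methods are made up for this example, integer coordinates are assumed, and points are returned as {x, y} arrays to avoid depending on a particular Point class. It has not been benchmarked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class PointIndex {
    // x coordinate -> sorted y coordinates of all points with that x
    private final TreeMap<Integer, TreeSet<Integer>> points = new TreeMap<>();

    public void add(int x, int y) {
        points.computeIfAbsent(x, k -> new TreeSet<>()).add(y);
    }

    /** Returns every stored point with |px - x| <= c and |py - y| <= c as {x, y} pairs. */
    public List<int[]> query(int x, int y, int c) {
        List<int[]> result = new ArrayList<>();
        // subMap and subSet with inclusive bounds return views of the matching range
        for (Map.Entry<Integer, TreeSet<Integer>> column
                : points.subMap(x - c, true, x + c, true).entrySet()) {
            for (int py : column.getValue().subSet(y - c, true, y + c, true)) {
                result.add(new int[] {column.getKey(), py});
            }
        }
        return result;
    }
}
```

Results come back ordered by x and then by y. Since the points are fixed, the index can be built once and then only read.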
Note: I don't know if this is the most performant. SO is typically not the place for such an open-ended question so you might have more luck elsewhere.
Related
I have a rectangle Object with x, y, width and height. I have a list of these rectangles which are displayed on a screen. It is guaranteed that none of them overlap. Given a user's click position (x and y coordinates), I want to see which of these rectangles were clicked (since they do not overlap, there is a maximum of one rect that can be clicked).
I can obviously look through all of them and check for each one if the user clicked it but this is very slow because there are many on the screen. I can use some kind of comparison to keep the rectangles sorted when I insert a new one into the list. Is there some way to use something similar to binary search in order to decrease the time it takes to find which rect was clicked?
Note: the rectangles can be any size.
Thanks:)
Edit: To get an idea of what I am making visit koalastothemax.com
The best solution depends heavily on your application and on details we're not aware of yet. BUT, with as little as I know, I'd say you can make a 2D array that points to your rectangles. That 2D array would map directly to the pixels on the screen. So if you make the array 10x20, then the x coordinate divided by the screen width times 10 (cast to int) will be the first index, and y divided by the screen height times 20 will be your y index. With your x and y index, you can map directly to the rectangle that the cell points to. Some cells might be empty and some might point to more than one rectangle if they're not perfectly laid out, but that seems the easiest way to me without knowing much about the application.
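A rough sketch of that grid idea follows. The names RectGrid and Rect are hypothetical (Rect is a minimal stand-in for your own rectangle class), and each cell keeps a list because a cell may overlap several rectangles.

```java
import java.util.ArrayList;
import java.util.List;

public class RectGrid {
    /** Minimal stand-in for the question's rectangle type (hypothetical). */
    public static final class Rect {
        final int x, y, width, height;

        public Rect(int x, int y, int width, int height) {
            this.x = x; this.y = y; this.width = width; this.height = height;
        }

        boolean contains(int px, int py) {
            return px >= x && px < x + width && py >= y && py < y + height;
        }
    }

    private final int cols, rows, screenWidth, screenHeight;
    private final List<Rect>[][] cells;

    @SuppressWarnings({"unchecked", "rawtypes"})
    public RectGrid(int cols, int rows, int screenWidth, int screenHeight) {
        this.cols = cols; this.rows = rows;
        this.screenWidth = screenWidth; this.screenHeight = screenHeight;
        this.cells = new List[cols][rows];
        for (int i = 0; i < cols; i++) {
            for (int j = 0; j < rows; j++) {
                cells[i][j] = new ArrayList<>();
            }
        }
    }

    // Pixel coordinate -> cell index, clamped to the grid
    private int col(int px) { return Math.min(cols - 1, Math.max(0, px * cols / screenWidth)); }
    private int row(int py) { return Math.min(rows - 1, Math.max(0, py * rows / screenHeight)); }

    /** Registers the rectangle in every cell it overlaps. */
    public void add(Rect r) {
        for (int i = col(r.x); i <= col(r.x + r.width - 1); i++) {
            for (int j = row(r.y); j <= row(r.y + r.height - 1); j++) {
                cells[i][j].add(r);
            }
        }
    }

    /** Returns the rectangle under the click, or null if there is none. */
    public Rect find(int px, int py) {
        for (Rect r : cells[col(px)][row(py)]) {
            if (r.contains(px, py)) return r;
        }
        return null;
    }
}
```

A click then only tests the few rectangles registered in one cell. The cell counts are a tuning knob: more cells mean shorter lists but more memory.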
I have tackled a very similar problem in the past when developing a simulation. In my case the coordinates were doubles (so no integer indexing was possible) and there could be hundreds of millions of them that needed to be searched.
My solution was to create an Axis class to represent each axis as a sequence of ranges. The ranges were guaranteed to go from a minimum to a maximum and the class was smart enough to split itself into pieces when new ranges were added. Each range has a single generic object stored. The class used a binary search to find a range quickly.
So roughly the class looks like:
class Axis<T> {
public Axis(double min, double max, Supplier<T> creator);
public Stream<T> add(double from, double to);
public T get(double coord);
}
The add method needs to return a stream because the added range may cover several ranges.
To store rectangles:
Axis<Axis<Rectangle>> rectangles = new Axis<>(0.0, 100.0,
() -> new Axis<>(0.0, 100.0, Rectangle::new));
rectangles.add(x, x + w).forEach(r -> r.add(y, y + h).forEach(Rectangle::setPresent));
And to find a rectangle:
rectangles.get(x).get(y);
Note that there's always an object stored so you need a representation such as Rectangle.NULL for 'not present'. Or you could make it Optional<Rectangle> (though that indirection eats a lot of memory and processing for large numbers of rectangles).
I've just given the high level design here rather than any implementation details so let me know if you want more info on how to make it work. Getting the logic right on the range splits is not trivial. But I can guarantee that it's very fast even with very large numbers of rectangles.
The fastest way I can come up with is definitely not the most memory efficient. It works by exploiting the fact that a hash table has amortized constant lookup time. It maps every point that a rectangle covers to that rectangle. This is only really effective if you are using integers. You might be able to get it to work with floats if you use a bit of rounding.
Make sure that the Point class has a hash code and equals function.
public class PointCheck
{
public Map<Point, Rect> pointMap;
public PointCheck()
{
pointMap = new HashMap<>();
}
/**
* Map every point contained in the rectangle
* to the rectangle.
*/
public void addRect(Rect rect)
{
for(int i = rect.x; i < rect.x + rect.width; ++i)
{
for(int j = rect.y; j < rect.y + rect.height; ++j)
{
pointMap.put(new Point(i, j), rect);
}
}
}
/**
* Returns the rectangle clicked, null
* if there is no rectangle.
*/
public Rect checkClick(Point click)
{
return pointMap.get(click);
}
}
Edit:
Just thought I should mention this: All of the rectangles held in the value of the hash map are references to the original rectangle, they are not clones.
I have a list of Rectangles, created in the usual way with:
List<Rectangle> rects = new ArrayList<>();
Some Rectangles are added (all with non-zero width and height). The number of Rectangles the List contains can be anywhere between 0 and 10,000, and will typically be between 4,000 and 6,000.
The list is sorted by ascending X-coordinate of the Rectangle origin, and then by ascending Y-coordinate for duplicate X-coordinates (though two or more rectangles with the same X-coordinate is rare).
I've verified the sorting is being done correctly (I'm using Collections.sort with a custom comparator).
I need a method that takes as input two ints, x and y, and returns the first Rectangle found containing the point (x,y), or null if no Rectangle in the list contains that point.
public Rectangle findContainingRectangle(int x, int y)
The naive method, which does give the desired functionality, is to just loop through the list and call the contains method on each Rectangle, but that is much too slow.
The List will be modified while the program is running, but at an insignificant rate compared to the rate at which the List needs to be searched, so an algorithm that requires a relatively slow initialization is fine.
I've looked at Collections.binarySearch but couldn't figure out how it might be used. I don't have much experience with Java so if there's another Collection that could be used similarly to a List but better suited to the type of search I need, then that's great (I have read the documentation on things like Maps and Sets but didn't recognize any advantage).
While maintaining a sorted list, you could use a binary search on the 'X' coordinate to find the candidate rectangles that may contain the wanted 'X', and after that use a binary search on the 'Y' coordinate.
You will have to implement the binary search yourself; I can't see a way to use the Collections.binarySearch method.
Expected complexity: O(log n), where n is the number of rectangles.
(It's a bit more because you might have duplicates.)
However, to do so, you must keep the list sorted while adding other instances (sort after every insert).
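One way to read the binary-search suggestion is sketched below. It relies on the observation that a rectangle can only contain the point if its origin X is less than or equal to the point's x, so a binary search finds where the candidates end. This only prunes the list: the remaining candidates are still scanned linearly, so the worst case stays O(n). The names SortedRectSearch, Rect and upperBound are made up for the example.

```java
import java.util.List;

public class SortedRectSearch {
    /** Minimal stand-in for the question's rectangle type (hypothetical). */
    public static final class Rect {
        final int x, y, width, height;

        public Rect(int x, int y, int width, int height) {
            this.x = x; this.y = y; this.width = width; this.height = height;
        }

        boolean contains(int px, int py) {
            return px >= x && px < x + width && py >= y && py < y + height;
        }
    }

    /** Index of the first rectangle whose origin x is greater than px (list sorted by x). */
    static int upperBound(List<Rect> rects, int px) {
        int lo = 0, hi = rects.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (rects.get(mid).x <= px) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** First rectangle in list order containing (px, py), or null if there is none. */
    static Rect findContainingRectangle(List<Rect> rects, int px, int py) {
        int end = upperBound(rects, px); // rectangles from 'end' onwards start right of the point
        for (int i = 0; i < end; i++) {
            if (rects.get(i).contains(px, py)) return rects.get(i);
        }
        return null;
    }
}
```

With an ArrayList, get(i) is constant time, so the binary search itself costs O(log n).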
Use HashSet. Map isn't appropriate here since you're not creating key-value pairs, and a Stream doesn't fit in this context either.
Be sure to override equals() and hashCode() in Rectangle, as described here: Why do I need to override the equals and hashCode methods in Java?
You can search your list using parallel stream like this
public Rectangle findContainingRectangle(final int x, final int y) {
    // rects is the sorted list from the question; findFirst respects list order even on a parallel stream
    return rects.parallelStream()
            .filter(r -> r.contains(x, y))
            .findFirst()
            .orElse(null); // null when no rectangle contains the point
}
Just run binary search a bunch of times. Since the probability of the same x is low, as you say, it won't take many runs, so it will still be O(log n):
a) run binary search
b) remove item if found - and keep index where it was found
c) repeat binary search at a) with the remaining list until null is returned
d) then you have a small array of indexes and you can see which one is the smallest
e) then reinsert the removed elements at the designated spots
You can try a stream and see how it performs. I am not sure it will be fast enough, but you can test it.
Rectangle rec = rects.stream()
        .filter(r -> r.contains(x, y))
        .findFirst()
        .orElse(null); // null when no rectangle contains the point
You can create a Map.
A Map is the best way to associate two values. You can associate each 'x' value with its first position in your List. Then you only have to loop from that first 'x' position to the next 'x' in your list.
If you don't find the 'x' in the Map, there is no matching rectangle in your list.
This way you don't explore all the entries with the wrong 'x'.
In Java SE 7, I'm trying to solve a problem where I have a series of Rectangles. Through some user interaction, I get a Point. What I need to do is find the (first) Rectangle which contains the Point (if any).
Currently, I'm doing this via the very naive solution of just storing the Rectangles in an ArrayList, and searching for the containing Rectangle by iterating over the list and using contains(). The problem is that, because this needs to be interactive for the user, this technique starts to be too slow for even a relatively small number of Rectangles (say, 200).
My current code looks something like this:
// Given rects is an ArrayList<Rectangle>, and p is a Point:
for(Rectangle r : rects)
{
if(r.contains(p))
{
return r;
}
}
return null;
Is there a more clever way to solve this problem (namely, in O(log n) instead of O(n), and/or with fewer calls to contains() by eliminating obviously bad candidates early)?
Yes, there is. Build 2 interval trees which will tell you if there is a rectangle between x1 to x2 and between y1 and y2. Then, when you have the co-ordinates of the point, perform O(log n) searches in both the trees.
That'll tell you if there are possibly rectangles around the point of interest. You still need to check if there is a common rectangle given by the two trees.
I've recently started learning Java and thought doing a "Conway's Game of Life" style program would be a good thing to start out with. Everything works fine but I'm having some serious performance issues with this part:
static List<Point> coordList = new ArrayList<Point>();
public int neighbors(int x, int y){
int n = 0;
Point[] tempArray = { new Point(x-1, y-1), new Point(x, y-1), new Point(x+1, y-1),
new Point(x-1, y ), new Point(x+1, y ),
new Point(x-1, y+1), new Point(x, y+1), new Point(x+1, y+1)};
for (Point p : tempArray) {
if (coordList.contains(p))
n++;
}
return n;
}
The method is used when iterating over the ArrayList coordList, which is filled with Points, to check how many neighbors each element has. When the list size gets to about 10000 Points every cycle takes about 1 second, and for 20000 Points it takes 7 seconds.
My question is, what would be a more effective way to do this? I know there are several other programs of this kind with source code available to look at, but I want to do as much as I can by myself since the point of the project is me learning Java. Also, I don't want to use a regular array because of the limitations.
If your points are unique, you could store them in a HashSet instead of an ArrayList. The contains method will then become an O(1) operation vs. O(n) in your current setup. That should speed up that section significantly.
Apart from the declaration, your code should remain mostly unchanged, as both implement the Collection interface, unless you call List-specific methods such as get(i).
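A minimal sketch of the HashSet version might look like this. The class name Life and the field name cells are placeholders for your own; java.awt.Point is assumed because it already implements equals() and hashCode().

```java
import java.awt.Point;
import java.util.HashSet;
import java.util.Set;

public class Life {
    // Live cells; java.awt.Point implements equals() and hashCode(), so it works as a set element
    static final Set<Point> cells = new HashSet<>();

    /** Counts the live cells among the eight neighbours of (x, y). */
    static int neighbors(int x, int y) {
        int n = 0;
        for (int dx = -1; dx <= 1; dx++) {
            for (int dy = -1; dy <= 1; dy++) {
                boolean isSelf = dx == 0 && dy == 0;
                // contains() is a hash lookup here instead of a scan over the whole list
                if (!isSelf && cells.contains(new Point(x + dx, y + dy))) {
                    n++;
                }
            }
        }
        return n;
    }
}
```

Each call now does eight hash lookups regardless of how many cells are alive.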
Performance-wise, I think your best bet is to have a plain numeric (effectively Boolean) array representing the grid. Since this is a learning exercise, I'd start with a simple one-element-per-cell array, and then perhaps progress to packing eight adjacent cells into a single byte.
It is not entirely clear what you mean by "the limitations".
The following has some interesting pointers: Optimizing Conway's 'Game of Life'
Your current code scales in a quadratic manner O(n^2). You have only given part of the program. If you look at your whole program there will be a loop that calls neighbors() and you will see that neighbors() is called n times. Also the operation contains() is linear in n, so the time is proportional to their product n*n.
Quadratic scaling is a common problem but can often be reduced to linear by using indexed data structures such as HashSet.
I am making a program where you can click on a map to see a "close-up view" of the area around it, such as on Google Maps.
When a user clicks on the map, it gets the X and Y coordinate of where they clicked.
Let's assume that I have an array of booleans of where these close-up view pictures are:
public static boolean[][] view_set=new boolean[Map.width][Map.height];
//The array of where pictures are. The map has a width of 3313, and a height of 3329.
The program searches through a folder, where images are named after the X and Y coordinates of where they were taken on the map. The folder contains the following images (and more, but I'll only list five):
2377,1881.jpg, 2384,1980.jpg, 2389,1923.jpg, 2425,1860.jpg, 2475,1900.jpg
This means that:
view_set[2377][1881]=true;
view_set[2384][1980]=true;
view_set[2389][1923]=true;
view_set[2425][1860]=true;
view_set[2475][1900]=true;
If a user clicks at the X and Y of, for example, 2377,1882, then I need the program to figure out which image is closest (the answer in this case would be 2377,1881).
Any help would be appreciated,
Thanks.
Your boolean[][] is not a good datastructure for this problem, at least if it is not really dense (e.g. normally a point with close-up view is available in the surrounding 3×3 or maybe 5×5 square).
You want a 2-D-map with nearest-neighbor search. A useful data structure for this goal is the QuadTree. This is a tree of degree 4, used to represent spatial data. (I'm describing here the "Region QuadTree with point data".)
Basically, it divides a rectangle in four about equal size rectangles, and subdivides each of the rectangles further if there is more than one point in it.
So a node in your tree is one of these:
an empty leaf node (corresponding to a rectangle without points in it)
a leaf node containing exactly one point (corresponding to a rectangle with one point in it)
an inner node with four child nodes (corresponding to a rectangle with more than one point in it)
(In implementations, we can replace empty leaf nodes with a null-pointer in its parent.)
To find a point (or "the node a point would be in"), we start at the root node, look if our point is north/south/east/west of the dividing point, and go to the corresponding child node. We continue this until we arrive at some leaf node.
For adding a new point, we either wind up with an empty node - then we can put the new point here. If we end up at a node with already a point in it, create four child nodes (by splitting the rectangle) and add both points to the appropriate child node. (This might be the same, then repeat recursively.)
For the nearest-neighbor search, we will either wind up with an empty node, in which case we back up one level and look at the other child nodes of this parent (comparing each distance), or we reach a leaf node with one point in it, in which case we measure the distance of our search point to this point. If it is smaller than the distance to the edges of the node, we are done. Otherwise we will have to look at the points in the neighboring nodes, too, and compare the results, taking the minimum. (We will have to look at no more than four points, I think.)
For removal, after finding a point, we make its node empty. If the parent node now contains only one point, we replace it by a one-point leaf node.
The search and adding/removing are in O(depth) time complexity, where the maximum depth is limited by log((map length+width)/minimal distance of two points in your structure), and average depth is depending on the distribution of the points (e.g. the average distance to the next point), more or less.
Space needed is depending on number of points and average depth of the tree.
There are some variants of this data structure (for example splitting a node only when there are more than X points in it, or splitting not necessarily in the middle), to optimize the space usage and avoid too large depths of the tree.
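To make the structure more concrete, here is a compact sketch of such a tree with insertion and a nearest-neighbor query. It is a simplified variant, not a tuned implementation: the root region is assumed to be a square that contains all points, duplicates are ignored, and the nearest search is a plain branch-and-bound over the children (skipping any region that cannot beat the best point found so far) rather than the exact back-up procedure described above. All names are made up for the example.

```java
public class QuadTree {
    private final double minX, minY, size; // square region [minX, minX+size) x [minY, minY+size)
    private double[] point;       // set only for a one-point leaf
    private QuadTree[] children;  // set only for an inner node

    public QuadTree(double minX, double minY, double size) {
        this.minX = minX; this.minY = minY; this.size = size;
    }

    /** Adds a point; assumes it lies inside the root region. */
    public void insert(double x, double y) {
        if (children != null) { child(x, y).insert(x, y); return; }
        if (point == null) { point = new double[] {x, y}; return; }
        if (point[0] == x && point[1] == y) return; // duplicates could never be split apart
        double[] old = point;
        point = null;
        double h = size / 2;
        children = new QuadTree[] {
            new QuadTree(minX, minY, h), new QuadTree(minX + h, minY, h),
            new QuadTree(minX, minY + h, h), new QuadTree(minX + h, minY + h, h)};
        child(old[0], old[1]).insert(old[0], old[1]);
        child(x, y).insert(x, y);
    }

    // Picks the quadrant that (x, y) falls into
    private QuadTree child(double x, double y) {
        double h = size / 2;
        return children[(x >= minX + h ? 1 : 0) + (y >= minY + h ? 2 : 0)];
    }

    /** Nearest stored point to (x, y) as {x, y}, or null if the tree is empty. */
    public double[] nearest(double x, double y) {
        return nearest(x, y, null);
    }

    private double[] nearest(double x, double y, double[] best) {
        // Skip this region if nothing in it can be closer than the current best
        if (best != null && distSqToRegion(x, y) >= distSq(best, x, y)) return best;
        if (children == null) {
            boolean closer = point != null
                    && (best == null || distSq(point, x, y) < distSq(best, x, y));
            return closer ? point : best;
        }
        for (QuadTree c : children) best = c.nearest(x, y, best);
        return best;
    }

    // Squared distance from (x, y) to this node's square (0 if the point is inside)
    private double distSqToRegion(double x, double y) {
        double dx = Math.max(0, Math.max(minX - x, x - (minX + size)));
        double dy = Math.max(0, Math.max(minY - y, y - (minY + size)));
        return dx * dx + dy * dy;
    }

    private static double distSq(double[] p, double x, double y) {
        double dx = p[0] - x, dy = p[1] - y;
        return dx * dx + dy * dy;
    }
}
```

Visiting the child that contains the query point first would tighten the bound sooner and prune more of the tree; it is left out here to keep the sketch short.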
Given the location the user clicked, you could search for the nearest image using a Dijkstra search.
Basically you start searching in increasingly larger rectangles around the clicked location for images. Of course you only have to search the boundaries of these rectangles, since you've already searched the body. This algorithm should stop as soon as an image is found.
Pseudo code:
int size = 0
Point result = default
while(result == default)
result = searchRectangleBoundary(size++, pointClicked)
function Point searchRectangleBoundary(int size, Point centre)
{
point p = {centre.X - size, centre.Y - size}
// the square is centred on the click, so each side is 2 * size long
for i in 0 to and including 2 * size
{
if(view_set[p.X + i][p.Y]) return { p.X + i, p.Y}
if(view_set[p.X][p.Y + i]) return { p.X, p.Y + i}
if(view_set[p.X + i][p.Y + 2 * size]) return { p.X + i, p.Y + 2 * size}
if(view_set[p.X + 2 * size][p.Y + i]) return { p.X + 2 * size, p.Y + i}
}
return default
}
Do note that I've left out range checking for brevity.
There is a slight problem, though depending on the application it might not matter. The search doesn't use Euclidean distance but the Chebyshev (chessboard) metric, because it expands squares rather than circles. So it doesn't necessarily find the closest image, but one that is at most the square root of 2 times as far away.
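For reference, the expanding-square search could look roughly like this in Java with the range checks included. The names RingSearch, isSet and findNearest are made up; the sketch assumes a non-empty grid indexed as grid[x][y] and a click inside the grid, and it returns the first hit on the smallest ring, which is not necessarily the Euclidean-nearest one.

```java
public class RingSearch {
    // True if (x, y) lies inside the grid and is marked
    static boolean isSet(boolean[][] grid, int x, int y) {
        return x >= 0 && x < grid.length && y >= 0 && y < grid[0].length && grid[x][y];
    }

    /** Returns {x, y} of a marked cell on the smallest square ring around (cx, cy), or null. */
    static int[] findNearest(boolean[][] grid, int cx, int cy) {
        int maxRadius = Math.max(grid.length, grid[0].length);
        for (int r = 0; r <= maxRadius; r++) {
            // Walk the four edges of the ring with half-width r
            for (int i = -r; i <= r; i++) {
                if (isSet(grid, cx + i, cy - r)) return new int[] {cx + i, cy - r}; // edge y = cy - r
                if (isSet(grid, cx + i, cy + r)) return new int[] {cx + i, cy + r}; // edge y = cy + r
                if (isSet(grid, cx - r, cy + i)) return new int[] {cx - r, cy + i}; // edge x = cx - r
                if (isSet(grid, cx + r, cy + i)) return new int[] {cx + r, cy + i}; // edge x = cx + r
            }
        }
        return null; // no marked cell in the grid
    }
}
```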
Based on
your comment that states you have 350-500 points of interest,
your question that states you have a map width of 3313, and a height of 3329
my calculator which tells me that that represents ~11 million boolean values
...you're going about this the wrong way. @JBSnorro's answer is quite an elegant way of finding the needle (350 points) in the haystack (11 million points), but really, why create the haystack in the first place?
As per my comment on your question, why not just use a Pair<Integer,Integer> class to represent co-ordinates, store them in a set, and scan them? It's simpler, quicker, less memory consuming, and is way more scalable for larger maps (assuming the points of interest are sparse... which it seems is a sensible assumption given that they're points of interest).
..trust me, computing the Euclidean distance ~425 times beats wandering around an 11 million value boolean[][] looking for the 1 value in 25,950 that's of interest (esp. in a worst case analysis).
If you're really not thrilled with the idea of scanning ~425 values each time, then (i) you're more OCD than me (:P); (ii) you should check out nearest neighbour search algorithms.
I do not know if this is what you are asking for. If the user point is P1 {x1, y1} and you want to calculate its distance to P2 {x2, y2}, the distance is calculated using Pythagoras' theorem:
distance^2 = (x2-x1)^2 + (y2-y1)^2
If you only want to know the closest, you can avoid calculating the square root (the smaller the distance, the smaller the square too so it serves you the same).
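A minimal sketch of such a scan, comparing squared distances only, is shown below. The class and method names are made up, and points are passed as {x, y} arrays for brevity.

```java
import java.util.List;

public class NearestPoint {
    /** Returns the {x, y} pair closest to (px, py), or null for an empty list. */
    static int[] nearest(List<int[]> points, int px, int py) {
        int[] best = null;
        long bestSq = Long.MAX_VALUE;
        for (int[] p : points) {
            long dx = p[0] - px;
            long dy = p[1] - py;
            long sq = dx * dx + dy * dy; // squared distance; no square root needed to compare
            if (sq < bestSq) {
                bestSq = sq;
                best = p;
            }
        }
        return best;
    }
}
```

With the five image coordinates from the question and a click at 2377,1882, this returns 2377,1881.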