I have a large list of regions with 2D coordinates. None of the regions overlap. The regions are not immediately adjacent to one another and do not follow a placement pattern.
Is there an efficient lookup algorithm that can be used to let me know what region a specific point will fall into? This seems like it would be the exact inverse of what a QuadTree is.
The data structure you need is called an R-Tree. Most RTrees permit a "Within" or "Intersection" query, which will return any geographic area containing or overlapping a given region, see, e.g. wikipedia.
There is no reason that you cannot build your own R-Tree, its just a variant on a balanced B-Tree which can hold extended structures and allows some overlap. This implementation is lightweight, and you could use it here by wrapping your regions in rectangles. Each query might return more than one result but then you could check the underlying region. Its probably an easier solution than trying to build a polyline-supporting R-tree version.
What you need, if I understand correctly, is a point location data structure that is, as you put it, somehow the opposite of quad or R-tree. In a point location data structure you have a set of regions stored, and the queries are of the form: given point p give me the region in which it is contained.
Several point location data structures exists, the most famous and the one that achieves the best performance is the Kirkpatrick's one also known as triangulation refinement and achieves O(n) space and O(logn) query time; but is also famous to be hard to implement. On the other hand there are several simpler data structures that achieves O(n) or O(nlogn) space but O(log^2n) query time, which is not that bad and way easier to implement, and for some is possible to reduce the query time to O(logn) using a method called fractional cascading.
I recommend you to take a look into chapter 6 of de Berg, Overmars, et al. Computational Geometry: Algorithms and Applications which explains the subject in a way very easy to grasp, though it doesn't includes Kirkpatrick's method, which you can find it in Preparata's book or read it directly from Kirkpatrick's paper.
BTW, several of this structures assumes that your regions are not overlapping but are expected to be adjacent (regions share edges), and the edges forms a connected graph, some times triangular regions are also assumed. In all cases you can extend your set of regions by adding new edges, but don't you worry for that, since the extra space needed will be still linear, since the final set of regions will induce a planar graph. So you can blindly extend your sets of regions without worrying with too much growth of space.
Related
I'm creating a graph in java using jGraphT and adding vertexes and edges from a list using a stream.
My question is:
Can I use stream().parallel() to add them faster?
No, at least not as far as I'm aware. Essentially, adding a vertex or edge boils down to 2 steps: (a) check whether the edge/vertex already exists and if not (b) add the edge/vertex. Depending on the type of graph, step (b) involves adding the object to the appropriate container that stores the edges/vertices. I'm not an expert on concurrent programming, but I don't see how a parallel stream can do the above faster.
I don't know exactly what your usecase is, or what you try to accomplish. There are however some optimized, special graph types in the jgrapht-opt package that might benefit you. The graph functionality doesn't change (i.e. you can run the same algorithms on them); only the way the graph is stored changes. Some storage mechanisms are more memory efficient, allowing you to store massive graphs using little memory. Other graphs, such as the sparse graphs, can be created quicker and access operations are also quicker, but these graphs are typically immutable, i.e. once created they cannot be changed. What you need really depends on your usecase.
Though we usually define a capacity for each Quadtree, It seems like all the pseudo-code algorithms I find online don't care much for the visual number of points inside a Quadtree.
Is it necessary to redistribute the points contained by a Quadtree into its children when it's being divided?
I can't seem to implement that correctly, and the pseudo-code category of Wikipedia's Quadtree only has a comment about this (but no code):
We have to add the points/data contained into this quad array to the new quads if we want that only the last node holds the data
Assuming that the question is about a Point Quadtree, let's have a review.
Is it necessary to redistribute the points contained by a Quadtree into its children when it's being divided?
No, it's not necessary.
But, is it better?
Redistribution is mostly just something you turn on when performance measurements in your specific use case show your code runs faster with it enabled.
Thanks to #Mike'Pomax'Kamermans for the comment.
It depends on how you're using the quadtree.
Keep in mind that,
You would still define a capacity for each quadtree which acts as the dividing threshold.
It won't affect the overall time complexity.
More quadtrees will be present.
Parents won't hold any points.
Pay the price at insertion.
You have to redistribute the points when the capacity is met.
What's the impact on queries?
Out of range points are ceased faster.
Brute-forcefully checked points may decrease.
Checked quadtrees may increase.
Will it execute faster or slower compared to the other mode, really depends on the environment; Capacity, recursive stack overhead and points dispersion.
But, as you have paid a price at insertion time, it usually performs the queries faster.
is user inside volume OpenGL ES Java Android
I have an opengl renderer that shows airspaces.
I need to calculate if my location already converted in float[3] is inside many volumes.
I also want to calculate the distance with the nearest volume.
Volumes are random shapes extruded along z axis.
What is the most efficient algorithm to do that?
I don t want to use external library.
What you have here is a Nearest Neighbor Search problem. Since your meshes are constant and won't change, you should probably use a space partioning algorithm. It's a big topic but, in short, you generally need to use a tree structure and sort all the objects to be put into various tree nodes. You'll need to pre-calculate the tree itself. There are plenty of books and tutorials on the net about space partioning, and you could also at source code of, for example, id Software products like Doom, Quake etc. to see how this algorithms (BSP, at least) are used. The efficiency of each algorithm depends on what you have and what you need. Using BSP trees, for example, you'll have the objects sorted from nearest to farthest so you can quickly get the one you need.
I'm after an efficient 2D mapping algorithm, and I've tried a number of implementations, but they all seem lacking. I'm hoping the stackoverflow world can help out with some pointers to existing, tried-n-tested algorithms I could learn from.
My goal is to display articles based on the genre of writing; for the prototype, I am using Philosophy, Programming, Politics and Poetry, since those are the only four styles of writing I have.
Each article is weighted based on each category, and the home view will have each category as a header in each corner. The articles are then laid out in word-cloud-like format, with "artificial gravity" placing each item as-near-as-possible to its main category (or between its main categories), without overlapping.
Currently, I am using an inefficient algorithm which stores arrays of rectangles to perform hit-test-and-search every time an article is added to the view, (with A* search patterns to find empty space to fill). By approximating a single destination for all articles of the same weight, and by using a round-robin queue to pick off articles from each pool, I can achieve fresh results (arrays are sorted by weight, then timestamp), with positioning-by-relevance ("artificial gravity").
However, using A* to blindly search seems really wasteful, even with heuristics to make each article check closest to it's target marks first. I need a more efficient way to iterate over a 2D space.
I'm wondering if a Linked-List approach might work better; rather than go searching blindly in all directions for empty space, I can just iterate through connected nodes to ask each one if it has either a) nearby free space, or b) other connected nodes to ask (and always ask the closest node first).
If there are any better algorithms available, or critiques on my methods, any and all help would surely be appreciated.
I am using gwt elemental + java in this gui, but any 2D mapping algorithm in any language will surely help.
[EDIT (request for more details)] : The main problem here is the amount of work each new addition performs; it produces noticable glitches in the ui thread, especially when there is almost no space left, as I am searching many points in a given radius for enough free space to fit the article.
If I cut the algorithm off too soon, I get blank spots that could have been filled. If I let it run too long, the ui glitches pretty bad, and I'm sure users will hate it.
What is the fastest / most efficient way to store and modify collections of 2D space?
You haven't provided enough information to say what would make an algorithm "better." Faster? Produces layouts that are "nicer" by some metric for quality? Able to handler bigger data sources?
There is certainly nothing wrong with arrays, nor with A*. If they are giving acceptable results with the size of problem you are trying to solve, how can they be "wasteful?" Linked data structures are worthwhile only if they reduce cost of frequently needed operations.
If you sharpen the problem, you're more likely to get a useful answer.
At any rate, there is an enormous literature on "graph layout" and "graph drawing." Try searching on these terms. If you can represent your desired layout as a collection of nodes and edges, these might apply. Many are based on simulated spring systems, which seems akin to what you are doing.
For ultra-fast code it essential that we keep locality of reference- keep as much of the data which is closely used together, in CPU cache:
http://en.wikipedia.org/wiki/Locality_of_reference
What techniques are to achieve this? Could people give examples?
I interested in Java and C/C++ examples. Interesting to know of ways people use to stop lots of cache swapping.
Greetings
This is probably too generic to have clear answer. The approaches in C or C++ compared to Java will differ quite a bit (the way the language lays out objects differ).
The basic would be, keep data that will be access in close loops together. If your loop operates on type T, and it has members m1...mN, but only m1...m4 are used in the critical path, consider breaking T into T1 that contains m1...m4 and T2 that contains m4...mN. You might want to add to T1 a pointer that refers to T2. Try to avoid objects that are unaligned with respect to cache boundaries (very platform dependent).
Use contiguous containers (plain old array in C, vector in C++) and try to manage the iterations to go up or down, but not randomly jumping all over the container. Linked Lists are killers for locality, two consecutive nodes in a list might be at completely different random locations.
Object containers (and generics) in Java are also a killer, while in a Vector the references are contiguous, the actual objects are not (there is an extra level of indirection). In Java there are a lot of extra variables (if you new two objects one right after the other, the objects will probably end up being in almost contiguous memory locations, even though there will be some extra information (usually two or three pointers) of Object management data in between. GC will move objects around, but hopefully won't make things much worse than it was before it run.
If you are focusing in Java, create compact data structures, if you have an object that has a position, and that is to be accessed in a tight loop, consider holding an x and y primitive types inside your object rather than creating a Point and holding a reference to it. Reference types need to be newed, and that means a different allocation, an extra indirection and less locality.
Two common techniques include:
Minimalism (of data size and/or code size/paths)
Use cache oblivious techniques
Example for minimalism: In ray tracing (a 3d graphics rendering paradigm), it is a common approach to use 8 byte Kd-trees to store static scene data. The traversal algorithm fits in just a few lines of code. Then, the Kd-tree is often compiled in a manner that minimalizes the number of traversal steps by having large, empty nodes at the top of tree ("Surface Area Heuristics" by Havran).
Mispredictions typically have a probability of 50%, but are of minor costs, because really many nodes fit in a cache-line (consider that you get 128 nodes per KiB!), and one of the two child nodes is always a direct neighbour in memory.
Example for cache oblivious techniques: Morton array indexing, also known as Z-order-curve-indexing. This kind of indexing might be preferred if you usually access nearby array elements in unpredictable direction. This might be valuable for large image or voxel data where you might have 32 or even 64 bytes big pixels, and then millions of them (typical compact camera measure is Megapixels, right?) or even thousands of billions for scientific simulations.
However, both techniques have one thing in common: Keep most frequently accessed stuff nearby, the less frequently things can be further away, spanning the whole range of L1 cache over main memory to harddisk, then other computers in the same room, next room, same country, worldwide, other planets.
Some random tricks that come to my mind, and which some of them I used recently:
Rethink your algorithm. For example, you have an image with a shape and the processing algorithm that looks for corners of the shape. Instead of operating on the image data directly, you can preprocess it, save all the shape's pixel coordinates in a list and then operate on the list. You avoid random the jumping around the image
Shrink data types. Regular int will take 4 bytes, and if you manage to use e.g. uint16_t you will cache 2x more stuff
Sometimes you can use bitmaps, I used it for processing a binary image. I stored pixel per bit, so I could fit 8*32 pixels in a single cache line. It really boosted the performance
Form Java, you can use JNI (it's not difficult) and implement your critical code in C to control the memory
In the Java world the JIT is going to be working hard to achieve this, and trying to second guess this is likely to be counterproductive. This SO question addresses Java-specific issues more fully.