Should Quadtrees store points only in the children? - java

Though we usually define a capacity for each Quadtree, it seems like all the pseudo-code algorithms I find online don't pay much attention to the actual number of points stored inside a Quadtree node.
Is it necessary to redistribute the points contained by a Quadtree into its children when it's being divided?
I can't seem to implement that correctly, and the pseudocode section of Wikipedia's Quadtree article only has a comment about this (but no code):
We have to add the points/data contained into this quad array to the new quads if we want that only the last node holds the data

Assuming that the question is about a point quadtree, let's review.
Is it necessary to redistribute the points contained by a Quadtree into its children when it's being divided?
No, it's not necessary.
But, is it better?
Redistribution is mostly just something you turn on when performance measurements in your specific use case show your code runs faster with it enabled.
Thanks to @Mike'Pomax'Kamermans for the comment.
It depends on how you're using the quadtree.
Keep in mind that:
You would still define a capacity for each quadtree node, which acts as the dividing threshold.
It won't affect the overall time complexity.
More quadtree nodes will be present.
Parents won't hold any points.
Pay the price at insertion.
You have to redistribute the points when the capacity is reached.
What's the impact on queries?
Points outside the query range are rejected faster.
The number of points checked by brute force may decrease.
The number of quadtree nodes visited may increase.
Whether it will execute faster or slower compared to the other mode really depends on the environment: the capacity, the recursion overhead, and how the points are dispersed.
But since you have already paid a price at insertion time, queries usually run faster.
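For reference, here is a minimal sketch in Java of a point quadtree that redistributes its stored points into the children when it subdivides, so that only leaf nodes hold data. Class and field names (Quad, CAPACITY, the centre/half-extent representation) are illustrative choices, not taken from any particular library:

import java.util.ArrayList;
import java.util.List;

class Quad {
    static final int CAPACITY = 4;               // the dividing threshold
    final double x, y, half;                     // centre and half-extent of this node's square
    private List<double[]> points = new ArrayList<>();  // each entry is {px, py}; null once subdivided
    private Quad nw, ne, sw, se;

    Quad(double x, double y, double half) { this.x = x; this.y = y; this.half = half; }

    boolean contains(double px, double py) {
        return px >= x - half && px < x + half && py >= y - half && py < y + half;
    }

    boolean insert(double px, double py) {
        if (!contains(px, py)) return false;     // out of bounds for this node
        if (points != null) {                    // still a leaf
            if (points.size() < CAPACITY) {
                points.add(new double[]{px, py});
                return true;
            }
            subdivide();                         // capacity reached: split and redistribute
        }
        // internal node: delegate to whichever child contains the point
        return nw.insert(px, py) || ne.insert(px, py) || sw.insert(px, py) || se.insert(px, py);
    }

    private void subdivide() {
        double h = half / 2;
        nw = new Quad(x - h, y + h, h);
        ne = new Quad(x + h, y + h, h);
        sw = new Quad(x - h, y - h, h);
        se = new Quad(x + h, y - h, h);
        List<double[]> old = points;
        points = null;                           // parents hold no points from now on
        for (double[] p : old) {
            insert(p[0], p[1]);                  // redistribution: each point falls into exactly one child
        }
    }
}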

Related

Implementing a fixed-size hash map

I need to implement a fixed-size hash map optimized for memory and speed, but am unclear as to what this means: Does it mean that the number of buckets I can have in my hash map is fixed? Or that I cannot dynamically allocate memory and expand the size of my hash map to resolve collisions via linked lists? If the answer to the latter question is yes, the first collision resolution approach that comes to mind is linear probing--can anyone comment on other more memory and speed efficient methods, or point me towards any resources to get started? Thanks!
Without seeing specific requirements it's hard to interpret the meaning of "fixed-size hash map optimized for memory and speed," so I'll focus on the "optimized for memory and speed" aspect.
Memory
Memory efficiency is hard to give advice on, particularly if the hash map really is a "fixed" size. In general open addressing can be more memory efficient because the key and value alone can be stored without needing pointers to next and/or previous linked list nodes. If your hash map is allowed to resize, you'll want to pick a collision resolution strategy that allows for a larger load factor (elements/capacity) before resizing. 1/2 is a common load factor used by many hash map implementations, but it means at least 2x the necessary memory is always used. The collision resolution strategy generally needs to be balanced between speed and memory efficiency, in particular tuned for your actual requirements/use case.
Speed
From a real-world perspective, particularly for smaller hash maps or those with trivially sized keys, the most important aspect of optimizing speed would likely be reducing cache misses. That means putting as much of the information needed to perform operations in contiguous memory as possible.
My advice would be to use open addressing instead of chaining for collision resolution. This would allow for more of your memory to be contiguous and should save at a minimum one cache miss per key comparison. Open addressing will require some kind of probing, but compared to the cost of fetching each link of a linked list from memory, looping over several array elements and comparing keys should be faster. See here for a benchmark of C++ std::vector vs std::list; the takeaway is that for most operations a normal contiguous array is faster due to spatial locality, despite algorithmic complexity.
In terms of types of probing, linear probing has an issue of clustering. As collisions occur, the adjacent elements are consumed, which causes more and more collisions in the same section of the array; this becomes exceptionally important when the table is nearly full. This could be solved with re-hashing, Robin Hood hashing (as you are probing to insert, if you reach an element closer to its ideal slot than the element being inserted, swap the two and try to insert that element instead; a much better description can be seen here), etc. Quadratic probing doesn't have the same clustering problem that linear probing has, but it has its own limitation: not every array location can be reached from every other array location, so depending on the size, typically only half the array can be filled before it would need to be resized.
The size of the array is also something that affects performance. The most common sizings are power-of-two sizes and prime-number sizes. Java: A "prime" number or a "power of two" as HashMap size?
Arguments exist for both, but mostly performance will depend on usage. Specifically, power-of-two sizes are usually very bad with hashes that are sequential, but the conversion of a hash to an array index can be done with a single AND operation versus a comparatively expensive mod.
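As a small, self-contained illustration of that last point (class and method names are invented for this sketch):

class IndexFor {
    // assumes capacity is a power of two; (capacity - 1) is then an all-ones bit mask
    static int powerOfTwoIndex(int hash, int capacity) {
        return hash & (capacity - 1);              // single AND, never negative
    }

    // with a prime (or otherwise arbitrary) capacity the mapping needs a modulo
    static int primeIndex(int hash, int primeCapacity) {
        return Math.floorMod(hash, primeCapacity); // floorMod keeps the result non-negative
    }
}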
Notably, google_dense_hash is a very good hash map, easily outperforming the C++ standard library variant in almost every use case; it uses open addressing, a power-of-two resizing convention, and quadratic probing. Malte Skarupke wrote an excellent hash table that beats google_dense_hash in many cases, including lookup. His implementation uses Robin Hood hashing and linear probing with a probe-length limit. It's very well described in a blog post, along with excellent benchmarking against other hash tables and a description of the performance gains.
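As a rough sketch of the open-addressing advice above, here is a minimal fixed-capacity map using linear probing and a power-of-two capacity. FixedHashMap, the primitive int keys and the bit-mixing step are illustrative simplifications, not taken from any of the libraries mentioned; a production table would also need deletion handling (tombstones) and a better response to a full table:

class FixedHashMap<V> {
    private final int[] keys;          // keys and values live in flat, contiguous arrays
    private final Object[] values;
    private final boolean[] used;
    private final int mask;            // capacity - 1; capacity must be a power of two

    FixedHashMap(int powerOfTwoCapacity) {
        keys = new int[powerOfTwoCapacity];
        values = new Object[powerOfTwoCapacity];
        used = new boolean[powerOfTwoCapacity];
        mask = powerOfTwoCapacity - 1;
    }

    boolean put(int key, V value) {
        int i = mix(key) & mask;
        for (int probes = 0; probes <= mask; probes++, i = (i + 1) & mask) {
            if (!used[i] || keys[i] == key) {    // empty slot, or overwrite of the same key
                used[i] = true;
                keys[i] = key;
                values[i] = value;
                return true;
            }
        }
        return false;                            // table is full
    }

    @SuppressWarnings("unchecked")
    V get(int key) {
        int i = mix(key) & mask;
        for (int probes = 0; probes <= mask; probes++, i = (i + 1) & mask) {
            if (!used[i]) return null;           // reached an empty slot: key is absent
            if (keys[i] == key) return (V) values[i];
        }
        return null;
    }

    private static int mix(int h) {              // cheap bit mixer so sequential keys don't cluster
        h ^= h >>> 16;
        return h * 0x45d9f3b;
    }
}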

Name this collection

This question is language-agnostic (although it assumes one that is both procedural and OO).
I'm having trouble finding if there is a standard name for a collection with the following behavior:
-Fixed-capacity of N elements, maintaining insertion order.
-Elements are added to the 'Tail'
-Whenever an item is added, the head of the collection is returned (FIFO), although not necessarily removed.
-If the collection now contains more than N elements, the Head is removed - otherwise it remains in the collection (now having advanced one step further towards its ultimate removal).
I often use this structure to keep a running count - i.e. the frame length of the past N frames, so as to provide 'moving window' across which I can average, sum, etc.
Sounds very similar to a circular buffer to me, with the exception that you are probably under-defining or over-constraining the add/remove behavior.
Note that there are two "views" of a circular buffer. One is the layout view, which has a section of memory being written to with "head" and "tail" indexes and a bit of logic to "wrap" around when the tail is "before" the head. The other is a "logical" view where you have a queue that's not exposing how it is laid out, but definitely has a limited number of slots which it can "grow to".
Within the context of doing computation, there is a very long-standing project that I love (although the CLI interface is a bit foreign if you're not used to such things). It's called the RoundRobinDatabase, where each database stores exactly N copies of a single value (providing graphs, averages, etc). It adjusts the next bin based on a number of parameters, but most often it advances bins based on time. It's often the tool behind a large number of network throughput graphs, and it has configurable bin collision resolution, etc.
In general, algorithms that are sensitive to the last "some number" of entries are often called "sliding box" algorithms, but that's focusing on the algorithm and not on the data structure :)
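For what it's worth, here is a minimal Java sketch of the described behavior, built on java.util.ArrayDeque (the class name SlidingWindow is illustrative): every add reports the current head, and the head is physically removed only once the collection holds more than N elements.

import java.util.ArrayDeque;
import java.util.Deque;

class SlidingWindow<T> {
    private final int capacity;                  // N
    private final Deque<T> items = new ArrayDeque<>();

    SlidingWindow(int capacity) { this.capacity = capacity; }

    // Adds to the tail and returns the head; the head is actually removed only
    // once the collection holds more than N elements.
    T add(T item) {
        items.addLast(item);
        T head = items.peekFirst();
        if (items.size() > capacity) {
            items.removeFirst();
        }
        return head;
    }

    int size() { return items.size(); }
}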
The programming riddle sounds like a circular linked list to me.
Well, all of these descriptions fit, don't they?
• Fixed-capacity of N elements, maintaining insertion order.
• Elements are added to the 'Tail'
• Whenever an item is added, the head of the collection is returned (FIFO), although not necessarily removed.
This link with source code for counting frames probably helps too: frameCounter

The most efficient implementation of adjacency list?

I want to create an adjacency list in Java and since I will get a huge set of nodes later as input, it needs to be really efficient.
What sort of implementation is best for this scenario?
A list of lists or maybe a map? I also need to save the edge weights somewhere. I could not figure out how to do this, since the adjacency list itself apparently just keeps track of the connected nodes, but not the edge weight.
Warning: this route is the most masochistic and hardest to maintain possible, and only recommended when the highest possible performance is required.
Adjacency lists are one of the most awkward classes of data structures to optimize, mainly because they vary in size from one vertex to the next. At some broad conceptual level, if you include the adjacency data as part of the definition of a Vertex or Node, then that makes the size of a Vertex/Node variable. Variable-sized data and the kind of memory contiguity needed to be cache-friendly tend to fight one another in most programming languages.
Most object-oriented languages weren't designed to deal with objects that can actually vary in size. They solve that by making them point to/reference memory elsewhere, but that leads to much higher cache misses.
If you want cutting-edge efficiency and you traverse adjacent vertices/nodes a lot, then you want a vertex and its variable number of references/indices to adjacent neighbors (and their weights in your case) to fit in a single cache line, and possibly with a good likelihood that some of those neighboring vertices also fit in the same cache line (though solving this and reorganizing the data to map a 2D graph to a 1-dimensional memory space is an NP-hard problem, existing heuristics help a lot).
So it ceases to become a question of what data structures to use so much as what memory layouts to use. Arrays are your friend here, but not arrays of nodes. You want an array of bytes packing node data contiguously. Something like this:
[node1_data num_adj adj1 adj2 adj3 (possibly some padding for alignment and to avoid straddling) node2_data num_adj adj1 adj2 adj3 ...]
Node insertion and removal here starts to resemble the kind of algorithms you find to implement memory allocators. When you connect a new edge, that actually changes the node's size and potentially its position in these giant, contiguous memory blocks. Unlike memory allocators, you're potentially allowed to reshuffle and compact and defrag the data provided that you can update your references/indices to it.
Now this is only if you want the fastest possible solution, and provided your use cases are heavily weighted towards read operations (evaluation, traversal) rather than writes (connecting edges, inserting nodes, removing nodes). It's completely overkill otherwise, and a complete PITA since you'll lose all that nice object-oriented structure that helps keep the code easy to maintain, reuse, etc. This has you obliterating all that structure in favor of dealing with things at the bits and bytes level, and it's only worth doing if your software is in a realm where its quality is somehow very proportional to the efficiency of that graph.
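To make the packed-layout idea a little more concrete, here is a simplified, read-only sketch in a CSR (compressed sparse row) style. It is not the exact variable-record byte layout described above, but it shows the flat, contiguous arrays and per-node offsets that give the cache-friendly traversal; all names are illustrative:

class PackedGraph {
    final int[] offsets;     // length = vertexCount + 1; offsets[v]..offsets[v+1]-1 index into the arrays below
    final int[] neighbors;   // neighbour vertex ids, packed contiguously per vertex
    final double[] weights;  // weight of the corresponding edge

    PackedGraph(int[] offsets, int[] neighbors, double[] weights) {
        this.offsets = offsets;
        this.neighbors = neighbors;
        this.weights = weights;
    }

    // Traverses the neighbours of v as one contiguous run of memory.
    double sumOfWeights(int v) {
        double sum = 0;
        for (int i = offsets[v]; i < offsets[v + 1]; i++) {
            sum += weights[i];   // neighbors[i] is the adjacent vertex, weights[i] its edge weight
        }
        return sum;
    }
}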
One solution you can think of is to create a class Node which contains the data and a weight; this weight will be the weight of the edge through which it is connected to that node.
Suppose you have a list for node I, which is connected to nodes A, B, C with edge weights a, b, c, and node J is connected to A, B, C with weights x, y, z. The adjacency list of I will then contain the Node objects as
I -> <A, a>, <B, b>, <C, c>
The list of J will contain the Node objects as
J -> <A, x>, <B, y>, <C, z>
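A minimal Java sketch of that idea (the class names Vertex and Edge are illustrative):

import java.util.ArrayList;
import java.util.List;

class Vertex {
    final String name;
    final List<Edge> adj = new ArrayList<>();  // the adjacency list of this vertex

    Vertex(String name) { this.name = name; }

    void connect(Vertex to, double weight) {
        adj.add(new Edge(to, weight));         // edge weight stored alongside the neighbour
    }
}

class Edge {
    final Vertex to;
    final double weight;

    Edge(Vertex to, double weight) { this.to = to; this.weight = weight; }
}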

Lookup algorithm that returns regions?

I have a large list of regions with 2D coordinates. None of the regions overlap. The regions are not immediately adjacent to one another and do not follow a placement pattern.
Is there an efficient lookup algorithm that can be used to let me know what region a specific point will fall into? This seems like it would be the exact inverse of what a QuadTree is.
The data structure you need is called an R-Tree. Most R-Trees permit a "within" or "intersection" query, which will return any geographic area containing or overlapping a given region; see, e.g., wikipedia.
There is no reason that you cannot build your own R-Tree; it's just a variant of a balanced B-Tree which can hold extended structures and allows some overlap. This implementation is lightweight, and you could use it here by wrapping your regions in rectangles. Each query might return more than one result, but then you could check the underlying region. It's probably an easier solution than trying to build a polyline-supporting R-Tree variant.
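A hedged sketch of that approach: the linear scan over bounding boxes below is only a stand-in for the R-Tree query (a real R-Tree would prune most of the candidates), and Region/Rect are illustrative placeholder types:

import java.util.ArrayList;
import java.util.List;

class RegionLookup {
    interface Region {
        boolean containsPoint(double x, double y);   // exact containment test for the real shape
    }

    static class Rect {
        final double minX, minY, maxX, maxY;
        final Region region;                          // the underlying (possibly non-rectangular) region
        Rect(double minX, double minY, double maxX, double maxY, Region region) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
            this.region = region;
        }
        boolean contains(double x, double y) {
            return x >= minX && x <= maxX && y >= minY && y <= maxY;
        }
    }

    private final List<Rect> boxes = new ArrayList<>();

    void add(Rect box) { boxes.add(box); }

    // Filter by bounding box first, then confirm with the exact region test.
    Region find(double x, double y) {
        for (Rect box : boxes) {                      // an R-Tree would prune most of these candidates
            if (box.contains(x, y) && box.region.containsPoint(x, y)) {
                return box.region;
            }
        }
        return null;                                  // no region contains the point
    }
}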
What you need, if I understand correctly, is a point location data structure, that is, as you put it, somehow the opposite of a quadtree or R-tree. In a point location data structure you have a set of regions stored, and the queries are of the form: given a point p, return the region in which it is contained.
Several point location data structures exist; the most famous, and the one that achieves the best performance, is Kirkpatrick's, also known as triangulation refinement, which achieves O(n) space and O(log n) query time, but it is also famously hard to implement. On the other hand there are several simpler data structures that achieve O(n) or O(n log n) space but O(log^2 n) query time, which is not that bad and way easier to implement, and for some of them it is possible to reduce the query time to O(log n) using a method called fractional cascading.
I recommend you take a look at chapter 6 of de Berg, Overmars, et al., Computational Geometry: Algorithms and Applications, which explains the subject in a way that is very easy to grasp, though it doesn't include Kirkpatrick's method, which you can find in Preparata's book or read directly from Kirkpatrick's paper.
BTW, several of these structures assume that your regions do not overlap but are adjacent (regions share edges), that the edges form a connected graph, and sometimes that the regions are triangular. In all cases you can extend your set of regions by adding new edges, and you don't need to worry about that, since the extra space needed will still be linear: the final set of regions will induce a planar graph. So you can blindly extend your set of regions without worrying about too much growth in space.

Techniques for keeping data in the cache, locality?

For ultra-fast code it is essential that we keep locality of reference: keep as much of the data that is used closely together in the CPU cache:
http://en.wikipedia.org/wiki/Locality_of_reference
What techniques are there to achieve this? Could people give examples?
I am interested in Java and C/C++ examples. It would be interesting to know of ways people use to stop lots of cache swapping.
Greetings
This is probably too generic to have a clear answer. The approaches in C or C++ compared to Java will differ quite a bit (the way the languages lay out objects differs).
The basics would be: keep data that will be accessed in tight loops together. If your loop operates on type T, and it has members m1...mN, but only m1...m4 are used in the critical path, consider breaking T into a T1 that contains m1...m4 and a T2 that contains m5...mN. You might want to add to T1 a pointer that refers to T2. Try to avoid objects that are unaligned with respect to cache boundaries (very platform dependent).
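A sketch of that split, with invented names and assuming the tight loop only ever touches the first four fields:

class ParticleHot {
    float x, y, vx, vy;          // the m1..m4 fields read every iteration of the tight loop
    ParticleCold cold;           // reference to the rarely used remainder (the "T2" part)
}

class ParticleCold {
    String name;                 // the m5..mN fields, touched only occasionally
    long createdAtMillis;
    int debugFlags;
}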
Use contiguous containers (a plain old array in C, vector in C++) and try to manage the iterations to go up or down, but not jump randomly all over the container. Linked lists are killers for locality; two consecutive nodes in a list might be at completely different random locations.
Object containers (and generics) in Java are also a killer: while in a Vector the references are contiguous, the actual objects are not (there is an extra level of indirection). In Java there is a lot of extra overhead; if you new two objects one right after the other, the objects will probably end up in almost contiguous memory locations, but there will be some extra information (usually two or three pointers) of object management data in between. The GC will move objects around, but hopefully won't make things much worse than they were before it ran.
If you are focusing on Java, create compact data structures: if you have an object that has a position, and that object is accessed in a tight loop, consider holding x and y primitive types inside your object rather than creating a Point and holding a reference to it. Reference types need to be newed, and that means a separate allocation, an extra indirection, and less locality.
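For illustration (class names invented):

class EntityCompact {
    double x, y;                 // primitives stored inline: one allocation, no extra indirection
}

class EntityIndirect {
    Point position;              // separately allocated object; reading x costs an extra pointer hop
}

class Point {
    double x, y;
}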
Two common techniques include:
Minimalism (of data size and/or code size/paths)
Use cache oblivious techniques
Example for minimalism: in ray tracing (a 3D graphics rendering paradigm), it is a common approach to use 8-byte kd-tree nodes to store static scene data. The traversal algorithm fits in just a few lines of code. Then, the kd-tree is often built in a manner that minimizes the number of traversal steps by having large, empty nodes at the top of the tree ("Surface Area Heuristics" by Havran).
Mispredictions typically have a probability of 50%, but are of minor cost, because many nodes fit in a cache line (consider that you get 128 nodes per KiB!), and one of the two child nodes is always a direct neighbour in memory.
Example for cache-oblivious techniques: Morton array indexing, also known as Z-order-curve indexing. This kind of indexing might be preferred if you usually access nearby array elements in unpredictable directions. It might be valuable for large image or voxel data where you might have pixels as large as 32 or even 64 bytes, and then millions of them (a typical compact camera measure is megapixels, right?) or even thousands of billions for scientific simulations.
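A small sketch of Morton (Z-order) indexing for 2D coordinates; part1By1 is the standard bit-interleaving trick and the names are illustrative. It handles 16-bit coordinates, and the result uses all 32 bits of the int:

class Morton {
    // Spreads the low 16 bits of v so there is a zero bit between every pair of bits.
    static int part1By1(int v) {
        v &= 0x0000FFFF;
        v = (v | (v << 8)) & 0x00FF00FF;
        v = (v | (v << 4)) & 0x0F0F0F0F;
        v = (v | (v << 2)) & 0x33333333;
        v = (v | (v << 1)) & 0x55555555;
        return v;
    }

    // Morton index of (x, y): bits of y interleaved into the odd positions, x into the even ones.
    static int index(int x, int y) {
        return (part1By1(y) << 1) | part1By1(x);
    }
}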
However, both techniques have one thing in common: keep the most frequently accessed stuff nearby; less frequently used things can be further away, spanning the whole range from L1 cache over main memory to hard disk, then other computers in the same room, the next room, the same country, worldwide, other planets.
Some random tricks that come to my mind, some of which I used recently:
Rethink your algorithm. For example, you have an image with a shape and a processing algorithm that looks for the corners of the shape. Instead of operating on the image data directly, you can preprocess it, save all of the shape's pixel coordinates in a list, and then operate on the list. You avoid randomly jumping around the image.
Shrink data types. A regular int will take 4 bytes, and if you manage to use e.g. uint16_t you will cache 2x more stuff.
Sometimes you can use bitmaps; I used one for processing a binary image. I stored one pixel per bit, so I could fit 8*32 pixels in a single cache line. It really boosted the performance (see the sketch after this list).
From Java, you can use JNI (it's not difficult) and implement your critical code in C to control the memory.
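A minimal sketch of the bitmap trick from the third bullet (class and method names are illustrative; a 64-byte cache line would hold 512 such one-bit pixels):

class BitImage {
    private final int width;
    private final long[] bits;                   // 64 pixels per long, packed row by row

    BitImage(int width, int height) {
        this.width = width;
        this.bits = new long[(width * height + 63) / 64];
    }

    void set(int x, int y, boolean value) {
        int i = y * width + x;
        if (value) bits[i >> 6] |= 1L << (i & 63);
        else       bits[i >> 6] &= ~(1L << (i & 63));
    }

    boolean get(int x, int y) {
        int i = y * width + x;
        return (bits[i >> 6] & (1L << (i & 63))) != 0;
    }
}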
In the Java world the JIT is going to be working hard to achieve this, and trying to second guess this is likely to be counterproductive. This SO question addresses Java-specific issues more fully.
