Programming a 2D grid in Java

What is the best data structure to use when programming a 2-dimensional grid of tiles in Java? Tiles on the grid should be easily referenced by their location, so that neighbors and paths can be efficiently computed. Should it be a 2D array? An ArrayList? Something else?

If you're not worrying about speed or memory too much, you can simply use a 2D array - this should work well enough.
If speed and/or memory are issues for you, then the choice depends on memory usage and the access pattern.
A one-dimensional array is the way to go if you need high performance. You compute the proper index as y * wdt + x, where wdt is the width of the grid. There are 2 potential problems with this: cache misses and memory usage.
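A minimal sketch of the one-dimensional layout, assuming each tile fits in an int (an illustrative simplification). Horizontal neighbours sit at index +/- 1, while vertical neighbours are a whole row (+/- width) away, which is the cache issue discussed in the next paragraph. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** A width x height grid of int tiles stored row by row in a single 1D array. */
final class FlatGrid {
    private final int width;
    private final int height;
    private final int[] cells;

    FlatGrid(int width, int height) {
        this.width = width;
        this.height = height;
        this.cells = new int[width * height];
    }

    boolean inBounds(int x, int y) {
        return x >= 0 && x < width && y >= 0 && y < height;
    }

    /** Row-major index: horizontal neighbours are 1 apart, vertical ones are width apart. */
    int index(int x, int y) {
        return y * width + x;
    }

    int get(int x, int y) {
        return cells[index(x, y)];
    }

    void set(int x, int y, int tile) {
        cells[index(x, y)] = tile;
    }

    /** Flat indices of the 4-connected neighbours of (x, y) that lie inside the grid. */
    int[] neighbours(int x, int y) {
        int[][] offsets = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        int[] result = new int[4];
        int count = 0;
        for (int[] o : offsets) {
            int nx = x + o[0];
            int ny = y + o[1];
            if (inBounds(nx, ny)) {
                result[count++] = index(nx, ny);
            }
        }
        return Arrays.copyOf(result, count);
    }
}
```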
If you know that your access pattern is such that you fetch the neighbours of an element most of the time, then mapping a 2D space into a 1D array as described above may cause cache misses: you want the neighbours to be close in memory, and neighbours from 2 different rows are not. You may have to map your 2D tiles into your 1D array in a different order. See Hilbert curves, for example.
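As a sketch of the reordering idea, the function below maps (x, y) on an n x n grid (n assumed to be a power of two) to a distance d along a Hilbert curve, following the widely published iterative "xy to d" algorithm. Consecutive values of d always refer to adjacent cells, so storing tile (x, y) at index d of the 1D array keeps cells that are close in 2D mostly close in memory. The class name is hypothetical, and non-square or non-power-of-two maps would need padding.

```java
/** Maps grid coordinates to positions along a Hilbert curve. */
final class Hilbert {
    private Hilbert() {}

    /** Returns d in [0, n*n) for cell (x, y); n must be a power of two. */
    static int hilbertIndex(int n, int x, int y) {
        int d = 0;
        for (int s = n / 2; s > 0; s /= 2) {
            int rx = (x & s) > 0 ? 1 : 0;
            int ry = (y & s) > 0 ? 1 : 0;
            // Each quadrant at this level covers s*s consecutive curve positions.
            d += s * s * ((3 * rx) ^ ry);
            // Flip/transpose the coordinates so the next level sees the
            // sub-curve in its canonical orientation.
            if (ry == 0) {
                if (rx == 1) {
                    x = n - 1 - x;
                    y = n - 1 - y;
                }
                int t = x;
                x = y;
                y = t;
            }
        }
        return d;
    }
}
```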
For better memory usage, if you know that most of your tiles are always the same (e.g. always grass), you might want to implement a sparse array or a quadtree. Both can be implemented quite efficiently with cache awareness in mind (the sparse array link is a good example of this). Another benefit is that these can be extended dynamically. However, you will always have to pay for extra levels of indirection for this to work.
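One simple way to get a sparse grid, sketched under the assumption that tiles are bytes and most of them equal one default value: split the map into fixed-size chunks and allocate a chunk's array only when one of its tiles is set to something else. The chunks[...] lookup is the extra level of indirection mentioned above. CHUNK_BITS and the class name are illustrative choices, not taken from the answer, and coordinates are assumed to be in range (there is no bounds checking).

```java
import java.util.Arrays;

/** Sparse byte grid: chunks holding only the default tile are never allocated. */
final class ChunkedGrid {
    private static final int CHUNK_BITS = 5;               // 32 x 32 tiles per chunk
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;

    private final int chunksX;
    private final byte defaultTile;
    private final byte[][] chunks; // null entry = chunk contains only defaultTile

    ChunkedGrid(int width, int height, byte defaultTile) {
        this.chunksX = (width + CHUNK_MASK) >> CHUNK_BITS;
        int chunksY = (height + CHUNK_MASK) >> CHUNK_BITS;
        this.defaultTile = defaultTile;
        this.chunks = new byte[chunksX * chunksY][];
    }

    private int chunkIndex(int x, int y) {
        return (y >> CHUNK_BITS) * chunksX + (x >> CHUNK_BITS);
    }

    private static int offsetInChunk(int x, int y) {
        return ((y & CHUNK_MASK) << CHUNK_BITS) | (x & CHUNK_MASK);
    }

    byte get(int x, int y) {
        byte[] chunk = chunks[chunkIndex(x, y)]; // the extra indirection
        return chunk == null ? defaultTile : chunk[offsetInChunk(x, y)];
    }

    void set(int x, int y, byte tile) {
        int ci = chunkIndex(x, y);
        byte[] chunk = chunks[ci];
        if (chunk == null) {
            if (tile == defaultTile) {
                return; // writing the default into an empty chunk changes nothing
            }
            chunk = new byte[CHUNK_SIZE * CHUNK_SIZE];
            Arrays.fill(chunk, defaultTile);
            chunks[ci] = chunk;
        }
        chunk[offsetInChunk(x, y)] = tile;
    }

    int allocatedChunks() {
        int count = 0;
        for (byte[] c : chunks) {
            if (c != null) {
                count++;
            }
        }
        return count;
    }
}
```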
NOTE: If you're worried about performance, be careful with generic classes such as HashMap when the key is a primitive type or a special location class: you will either have to allocate an object each time you index the hash map or pay the price of boxing/unboxing. In addition, hash maps do not support efficient spatial queries (e.g. "give me all objects within radius R of a given object"); quadtrees are better for this.

If you have a fixed dimension for your grid, use a 2D array. If you need the size to be dynamic, use an ArrayList of ArrayLists.

A 2D array seems like a good bet if you plan on inserting stuff at specific locations, as long as the grid has a fixed size.

The data structure to use really depends on the type of operations you will perform:
In case the number of meaningful (non-zero/non-default) positions in the grid is rather low (<< n x m), it might be more space-efficient to use a hashmap that maps (x,y) positions to specific tiles. You can also iterate over the meaningful positions a lot more efficiently. In addition, you could store in each tile references to its neighboring tiles to speed up path/neighborhood traversal.
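A sketch of the hashmap variant, assuming a generic tile type and 4-connected neighbours. To avoid a separate location class, x and y are packed into a single long key; the key is still boxed to a Long on every lookup, which is the cost the NOTE in the first answer warns about. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stores only meaningful tiles; absent positions are treated as empty. */
final class SparseGrid<T> {
    private final Map<Long, T> tiles = new HashMap<>();

    /** Packs x into the high 32 bits and y into the low 32 bits. */
    private static long key(int x, int y) {
        // Mask y so a negative y does not overwrite the bits holding x.
        return ((long) x << 32) | (y & 0xFFFFFFFFL);
    }

    void put(int x, int y, T tile) {
        tiles.put(key(x, y), tile);
    }

    /** Returns the tile at (x, y), or null if nothing is stored there. */
    T get(int x, int y) {
        return tiles.get(key(x, y));
    }

    int size() {
        return tiles.size();
    }

    /** Non-empty 4-connected neighbours of (x, y). */
    List<T> neighbours(int x, int y) {
        int[][] offsets = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        List<T> result = new ArrayList<>();
        for (int[] o : offsets) {
            T t = get(x + o[0], y + o[1]);
            if (t != null) {
                result.add(t);
            }
        }
        return result;
    }

    /** Visits only stored tiles, however large the logical grid is. */
    Iterable<T> occupied() {
        return tiles.values();
    }
}
```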
If your grid is densely filled with "information", you should consider using a 2D array or an ArrayList (if at some point a generic type is involved as the "tile type", an ArrayList is the simplest option, since Java does not let you create arrays of a generic type directly: new T[n] does not compile).
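If the tile type is generic, a dense grid can still be one flat list indexed as y * width + x. The sketch below assumes a fixed-size grid and makes the bounds check explicit, because an out-of-range x would otherwise silently wrap into the next row. Names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Dense, fixed-size grid of a generic tile type backed by one ArrayList. */
final class GenericGrid<T> {
    private final int width;
    private final int height;
    private final List<T> cells;

    GenericGrid(int width, int height, T initial) {
        this.width = width;
        this.height = height;
        // nCopies returns an immutable list; copying it gives a mutable ArrayList.
        this.cells = new ArrayList<>(Collections.nCopies(width * height, initial));
    }

    T get(int x, int y) {
        checkBounds(x, y);
        return cells.get(y * width + x);
    }

    void set(int x, int y, T value) {
        checkBounds(x, y);
        cells.set(y * width + x, value);
    }

    private void checkBounds(int x, int y) {
        if (x < 0 || x >= width || y < 0 || y >= height) {
            throw new IndexOutOfBoundsException("(" + x + ", " + y + ")");
        }
    }
}
```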

If you simply need to iterate over the grid and random addressing of cells, then MyCellType[][] should be fine. This is most efficient in terms of space and (one would expect) time for these use-cases.

Related

The most efficient implementation of adjacency list?

I want to create an adjacency list in Java and since I will get a huge set of nodes later as input, it needs to be really efficient.
What sort of implementation is best for this scenario?
A list of lists or maybe a map? I also need to save the edge weights somewhere. I could not figure out how to do this, since the adjacency list itself apparently just keeps track of the connected nodes, but not the edge weight.
Warning: this route is the most masochistic and hardest to maintain possible, and only recommended when the highest possible performance is required.
Adjacency lists are one of the most awkward classes of data structures to optimize, mainly because they vary in size from one vertex to the next. At some broad conceptual level, if you include the adjacency data as part of the definition of a Vertex or Node, then that makes the size of a Vertex/Node variable. Variable-sized data and the kind of memory contiguity needed to be cache-friendly tend to fight one another in most programming languages.
Most object-oriented languages weren't designed to deal with objects that can actually vary in size. They solve that by making them point to/reference memory elsewhere, but that leads to much higher cache misses.
If you want cutting-edge efficiency and you traverse adjacent vertices/nodes a lot, then you want a vertex and its variable number of references/indices to adjacent neighbors (and their weights in your case) to fit in a single cache line, ideally with a good chance that some of those neighboring vertices fit in the same cache line too (optimally reorganizing the data to map a 2D graph onto a one-dimensional memory space is an NP-hard problem, but existing heuristics help a lot).
So it becomes less a question of which data structures to use than of which memory layouts to use. Arrays are your friend here, but not arrays of nodes. You want an array of bytes packing node data contiguously. Something like this:
[node1_data num_adj adj1 adj2 adj3 (possibly some padding for alignment and to avoid straddling) node2_data num_adj adj1 adj2 adj3 ...]
Node insertion and removal here starts to resemble the kind of algorithms you find to implement memory allocators. When you connect a new edge, that actually changes the node's size and potentially its position in these giant, contiguous memory blocks. Unlike memory allocators, you're potentially allowed to reshuffle and compact and defrag the data provided that you can update your references/indices to it.
Now this is only if you want the fastest possible solution, and provided your use cases are heavily weighted towards read operations (evaluation, traversal) rather than writes (connecting edges, inserting nodes, removing nodes). It's completely overkill otherwise, and a complete PITA since you'll lose all that nice object-oriented structure that helps keep the code easy to maintain, reuse, etc. This has you obliterating all that structure in favor of dealing with things at the bits and bytes level, and it's only worth doing if your software is in a realm where its quality is somehow very proportional to the efficiency of that graph.
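To make the packed layout concrete, here is a sketch that stores every node's record in one int[] instead of raw bytes: [numAdj, target1, weightBits1, target2, weightBits2, ...], with float weights stored via Float.floatToRawIntBits. It assumes a static, directed graph built once from an edge list, and it leaves out per-node payload data, alignment padding and all of the insertion/compaction machinery described above. The class name and constructor shape are hypothetical.

```java
/** Read-optimised adjacency data packed contiguously into a single int[]. */
final class PackedGraph {
    private final int[] data;        // all node records back to back
    private final int[] recordStart; // recordStart[v] = index of node v's numAdj slot

    /** Edge i goes from[i] -> to[i] with weight weight[i]. */
    PackedGraph(int nodeCount, int[] from, int[] to, float[] weight) {
        int[] degree = new int[nodeCount];
        for (int f : from) {
            degree[f]++;
        }
        recordStart = new int[nodeCount];
        int size = 0;
        for (int v = 0; v < nodeCount; v++) {
            recordStart[v] = size;
            size += 1 + 2 * degree[v]; // count slot + (target, weight) pairs
        }
        data = new int[size];
        for (int v = 0; v < nodeCount; v++) {
            data[recordStart[v]] = degree[v];
        }
        int[] filled = new int[nodeCount]; // pairs written so far per node
        for (int i = 0; i < from.length; i++) {
            int v = from[i];
            int slot = recordStart[v] + 1 + 2 * filled[v];
            filled[v]++;
            data[slot] = to[i];
            data[slot + 1] = Float.floatToRawIntBits(weight[i]);
        }
    }

    int degree(int v) {
        return data[recordStart[v]];
    }

    /** Target of node v's i-th outgoing edge. */
    int neighbour(int v, int i) {
        return data[recordStart[v] + 1 + 2 * i];
    }

    /** Weight of node v's i-th outgoing edge. */
    float weight(int v, int i) {
        return Float.intBitsToFloat(data[recordStart[v] + 2 + 2 * i]);
    }
}
```

Traversing a node's neighbours then reads one contiguous run of the array, which is the cache behaviour the answer is after; the price is that adding an edge means rebuilding or shifting records.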
One solution is to create a class Node that contains the data and a weight (wt). This weight is the weight of the edge through which it is connected to the node that owns the list.
Suppose node I is connected to nodes A, B and C with edge weights a, b and c, and node J is connected to A, B and C with weights x, y and z. Then the adjacency list of I contains the Node objects
I -> <A, a>, <B, b>, <C, c>
and the list of J contains the Node objects
J -> <A, x>, <B, y>, <C, z>
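A sketch of that idea, assuming nodes are identified by int ids and edges are undirected (so each edge is recorded in both endpoints' lists). The Edge class plays the role of the <node, weight> pairs above; the class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Adjacency list where every entry carries its edge weight. */
final class WeightedGraph {
    static final class Edge {
        final int target;     // id of the neighbouring node
        final double weight;  // weight of the edge to that node

        Edge(int target, double weight) {
            this.target = target;
            this.weight = weight;
        }
    }

    private final List<List<Edge>> adjacency = new ArrayList<>();

    WeightedGraph(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            adjacency.add(new ArrayList<>());
        }
    }

    /** Adds an undirected edge by recording it in both endpoints' lists. */
    void addEdge(int u, int v, double weight) {
        adjacency.get(u).add(new Edge(v, weight));
        adjacency.get(v).add(new Edge(u, weight));
    }

    List<Edge> edgesOf(int node) {
        return adjacency.get(node);
    }
}
```

For example, with I = 0, J = 1 and A, B, C = 2, 3, 4, calling addEdge(0, 2, a) puts <A, a> in I's list and <I, a> in A's list.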

Efficiency of ArrayList

I am making a program in Java in which a ball bounces around on the screen. The user can add other balls, and they all bounce off of each other. My question lies in the storage of the added balls. At the moment, I am using an ArrayList to store them, and every time the space bar is pressed, a new ball class is created and added to an Array List. Is this the most efficient way of doing things? I don't specify the size of the Array List at the beginning, so is it inefficient to have to allocate a new space on the array every time the user wants a new ball, even if the ball count will get up in the hundreds? Is there another class I could use to handle this in a more efficient manner?
Thanks!
EDIT:
Sorry, I should have been more clear. I iterate through the balls every 30 milliseconds, using nested for loops to see if they are intersecting with each other. I do access one ball the most often (the ball which the user can control with the arrow keys, another feature of the game), but the user can choose to switch control balls. Balls are never removed. So, I am performing some fairly complex calculations (I use my own vector class to move them off of each other every time there is a collision) on the balls very often.
Measure it and find out! In all seriousness, often times the best way to get answers to these questions is to set up a benchmark and swap in different collection types.
I can tell you that it won't allocate new space every time you add a new item to the ArrayList. Extra space is allocated so that it has room to grow.
LinkedList is another List option. It is super cheap to add items, but random access (list.get(10)) is expensive. Sets could also be good if you don't need ordered access (though there are ordered sets, too), and you want a Map implementation if you're accessing them by some sort of key/id. It really all depends on how you're using the collection.
Update based on added details
It sounds like you are mostly doing sequential reads through the entire list. In that scenario a LinkedList is probably a reasonable choice, although an ArrayList is usually at least as fast for sequential iteration because its backing array is contiguous. Again, if you only expose the List interface to the rest of your code (or an even more general Collection), you can easily swap in different implementations and actually measure the difference.
ArrayList is a highly optimized and very efficient wrapper on top of a plain Java array. A small timing overhead comes from copying array elements, which happens when the allocated size is less than the required number of elements. When you grow the array to a few hundred items, the copying will happen fewer than ten times, so the price you pay for not knowing the size in advance is very small. You can further reduce that overhead by suggesting an initial capacity for your ArrayList.
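To sanity-check the "fewer than ten times" figure, the snippet below counts resizes under an assumed growth policy: OpenJDK's current ArrayList allocates 10 slots on the first add and grows by about 50% (newCapacity = old + old / 2) when full. That policy is an implementation detail of the JDK, not part of the List contract, so other versions or JVMs may differ. The class name is hypothetical.

```java
/** Simulates ArrayList's (assumed) growth policy without touching a real list. */
final class ArrayListGrowth {
    private ArrayListGrowth() {}

    /** How many times the backing array would be copied while adding n elements. */
    static int resizeCopies(int n) {
        int capacity = 10; // assumed default capacity after the first add
        int copies = 0;
        while (capacity < n) {
            capacity += capacity >> 1; // grow by ~50%, copying the old array
            copies++;
        }
        return copies;
    }
}
```

Under this assumption, reaching 300 balls costs 9 copies. If you know roughly how many balls to expect, new ArrayList<>(expectedCount) or list.ensureCapacity(expectedCount) avoids the copies altogether.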
Removing from the middle of the ArrayList does take linear time. If you plan to remove items and/or insert them in the middle of the list frequently, this may become an issue. Note, however, that the overhead is not going to be worse than that for a plain array.
I iterate through the balls every 30 milliseconds, using nested for loops to see if they are intersecting with each other.
This does not have much to do with the collection in which the balls are stored. You could use a spatial index to improve the speed of finding intersections.
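As an example of such a spatial index, here is a sketch of a uniform grid (sometimes called a spatial hash), assuming circular balls and a cell size at least as large as the biggest ball diameter. Under that assumption two balls can only overlap if their centres fall in the same or adjacent cells, so each ball is tested only against balls in the 3 x 3 block of cells around it instead of against every other ball. The nested Ball class is a hypothetical stand-in for the asker's own ball class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Broad-phase collision detection with a uniform grid of buckets. */
final class CollisionGrid {
    static final class Ball {
        final double x, y, radius;

        Ball(double x, double y, double radius) {
            this.x = x;
            this.y = y;
            this.radius = radius;
        }

        boolean intersects(Ball o) {
            double dx = x - o.x, dy = y - o.y, r = radius + o.radius;
            return dx * dx + dy * dy < r * r;
        }
    }

    /** Intersecting pairs {i, j} (indices into balls, i < j); cellSize >= max diameter. */
    static List<int[]> intersectingPairs(List<Ball> balls, double cellSize) {
        // Bucket each ball index by the cell that contains its centre.
        Map<Long, List<Integer>> cells = new HashMap<>();
        for (int i = 0; i < balls.size(); i++) {
            Ball b = balls.get(i);
            cells.computeIfAbsent(cellKey(cell(b.x, cellSize), cell(b.y, cellSize)),
                    k -> new ArrayList<>()).add(i);
        }
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < balls.size(); i++) {
            Ball b = balls.get(i);
            int cx = cell(b.x, cellSize);
            int cy = cell(b.y, cellSize);
            for (int dx = -1; dx <= 1; dx++) {
                for (int dy = -1; dy <= 1; dy++) {
                    List<Integer> bucket = cells.get(cellKey(cx + dx, cy + dy));
                    if (bucket == null) {
                        continue;
                    }
                    for (int j : bucket) {
                        // j > i reports each pair once and skips the ball itself.
                        if (j > i && b.intersects(balls.get(j))) {
                            pairs.add(new int[] {i, j});
                        }
                    }
                }
            }
        }
        return pairs;
    }

    private static int cell(double coord, double cellSize) {
        return (int) Math.floor(coord / cellSize);
    }

    private static long cellKey(int cx, int cy) {
        return ((long) cx << 32) | (cy & 0xFFFFFFFFL);
    }
}
```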
About ArrayList in Java: adding an element at the end is amortized O(1), and removing the last element is O(1). In other words, it is efficient in most cases. (Occasionally a single add is expensive, because it triggers a resize that copies the whole backing array.)
But you should think more carefully about your design before choosing your data structure.
How many objects are usually in your collection? If it's small, you are free to choose whichever data structure you find easiest to work with; it will hardly cost your code any performance.
If you often need to find one particular ball among all of your balls, another data structure such as a HashMap or HashSet would be better.
Or, if you often delete from the middle of your list, maybe a LinkedList will be the appropriate choice :)
I'd recommend working out the way in which you need to access the balls and picking an appropriate interface (not implementation), e.g. if you're accessing them sequentially only, use a List; if you need to look up a ball by ID, think of a Map. The interface should match your requirements in terms of functionality, not in terms of speed/efficiency.
Then pick an implementation, e.g. HashMap or TreeMap, and write your code.
Afterwards, profile it: is your code inefficient in the ball access code? If so, try to optimise by switching to an alternative implementation that's more appropriate to your needs.
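A small sketch of "interface first, implementation later", assuming balls are looked up by an int id and one of them is the controlled ball (both ideas from the question). The rest of the code only sees Map<Integer, T>, so swapping HashMap for TreeMap after profiling touches a single place. BallRegistry and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Holds balls by id and remembers which one the user controls. */
final class BallRegistry<T> {
    private final Map<Integer, T> byId; // code below depends only on the interface
    private int controlledId = -1;

    BallRegistry(boolean sortedById) {
        // The only place that names a concrete implementation.
        if (sortedById) {
            byId = new TreeMap<>();
        } else {
            byId = new HashMap<>();
        }
    }

    void add(int id, T ball) {
        byId.put(id, ball);
    }

    void setControlled(int id) {
        if (!byId.containsKey(id)) {
            throw new IllegalArgumentException("unknown ball id " + id);
        }
        controlledId = id;
    }

    /** Expected O(1) with HashMap, O(log n) with TreeMap. */
    T controlled() {
        return byId.get(controlledId);
    }

    /** All balls, e.g. for the per-frame movement and collision loop. */
    Iterable<T> all() {
        return byId.values();
    }
}
```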

Java data structure of 500 million (double) values?

I am generating random edges for a complete graph with 32678 vertices, so 500 million+ values.
I am using a HashMap with the edges as keys and the random edge weights as values. I keep encountering:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.lang.StringBuilder.toString(StringBuilder.java:430)
    at pa1.Graph.<init>(Graph.java:60)
    at pa1.Main.main(Main.java:19)
This graph will then be used to construct a Minimum Spanning Tree.
Any ideas on a better data-structure or approach?
I know there are overrides to allocate more memory, but I would prefer a solution that works as-is.
A HashMap will be very large, because it will contain Doubles (with a capital D), which are significantly larger than 8 bytes (not to mention the Entry objects). It depends on the implementation and the CPU, but I think it's at least 16 bytes each, and probably more?
I think you should consider keeping the primary data in a huge double[] (or, if you can spare some accuracy, a float[]). That cuts memory usage by an easy 2x or 4x. (500M floats is a "mere" 2GB) Then use integer indexes into this array to implement your edges and vertices. For example, an edge could be an int[2]. This is far from O-O, and there's some serious hand-waving here. (and I don't understand all the nuances of what you are trying to do)
Very "old fashioned" in style, but requires a lot less memory.
Correction - I think an edge might be int[4], a vertex an int[2]. But you get the idea. Actually, for edges and vertices, you will have a smaller number of Objects and for them you can probably use "real" Objects, Maps, etc...
Since it is a complete graph, there is no doubt about what the edges are. How about storing the labels (weights) for those edges in a simple list that is ordered in a fixed way? For example, if you have 5 nodes, the weights for the edges would be ordered as follows: {1,2}, {1,3}, {1,4}, {1,5}, {2,3}, {2,4}, {2,5}, {3,4}, {3,5}, {4,5}.
However, as pointed out by @BillyO'Neal, this might still take up 8 GB of space. You might want to split this list into multiple files and maintain an index of these files indicating where one set of weights ends in one file and where the next set of weights begins.
Additionally, given that you are finding the MST of the graph, you might want to have a look at the following paper as well: http://cvit.iiit.ac.in/papers/Vibhav09Fast.pdf. The paper seems to be based on Boruvka's algorithm (http://en.wikipedia.org/wiki/Bor%C5%AFvka's_algorithm; http://iss.ices.utexas.edu/?p=projects/galois/benchmarks/mst).
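Building on the ordering suggested above (shifted to 0-based node numbers), edge {i, j} with i < j can live at index i*n - i*(i+1)/2 + (j - i - 1) of one float[], so no edge objects or keys are needed. The sketch also swaps in the dense O(V^2) variant of Prim's algorithm, which suits complete graphs; the answers above do not prescribe Prim's, and the class and method names are hypothetical. With 32678 vertices the array still holds roughly 534 million floats (about 2 GB), so this alone does not avoid raising the heap limit.

```java
import java.util.Arrays;
import java.util.Random;

/** Complete graph with weights in one triangular float[], plus dense Prim's MST. */
final class CompleteGraphMst {
    private final int n;
    private final float[] weights; // order: {0,1}, {0,2}, ..., {0,n-1}, {1,2}, ...

    CompleteGraphMst(int n, float[] weights) {
        if (weights.length != (long) n * (n - 1) / 2) {
            throw new IllegalArgumentException("need n*(n-1)/2 weights");
        }
        this.n = n;
        this.weights = weights;
    }

    /** Fills every edge with a random weight in [0, 1). */
    static CompleteGraphMst random(int n, Random rng) {
        float[] w = new float[(int) ((long) n * (n - 1) / 2)];
        for (int k = 0; k < w.length; k++) {
            w[k] = rng.nextFloat();
        }
        return new CompleteGraphMst(n, w);
    }

    /** Position of edge {i, j} (i != j) in the weights array. */
    long index(int i, int j) {
        if (i > j) {
            int t = i;
            i = j;
            j = t;
        }
        return (long) i * n - (long) i * (i + 1) / 2 + (j - i - 1);
    }

    float weight(int i, int j) {
        return weights[(int) index(i, j)];
    }

    /** Total weight of a minimum spanning tree (Prim's algorithm, O(V^2)). */
    double mstWeight() {
        boolean[] inTree = new boolean[n];
        float[] best = new float[n]; // cheapest known edge from the tree to each vertex
        Arrays.fill(best, Float.POSITIVE_INFINITY);
        best[0] = 0f;
        double total = 0;
        for (int step = 0; step < n; step++) {
            int u = -1;
            for (int v = 0; v < n; v++) {
                if (!inTree[v] && (u == -1 || best[v] < best[u])) {
                    u = v;
                }
            }
            inTree[u] = true;
            total += best[u];
            for (int v = 0; v < n; v++) {
                if (!inTree[v]) {
                    float w = weight(u, v);
                    if (w < best[v]) {
                        best[v] = w;
                    }
                }
            }
        }
        return total;
    }
}
```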

Optimal data structures for a tile-based RPG In java

The game is tile-based, but the tiles are really only for terrain and path-finding purposes. Sprite movement is free-form (i.e., the player can be halfway through a tile).
The maps in this game are very large. At normal zoom, tiles are 32*32 pixels, and map sizes can be up to 2000x2000 or larger (4 million tiles!). Currently, a map is an array of tiles, and the tile object looks like this:
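import java.util.ArrayList;

public class Tile {
    public byte groundType;            // texture of the tile
    public byte featureType;           // tile-filling object, e.g. a tree or large rock
    public ArrayList<Sprite> entities; // items, creatures, etc. (Sprite is the game's own class)
    public Tile() {
        groundType = -1;
        featureType = -1;
        entities = null;               // only created when something moves onto the tile
    }
}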
Where groundType is the texture, and featureType is a map object that takes up an entire tile (such as a tree, or large rock). These types of features are quite common so I have opted to make them their own variable rather than store them in entities, which is a list of objects on the tile (items, creatures, etc). Entities are saved to a tile for performance reasons.
The problem I am having is that if entities is not initialized to null, Java runs out of heap space. But setting it to null and only initializing when something moves into the tile seems to me a bad solution. If a creature were moving across otherwise empty tiles, lists would constantly need to be initialized and set back to null. Is this not poor memory management? What would be a better solution?
Have a single structure (start with an ArrayList) containing all of your sprites.
If you're running a game loop and cycling through the sprites list, say, once every 30-50 milliseconds, and there are up to, say, 200 sprites, you shouldn't have a performance hit from this structure per se.
Later on, for other purposes such as collision detection, you may well need to move beyond a single ArrayList. I would suggest starting with the simple, noddyish solution to get your game logic sorted out, then optimising as necessary.
For your tiles, if space is a concern, then rather than having a special "Tile" object, consider packing the information for each tile into a single byte, short or int if not much specific information per tile is actually required. Remember that every Java object you create has some overhead (for the sake of argument, let's say in the order of 24-32 bytes per object depending on the VM and on a 32- vs 64-bit processor). An array of 4 million bytes is "only" 4MB, and 4 million ints "only" 16MB.
Another solution for your tile data, if packing a tile's specification into a single primitive isn't practical, is to declare a large ByteBuffer, with each tile's data stored at index (say) tileNo * 16 if each tile needs 16 bytes of data.
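A sketch of the ByteBuffer variant, assuming 16 bytes per tile as in the example above. The record layout (ground type at offset 0, feature type at offset 1, a 4-byte int field at offset 4) is made up for illustration; the absolute get/put methods of java.nio.ByteBuffer read and write at an explicit index without moving the buffer's position. For 2000 x 2000 tiles this is 64 MB. The class and field names are hypothetical.

```java
import java.nio.ByteBuffer;

/** Fixed-size tile records stored at tileNo * BYTES_PER_TILE in one ByteBuffer. */
final class TileBuffer {
    static final int BYTES_PER_TILE = 16;
    private static final int GROUND = 0;  // offset of the ground-type byte
    private static final int FEATURE = 1; // offset of the feature-type byte
    private static final int EXTRA = 4;   // offset of an example 4-byte int field

    private final int width;
    private final ByteBuffer buf;

    TileBuffer(int width, int height) {
        this.width = width;
        // allocate() keeps the bytes on the Java heap; allocateDirect() would not.
        this.buf = ByteBuffer.allocate(width * height * BYTES_PER_TILE);
    }

    private int base(int x, int y) {
        int tileNo = y * width + x;
        return tileNo * BYTES_PER_TILE;
    }

    byte ground(int x, int y)  { return buf.get(base(x, y) + GROUND); }
    byte feature(int x, int y) { return buf.get(base(x, y) + FEATURE); }
    int extra(int x, int y)    { return buf.getInt(base(x, y) + EXTRA); }

    void setGround(int x, int y, byte v)  { buf.put(base(x, y) + GROUND, v); }
    void setFeature(int x, int y, byte v) { buf.put(base(x, y) + FEATURE, v); }
    void setExtra(int x, int y, int v)    { buf.putInt(base(x, y) + EXTRA, v); }
}
```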
You could consider not actually storing all of the tiles in memory. Whether this is appropriate will depend on your game. I would say that 2000x2000 is still within the realm that you could sensibly keep the whole data in memory if each individual tile does not need much data.
If you're thinking the last couple of points defeat the whole point of an object-oriented language, then yes you're right. So you need to weigh up at what point you opt for the "extreme" solution to save heap space, or whether you can "get away with" using more memory for the sake of a better programming paradigm. Having an object per tile might use (say) in the order of a few hundred megabytes. In some environments that will be ridiculous. In others where several gigabytes are available, it might be entirely reasonable.

resizing arrays when close to memory capacity

So I am implementing my own hashtable in Java, since the built-in hashtable has ridiculous memory overhead per entry. I'm making an open-addressed table with a variant of quadratic hashing, which is backed internally by two arrays, one for keys and one for values. I don't have the ability to resize, though. The obvious way to do it is to create larger arrays and then hash all of the (key, value) pairs from the old arrays into the new ones. This falls apart, though, when my old arrays take up over 50% of my current memory, since I can't fit both the old and new arrays in memory at the same time. Is there any way to resize my hashtable in this situation?
Edit: the info I got on current hashtable memory overheads is from here: "How much memory does a Hashtable use?"
Also, for my current application, my values are ints, so rather than store references to Integers, I have an array of ints as my values.
The simple answer is "no, there is no way to extend the length of an existing array". That said, you could add extra complexity to your hashtable and use arrays-of-arrays (or specialize and hard-code support for just two arrays).
You could partition your hash table, e.g. you could have 2-16 partitions based on 1-4 bits of the hashCode. This would allow you to resize one portion of the hash table at a time.
If you have a single hash table which takes up a large percentage of your memory, you have a serious design issue IMHO. Are you using a mobile device? What is your memory restriction? Have you looked at using Trove4j, which doesn't use entry objects either?
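A sketch of the partitioning idea for an int -> int table, assuming 16 partitions chosen by the top 4 bits of a mixed hash, linear probing (not the asker's quadratic variant) and a maximum load factor of 0.5. Each partition resizes on its own, so during a resize only that partition's old and new arrays coexist, rather than two copies of the whole table. For simplicity a separate used[] array marks occupied slots; reserving a sentinel key would save that byte per slot. The class name and constants are hypothetical.

```java
/** Open-addressing int -> int map split into independently resized partitions. */
final class PartitionedIntMap {
    private static final int PARTITION_BITS = 4;
    private static final int PARTITIONS = 1 << PARTITION_BITS;

    private final int[][] keys = new int[PARTITIONS][];
    private final int[][] values = new int[PARTITIONS][];
    private final boolean[][] used = new boolean[PARTITIONS][];
    private final int[] sizes = new int[PARTITIONS];

    PartitionedIntMap() {
        for (int p = 0; p < PARTITIONS; p++) {
            keys[p] = new int[8];
            values[p] = new int[8];
            used[p] = new boolean[8];
        }
    }

    private static int mix(int key) {
        int h = key * 0x9E3779B9; // multiplicative hashing spreads the bits
        return h ^ (h >>> 16);
    }

    private static int partition(int h) {
        return h >>> (32 - PARTITION_BITS); // top bits select the partition
    }

    void put(int key, int value) {
        int h = mix(key);
        int p = partition(h);
        if ((sizes[p] + 1) * 2 > keys[p].length) {
            resize(p); // keep load <= 0.5 so probing always finds a free slot
        }
        if (insert(keys[p], values[p], used[p], h, key, value)) {
            sizes[p]++;
        }
    }

    /** Value for key, or defaultValue if the key is absent. */
    int get(int key, int defaultValue) {
        int h = mix(key);
        int p = partition(h);
        int mask = keys[p].length - 1;
        for (int i = h & mask; used[p][i]; i = (i + 1) & mask) {
            if (keys[p][i] == key) {
                return values[p][i];
            }
        }
        return defaultValue;
    }

    int size() {
        int total = 0;
        for (int s : sizes) {
            total += s;
        }
        return total;
    }

    /** Linear probing insert; returns true if a new key was added. */
    private static boolean insert(int[] k, int[] v, boolean[] u, int h, int key, int value) {
        int mask = k.length - 1;
        int i = h & mask;
        while (u[i]) {
            if (k[i] == key) {
                v[i] = value; // overwrite existing entry
                return false;
            }
            i = (i + 1) & mask;
        }
        u[i] = true;
        k[i] = key;
        v[i] = value;
        return true;
    }

    /** Doubles a single partition; the other partitions are left untouched. */
    private void resize(int p) {
        int n = keys[p].length * 2;
        int[] nk = new int[n];
        int[] nv = new int[n];
        boolean[] nu = new boolean[n];
        for (int i = 0; i < keys[p].length; i++) {
            if (used[p][i]) {
                insert(nk, nv, nu, mix(keys[p][i]), keys[p][i], values[p][i]);
            }
        }
        keys[p] = nk;
        values[p] = nv;
        used[p] = nu;
    }
}
```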
Maybe a solution to the problem is:
-> Create a list to store the content of the matrix (setting up a list for each row and then freeing the memory of the corresponding array if possible, one row at a time);
-> Create the new matrix;
-> Fill the matrix with the values stored in the list (removing each element from the list right after copying the information from it).
This can be easier if the matrix elements are pointers to the elements themselves.
This is a very theoretical approach to the problem, but I hope it helps.
