Optimal data structures for a tile-based RPG In java

Optimal data structures for a tile-based RPG In java - java

The game is tile-based, but the tiles are really only for terrain and path-finding purposes. Sprite movement is free-form (ie, the player can be half way through a tile).
The maps in this game are very large. At normal zoom tiles are 32*32 pixels, and maps sizes can be up 2000x2000 or larger (4 million tiles!). Currently, a map is an array of tiles, and the tile object looks like this:
public class Tile {
public byte groundType;
public byte featureType;
public ArrayList<Sprite> entities;
public Tile () {
groundType = -1;
featureType = -1;
entities = null;
}
}
Where groundType is the texture, and featureType is a map object that takes up an entire tile (such as a tree, or large rock). These types of features are quite common so I have opted to make them their own variable rather than store them in entities, which is a list of objects on the tile (items, creatures, etc). Entities are saved to a tile for performance reasons.
The problem I am having is that if entities is not initialized to null, Java runs out of heap space. But setting it to null and only initializing when something moves into the tile seems to me a bad solution. If a creature were moving across otherwise empty tiles, lists would constantly need to be initialized and set back to null. Is this not poor memory management? What would be a better solution?

Have a single structure (start with an ArrayList) containing all of
your sprites.
If you're running a game loop and cycling through the sprites list,
say, once very 30-50 seconds and there are up to, say, 200 sprites,
you shouldn't have a performance hit from this structure per se.
Later on, for other purposes such as collision detection, you may
well need to revise the structure of just a single ArrayList. I would suggest
starting with the simple, noddyish solution to get your game logic sorted out, then optimise as necessary.
For your tiles, if space is a concern, then rather than having a special "Tile" object, consider packing the
information for each tile into a single byte, short or int if not
actually much specific information per tile is required. Remember
that every Java object you create has some overhead (for the sake of
argument, let's say in the order of 24-32 bytes per object depending
on VM and 32 vs 64 bit processor). An array of 4 million bytes is
"only" 4MB, 4 million ints "only" 16MB.
Another solution for your tile data, if packing a tile's specification into a single primitive isn't practical, is to declare a large ByteBuffer, with each tile's data stored at index (say) tileNo * 16 if each tile needs 16 bytes of data.
You could consider not actually storing all of the tiles in memory. Whether this is appropriate will depend on your game. I would say that 2000x2000 is still within the realm that you could sensibly keep the whole data in memory if each individual tile does not need much data.
If you're thinking the last couple of points defeat the whole point of an object-oriented language, then yes you're right. So you need to weigh up at what point you opt for the "extreme" solution to save heap space, or whether you can "get away with" using more memory for the sake of a better programming paradigm. Having an object per tile might use (say) in the order of a few hundred megabytes. In some environments that will be ridiculous. In others where several gigabytes are available, it might be entirely reasonable.

Related

optimizing a grid-based particle system

I've implemented a game somewhat similar to this one in Java and currently find that I'm hitting a ceiling number of particles of ~80k. My game board is a 2D array of references to 'Particle' objects, each of which must be updated every frame. Different kinds of 'Particle' have different behaviors and may move or change their state in response to environmental conditions such as wind or adjacent particles.
Some possible 'rules' that might be in effect:
If a Particle of type lava is adjacent to a Particle of type water, both disappear, and the lava is replaced by obsidian
If a gas Particle is adjacent to a Lava, Fire, Ember, etc. Particle, it will ignite, and produce fire and smoke
If a sufficient number of dust particles are stacked on top of one another, those at lower levels, as if under pressure, can become sedimentary rock
I've searched around and haven't been able to find any algorithms or data structures that seem particularly well-suited to speeding up the task. It seems that some kind of memoization might be useful? Would a quad tree be of any use here? I've seen them used in the somewhat similar Conway's Game of Life with the Hashlife algorithm. Or, is it the case that I'm not going to be able to do too much to increase the speed?

Hashlife will work in principle but there are two reasons why you might not get as much out of it as Conway Life.
Firstly it relies on recurring patterns. The more cell states you have and the less structured the plane the fewer cache hits you'll encounter and the more you'll be working with brute force.
Secondly as another poster noted rules that involve non-local effects will either mean your primitives (in Conway Life 4x4) will need to be bigger so you will have abandon divide and conquer at say 8x8 or 16x16 or whatever size guarantees you can correctly calculate the middle portion in n/2 time.
That's made the worse by the diversity of states. In Conway Life it's common to pre-calculate all 4x4 gridsor at least have nearly all relevant ones in cache.
With 2 states there are only 65536 4x4 grids (peanuts on modern platforms) but with only 3 there are 43046721.
If you have to have 8x8 primitives it gets very big very quickly and beyond any realistic storage.
So the larger the primitive and the more states you have that becomes quickly unrealistic.
One way to address that primitive size is to have the rock rule propagate pressure. So a Rock+n (n representing pressure) becomes Rock+(n+1) in the next generation if it has Rock+m where m>=n above it. Up to some threshold k where it turns to sedimentary Rock.
That means cells are still only dependent on their immediate neighbours but again multiplies up the number of states.
If you have cell types like the 'Bird' in the example given and you have velocities that you don't keep to a minimum (say -1,0,1 in either direction) you'll totally collapse memoization. Even then the chaotic nature of such rules may make cache hits on those areas vanishingly small.
If your rules don't lead to steady states (or repeating cycles) like Conway Life often does the return on memoization will be limited unless your plane is mostly empty.

i don't understand your problem clearly but I think cuda or OpenGL (GPU programming in general) can easily handle your ref link: https://dan-ball.jp/en/javagame/dust/

I'd use a fixed NxN grid for this mainly because there are too many points moving around each frame to benefit from the recursive subdividing nature of the quad-tree. This is a case where a straightforward data structure with the data representations and memory layouts tuned appropriately can make all the difference in the world.
The main thing I'd do for Java here is actually avoid modeling each particle as an object. It should be raw data like just some plain old data like floats or ints. You want to be able to work with contiguity guarantees for spatial locality with sequential processing and not pay for the cost of padding and the 8-byte overhead per class instance. Split cold fields away from hot fields.
For example, you don't necessarily need to know a particle's color to move it around and apply physics. As a result, you don't want an AoS representation here which has to load in a particle's color into cache lines during the sequential physics pass only to evict it and not use it. Cram as much relevant memory used together into a cache line as you can by separating it away from the irrelevant memory for a particular pass.
Each cell in the grid should just store an index into a particle, with each particle storing an index to the next particle in the cell (a singly-linked list, but an intrusive one which requires allocating no nodes and just uses indices into arrays). A -1 can be used to indicate the end of the list as well as empty cells.
To find collisions between particles of interest, look in the same cell as the particle you're testing, and you can do this in parallel where each thread handles one or more cells worth of particles.
The NxN grid should be very fine given the boatload of moving particles you can have per frame. Play with how many cells you create to find something optimal for your input sizes. You might even have multiple grids. If certain particles don't interact with each other, don't put them in the same grid. Don't worry about the memory usage of the grid here. If each grid cell just stores a 32-bit index to the first particle in the cell, then a 200x200 grid only takes 160 kilobytes with a 32-bit next index overhead per particle.
I made something similar to this some years back in C using the technique above (but not with as many interesting particle interactions as the demo game) which could handle about 10 mil particles before it started to go below 30 FPS and on older hardware with only 2 cores. It did use C as well as SIMD and multithreading, but I think you can get a very speedy solution in Java handling a boatload of particles at once if you do the above.
Data structure:
As particles move from one cell to the next, all you do is manipulate a couple of integers to move them from one cell to the other. Cells don't "own memory" or allocate any. They're just 32-bit indices.
To figure out which cell a particle occupies, just do:
cell_x = (int)(particle_x[particle_index] / cell_size)
cell_y = (int)(particle_y[particle_index] / cell_size)
cell_index = cell_y * num_cols + cell_x
... much cheaper constant-time operation than traversing a tree structure and having to rebalance it as particles move around.

Options for storing huge tile map

I am creating a pseudo-turn-based online strategy browser game where many people play in the same world over a long period(months). For this, I want to have a map of 64000x64000 tiles = 4 billion tiles. I need about 6-10 bytes of data per tile, making up a total of around 30GB of data for storing the map.
Each tile should have properties such as type(water, grass, desert, mountain), resource(wood, cows, gold) and playerBuilt(road, building)
The client will only ever need access to about 100x100 tiles at the same time.
I have handled the map on client side under control. The problem that I'm faced with is how to store, retrieve and modify information from this map on the server side.
Required functionality:
Create, store, and modify 64000x64000 tilemap.
Show 100x100 part of the map to the client.
Make modifications on the map such as roads, buildings, and depleted resources.
What I have considered so far:
Procedural generation: Procedurally generating whichever part of the map is needed on the fly. Making sure that given the same seed, it always generates the same map. The main problem I have with this is that there will be modifications to the map during the game. Note: Less than 1% of the tiles would be modified during the game and it could be possible to store modifications with coordinates in an outside array. Loading them on top of the procedural generation.
Databases: Generating the map at the start of the game and storing it in a database. A friend advised me against this for such a huge tile map and told me that I'd probably want to store it in memory instead.
Keeping it all in memory on the server side: Keeping it in memory in a data structure. Seems like a nice way to do it if the map was smaller but for 4 billion tiles that would be a lot to keep in memory.
I was planning on using java+mysql for back-end for this project. I'm still in early phases and open to change technology if needed.
My question is: Which of the three approaches above seem viable and/or are there other ways to do it which I have not considered?

Depends on:
how much RAM you got (or player/users got)
Is most of the tile map empty (sparse) ? Opposite is dense.
Is there a default terrain (like empty or water ?)
If sparse, use a hashmap instead of a 2D array.
If dense it will be much more challenging and you may need to use a database or some special data structures + cache.
You may detect hot zones and keep them in memory for a while, dead zones (no players there, no activity ...) can be stored in the database and read on demand.
You may also load data in several passes: first just the terrain, then other objects... each layer could be stored in a different way. For example the terrain could be perlin noise generated + another layer which can be modified.

Data structure to use for platformer game map

I'm writing a 2D platformer game using Java for Android.
Currently I have all game entities (except for the avatar, stored in an array of array of "Entity"
Entity[][]
Whenever I need to check something - such as what I'm going to draw on screen or for collision detection - I simply grab a small radius of items around the avatar and do whatever using a system of inheritance and polymorphism.
The problem is that this means I can only put one entity in a particular grid coordinate. This used to be quite okay for the most part - but now I have moving items (such as enemies or moving blocks) - which when they collide, end up deleting one another basically they get overwritten.
So what data structure should I use? I was thinking of something like
ArrayList<Entity>[][]
But that's going to be very expensive, and a waste of memory since duplicate items are the exception not the rule.
I was also considering separating the moving items into their own ArrayList, and looping through all of them, but that's an ugly solution.
So any ideas on what I could use? I want something which is pretty fast, but not too memory intensive.

A two-dimensional array for storing entities wastes memory (most blocks won't be occupied) and doesn't allow overlapping entities, as you noticed. With few entities, a simple ArrayList (no visible arrays involved) is probably going to be fully adequate.
If you have lots of entities, you could consider a quad-tree data structure or some other spatial indexing structure (check http://en.wikipedia.org/wiki/Spatial_index).
As a simpler and more "traditional" approach, how about keeping a two-layer view of your entities-- an ArrayList of all active (visible) ones, and an ArrayList of all inactive (too-far-away-to-care) ones? If performance matters, you can disable collision detection on the latter and perform little or no AI/maintenance on them.

What about you keep the entities as:
Entity[][]
And assuming you look-up entities, like:
Entity e = entities[i][j]
But keeping you enemies in a:
Map<String,List<Entity>> map
Where the key is the String made by concatenating i and j with a random middle character (e.g. "1.15" if i==1 and j==15). Now you keep the fast look-up of entities and also have fast look-up (albeit not as fast as an array) of enemies without the excess empty space.

Techniques for keeping data in the cache, locality?

For ultra-fast code it essential that we keep locality of reference- keep as much of the data which is closely used together, in CPU cache:
http://en.wikipedia.org/wiki/Locality_of_reference
What techniques are to achieve this? Could people give examples?
I interested in Java and C/C++ examples. Interesting to know of ways people use to stop lots of cache swapping.
Greetings

This is probably too generic to have clear answer. The approaches in C or C++ compared to Java will differ quite a bit (the way the language lays out objects differ).
The basic would be, keep data that will be access in close loops together. If your loop operates on type T, and it has members m1...mN, but only m1...m4 are used in the critical path, consider breaking T into T1 that contains m1...m4 and T2 that contains m4...mN. You might want to add to T1 a pointer that refers to T2. Try to avoid objects that are unaligned with respect to cache boundaries (very platform dependent).
Use contiguous containers (plain old array in C, vector in C++) and try to manage the iterations to go up or down, but not randomly jumping all over the container. Linked Lists are killers for locality, two consecutive nodes in a list might be at completely different random locations.
Object containers (and generics) in Java are also a killer, while in a Vector the references are contiguous, the actual objects are not (there is an extra level of indirection). In Java there are a lot of extra variables (if you new two objects one right after the other, the objects will probably end up being in almost contiguous memory locations, even though there will be some extra information (usually two or three pointers) of Object management data in between. GC will move objects around, but hopefully won't make things much worse than it was before it run.
If you are focusing in Java, create compact data structures, if you have an object that has a position, and that is to be accessed in a tight loop, consider holding an x and y primitive types inside your object rather than creating a Point and holding a reference to it. Reference types need to be newed, and that means a different allocation, an extra indirection and less locality.

Two common techniques include:
Minimalism (of data size and/or code size/paths)
Use cache oblivious techniques
Example for minimalism: In ray tracing (a 3d graphics rendering paradigm), it is a common approach to use 8 byte Kd-trees to store static scene data. The traversal algorithm fits in just a few lines of code. Then, the Kd-tree is often compiled in a manner that minimalizes the number of traversal steps by having large, empty nodes at the top of tree ("Surface Area Heuristics" by Havran).
Mispredictions typically have a probability of 50%, but are of minor costs, because really many nodes fit in a cache-line (consider that you get 128 nodes per KiB!), and one of the two child nodes is always a direct neighbour in memory.
Example for cache oblivious techniques: Morton array indexing, also known as Z-order-curve-indexing. This kind of indexing might be preferred if you usually access nearby array elements in unpredictable direction. This might be valuable for large image or voxel data where you might have 32 or even 64 bytes big pixels, and then millions of them (typical compact camera measure is Megapixels, right?) or even thousands of billions for scientific simulations.
However, both techniques have one thing in common: Keep most frequently accessed stuff nearby, the less frequently things can be further away, spanning the whole range of L1 cache over main memory to harddisk, then other computers in the same room, next room, same country, worldwide, other planets.

Some random tricks that come to my mind, and which some of them I used recently:
Rethink your algorithm. For example, you have an image with a shape and the processing algorithm that looks for corners of the shape. Instead of operating on the image data directly, you can preprocess it, save all the shape's pixel coordinates in a list and then operate on the list. You avoid random the jumping around the image
Shrink data types. Regular int will take 4 bytes, and if you manage to use e.g. uint16_t you will cache 2x more stuff
Sometimes you can use bitmaps, I used it for processing a binary image. I stored pixel per bit, so I could fit 8*32 pixels in a single cache line. It really boosted the performance
Form Java, you can use JNI (it's not difficult) and implement your critical code in C to control the memory

In the Java world the JIT is going to be working hard to achieve this, and trying to second guess this is likely to be counterproductive. This SO question addresses Java-specific issues more fully.

Programming a 2D grid in Java

What is the best data structure to use when programming a 2-dimensional grid of tiles in Java? Tiles on the grid should be easily referenced by their location, so that neighbors and paths can be efficiently computed. Should it be a 2D array? An ArrayList? Something else?

If you're not worrying about speed or memory too much, you can simply use a 2D array - this should work well enough.
If speed and/or memory are issues for you then this depends on memory usage and the access pattern.
A single dimensional array is the way to go if you need high performance. You compute the proper index as y * wdt + x. There are 2 potential problems with this: cache misses and memory usage.
If you know that your access pattern is such that you fetch neighbours of an element most of the time, then mapping a 2D space into a 1D array as described above may cause cache misses - you want the neighbours to be close in memory, and neighbours from 2 different rows are not. You may have to map your 2d tiles in a different order to your 1d array. See Hilbert curves for example.
For better memory usage, if you know that most of your tiles are always the same (e.g. always grass), you might want to implement a sparse array or a quad tree. Both can be implemented quite efficiently, with cache awareness in mind (the sparse array link is good example for this). Another benefit is that these can be dynamically extended. However, you will always have to pay extra levels of indirection in the end for this to work.
NOTE: Be careful with using generic classes such as HashMaps with the key type being some primitive type or a special location class if you're worried about performance - you will either have to allocate an object each time you index the hash map or pay the price of boxing/unboxing. In addition to this, hash maps will not allow you efficient spatial queries (e.g. give me all objects existing in the radius R of a given object - quad trees are better for this).

If you have a fixed dimension for your grid, use a 2D array. If you need the size to be dynamic, use an ArrayList of ArrayLists.

A 2D array seems like a good bet if you plan on inserting stuff into specific locations. As long as its a fixed Size.

The data structure to use really depends on the type of operations you will perform:
In case the number of meaningful positions (nonzero/nondefault) in the grid is rather low (<< n x m) it might be more space efficient to use a hashmap, that maps (x,y) positions to specific tiles. Also you can iterate over meaningful positions alot more efficiently. In addition you could store references to neighboring tiles to each tile to speed up path/neighborhood traversal.
If your grid is densely filled with "information" you should consider using a 2d array or ArrayList (in case you will at some point have generic types involved as "tile-type", you have to use ArrayLists, since Java does not allow native arrays of generic type).

If you simply need to iterate over the grid and random addressing of cells, then MyCellType[][] should be fine. This is most efficient in terms of space and (one would expect) time for these use-cases.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.