Options for storing huge tile map

I am creating a pseudo-turn-based online strategy browser game where many people play in the same world over a long period (months). For this, I want to have a map of 64000x64000 tiles, which is about 4 billion tiles. I need about 6-10 bytes of data per tile, making up a total of around 30GB of data for storing the map.
Each tile should have properties such as type (water, grass, desert, mountain), resource (wood, cows, gold) and playerBuilt (road, building).
The client will only ever need access to about 100x100 tiles at the same time.
I have the client-side handling of the map under control. The problem I'm faced with is how to store, retrieve and modify information from this map on the server side.
Required functionality:
Create, store, and modify 64000x64000 tilemap.
Show 100x100 part of the map to the client.
Make modifications on the map such as roads, buildings, and depleted resources.
What I have considered so far:
Procedural generation: Procedurally generating whichever part of the map is needed on the fly, making sure that the same seed always generates the same map. The main problem I have with this is that there will be modifications to the map during the game. Note: Less than 1% of the tiles would be modified during the game, so it could be possible to store the modifications with their coordinates in a separate array and load them on top of the procedural generation.
Databases: Generating the map at the start of the game and storing it in a database. A friend advised me against this for such a huge tile map and told me that I'd probably want to store it in memory instead.
Keeping it all in memory on the server side: Keeping it in memory in a data structure. Seems like a nice way to do it if the map was smaller but for 4 billion tiles that would be a lot to keep in memory.
I was planning on using Java + MySQL for the back end of this project. I'm still in the early phases and open to changing technology if needed.
My question is: Which of the three approaches above seem viable and/or are there other ways to do it which I have not considered?

It depends on:
How much RAM you have (or your players/users have).
Whether most of the tile map is empty (sparse) or not (dense).
Whether there is a default terrain (such as empty or water).
If it is sparse, use a hashmap instead of a 2D array.
If it is dense, it will be much more challenging and you may need to use a database or some special data structures plus a cache.
You may detect hot zones and keep them in memory for a while; dead zones (no players there, no activity, ...) can be stored in the database and read on demand.
You may also load data in several passes: first just the terrain, then other objects, and so on. Each layer could be stored in a different way. For example, the terrain could be generated with Perlin noise, with another layer on top that can be modified.
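As a rough sketch of that layered idea (not a definitive design), the base terrain can be derived from a seed while only the modified tiles live in a hash map keyed by packed coordinates. The class `SparseTileMap`, its method names, and the simple hash standing in for real Perlin noise are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: seeded base terrain plus a sparse layer of modified tiles. */
final class SparseTileMap {
    private final long seed;
    // Only tiles that differ from the generated terrain are stored here.
    private final Map<Long, Integer> overrides = new HashMap<>();

    SparseTileMap(long seed) {
        this.seed = seed;
    }

    /** Packs two 32-bit coordinates into a single 64-bit map key. */
    private static long key(int x, int y) {
        return ((long) x << 32) | (y & 0xFFFFFFFFL);
    }

    /** Placeholder for a noise function: a deterministic hash giving a terrain type 0..3. */
    int generated(int x, int y) {
        long h = seed ^ (x * 0x9E3779B97F4A7C15L) ^ (y * 0xC2B2AE3D27D4EB4FL);
        h ^= h >>> 29;
        h *= 0xBF58476D1CE4E5B9L;
        h ^= h >>> 32;
        return (int) (h & 3);
    }

    /** Returns the stored modification if there is one, otherwise the generated tile. */
    int get(int x, int y) {
        Integer stored = overrides.get(key(x, y));
        return stored != null ? stored : generated(x, y);
    }

    void set(int x, int y, int tile) {
        overrides.put(key(x, y), tile);
    }

    /** Builds the small window (for example 100x100) that a client needs, indexed [row][column]. */
    int[][] window(int x0, int y0, int width, int height) {
        int[][] out = new int[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                out[y][x] = get(x0 + x, y0 + y);
            }
        }
        return out;
    }

    int overrideCount() {
        return overrides.size();
    }
}
```

A boxed `HashMap` carries noticeable per-entry overhead, so for tens of millions of modified tiles the override layer would probably move to a database table or a primitive-keyed map; the lookup order (override first, generator second) stays the same.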

Related

Java storing multiple images, config files and such in an indexed file?

I have seen a few programs and games that store their data in an indexed file, and they load their data from that file which they usually call cache.
I want to be able to load my data in this way:
final int SPRITES_INDEX = 3;
List<Sprite> sprites = (List<Sprite>) cache.loadIndex(SPRITES_INDEX);
Does anyone know how it's done and why it's done this way? Or is there a name for this method of storing data?

You should look up "resources" in "jar" files. That's what is commonly used for this job in the Java world. Normally a jar file is just a zip file, which is sequential, but many years ago they added the ability to have indexed jar files, which provide random access to their contents.
You can begin here: http://docs.oracle.com/javase/1.5.0/docs/tooldocs/windows/jar.html
(Look for the "i" option which adds an index to the jar file.)

At least for images/rendering, this is called Texture Packing, and is done because OpenGL "binds" images before rendering them, and this binding can be expensive, processing-wise.
Packaging the textures inside a larger image allows the game/app to bind only once, and then, based on an index of predefined pixel coordinates, render only parts of the larger image, as if they were separate smaller images.
I suggest taking a look at LibGDX's TexturePacker.
Extract:
In OpenGL, a texture is bound, some drawing is done, another texture
is bound, more drawing is done, etc. Binding the texture is relatively
expensive, so it is ideal to store many smaller images on a larger
image, bind the larger texture once, then draw portions of it many
times. libgdx has a TexturePacker class which is a command line
application that packs many smaller images on to larger images. It
stores the locations of the smaller images so they are easily
referenced by name in your application using the TextureAtlas class.
TexturePacker uses multiple packing algorithms but the most important
is based on the maximal rectangles algorithm. It also uses brute
force, packing with numerous heuristics at various sizes and then
choosing the most efficient result.
Note that this is related to, but not identical to, the general concept of caching.
In computer programming, caching consists of dedicating a section of memory to storing recently or frequently used data, to avoid having to recreate or reprocess that data every time it is needed.
Texture packing is similar but not the same: it is done not to avoid recreating or reprocessing the images themselves, but to avoid an expensive step (texture binding) further down the line.
Considering the gaming context of the question, it's also worth noting another concept that is much closer to caching. It's called pooling, and consists of creating a cache (in this case called a pool) of pre-created or pre-processed instances of objects that can be expected to be needed in varying quantities over time, for example the units in an RTS game. This avoids having to create them at the moment they are needed, which in turn avoids sudden spikes in processing that lead to sudden drops in FPS, or "stutters", in a game.
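To make the pooling idea concrete, here is a minimal sketch of a generic object pool. The class name `SimplePool` and its methods are hypothetical and not taken from any particular library; a real game would usually also reset an object's state before reusing it.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

/** Hypothetical sketch of an object pool: instances are created up front and reused. */
final class SimplePool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    /** Pre-creates the given number of instances so that gameplay does not pay for allocation. */
    SimplePool(Supplier<T> factory, int preallocate) {
        this.factory = factory;
        for (int i = 0; i < preallocate; i++) {
            free.push(factory.get());
        }
    }

    /** Hands out a pooled instance, falling back to a fresh one if the pool is empty. */
    T obtain() {
        return free.isEmpty() ? factory.get() : free.pop();
    }

    /** Returns an instance to the pool for later reuse. */
    void release(T obj) {
        free.push(obj);
    }

    int available() {
        return free.size();
    }
}
```

For example, `new SimplePool<>(StringBuilder::new, 64)` would create 64 builders once; `obtain()` and `release()` then recycle them instead of allocating new ones in the middle of a frame.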

Java procedural 2d terrain low-poly style

I'm working on a game with a low-poly style. I have been searching for procedural terrain generation, but there were only 3D or tile-based tutorials.
INFO:
The language is Java, using the libGDX framework, and the game is released on Android.
The terrain will be generated procedurally while the game is running, using a chunk loading system (for an infinite world).
The game terrain will be saved, and reloading should produce the same terrain.
The terrain can be concave (caves).
QUESTION:
Are there any good tutorials or libs?
If I use chunks to only load parts of the map, some triangles will have vertices in two different chunks. How do I manage these?
I have read that I shouldn't save/load a chunk to a file, but just generate the terrain using a seed. How do I tell the generator not to generate something that was removed previously?
What about entities? Should I save them to a file?

A few bits of general advice:
Possible vertex overlap could be accounted for by defining and saving chunks with padding, to account for the maximum distance a vertex could go outside of its chunk. Minecraft, for example, never had this problem because cubes line up very nicely. You could consider changing the geometry you're using. For example: define the world as cubes, and then apply an effect that moves all vertices pseudo-randomly and thus hides that you're generating using cubes.
I would generate all terrain using a seed instead of saving and loading from a file, except for chunks where something has been removed. Those chunks will need to be saved. You could overwrite the seed-generated chunks with these.
Just like you said in your question, handle entities by saving them to a .properties file or something similar. I would use a LinkedList<> or array[] with an abstract parent class to keep track of them in-game.
Some videos about procedural generation in general:
https://www.youtube.com/watch?v=JdYkcrW8FBg
https://www.youtube.com/watch?v=0ORcSjvESrA
This is all pretty abstract information, but I didn't want to leave this without any response. Hopefully someone with more experience in the particular area can give you more practical insight.
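The "seed plus saved exceptions" advice could look roughly like the sketch below. It assumes a chunk is just an array of height values, uses `java.util.Random` seeded per chunk in place of a real noise function, and keeps modified chunks in an in-memory map where a real game would write files. All class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch: chunks come from the seed unless a modified copy was saved. */
final class ChunkStore {
    static final int SIZE = 16; // assumed number of height samples per chunk side

    private final long worldSeed;
    // Stand-in for files on disk: only chunks that were changed are kept here.
    private final Map<Long, float[]> modified = new HashMap<>();

    ChunkStore(long worldSeed) {
        this.worldSeed = worldSeed;
    }

    private static long key(int cx, int cy) {
        return ((long) cx << 32) | (cy & 0xFFFFFFFFL);
    }

    /** Deterministic generation: the same world seed and chunk position give the same data. */
    float[] generate(int cx, int cy) {
        Random rng = new Random(worldSeed ^ (key(cx, cy) * 0x9E3779B97F4A7C15L));
        float[] heights = new float[SIZE * SIZE];
        for (int i = 0; i < heights.length; i++) {
            heights[i] = rng.nextFloat();
        }
        return heights;
    }

    /** A saved chunk takes priority over the generator, so removed terrain stays removed. */
    float[] load(int cx, int cy) {
        float[] saved = modified.get(key(cx, cy));
        return saved != null ? saved.clone() : generate(cx, cy);
    }

    /** Call this only for chunks the player actually changed. */
    void save(int cx, int cy, float[] heights) {
        modified.put(key(cx, cy), heights.clone());
    }

    boolean isModified(int cx, int cy) {
        return modified.containsKey(key(cx, cy));
    }
}
```

Because the generator is only consulted when no saved copy exists, untouched chunks cost no storage at all.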

Data structure to use for platformer game map

I'm writing a 2D platformer game using Java for Android.
Currently I have all game entities (except for the avatar) stored in an array of arrays of "Entity":
Entity[][]
Whenever I need to check something - such as what I'm going to draw on screen or for collision detection - I simply grab a small radius of items around the avatar and do whatever using a system of inheritance and polymorphism.
The problem is that this means I can only put one entity in a particular grid coordinate. This used to be quite okay for the most part, but now I have moving items (such as enemies or moving blocks) which, when they collide, end up deleting one another: basically, they get overwritten.
So what data structure should I use? I was thinking of something like
ArrayList<Entity>[][]
But that's going to be very expensive, and a waste of memory since duplicate items are the exception not the rule.
I was also considering separating the moving items into their own ArrayList, and looping through all of them, but that's an ugly solution.
So any ideas on what I could use? I want something which is pretty fast, but not too memory intensive.

A two-dimensional array for storing entities wastes memory (most blocks won't be occupied) and doesn't allow overlapping entities, as you noticed. With few entities, a simple ArrayList (no visible arrays involved) is probably going to be fully adequate.
If you have lots of entities, you could consider a quad-tree data structure or some other spatial indexing structure (check http://en.wikipedia.org/wiki/Spatial_index).
As a simpler and more "traditional" approach, how about keeping a two-layer view of your entities-- an ArrayList of all active (visible) ones, and an ArrayList of all inactive (too-far-away-to-care) ones? If performance matters, you can disable collision detection on the latter and perform little or no AI/maintenance on them.

How about keeping the entities as:
Entity[][]
And assuming you look up entities like:
Entity e = entities[i][j];
But keeping your enemies in a:
Map<String,List<Entity>> map
Where the key is the String made by concatenating i and j with a separator character in the middle (e.g. "1.15" if i==1 and j==15). Now you keep the fast look-up of entities and also have fast look-up (albeit not as fast as an array) of enemies without the excess empty space.
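A possible variation on this idea, sketched below, packs i and j into a single long key instead of building a String for every look-up. This is a swapped-in keying scheme, not what the answer above literally proposes, and the class `CellIndex` with its methods is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: sparse grid where a cell can hold several moving entities. */
final class CellIndex<E> {
    private final Map<Long, List<E>> cells = new HashMap<>();

    /** Packs the two grid indices into one key, avoiding String concatenation. */
    private static long key(int i, int j) {
        return ((long) i << 32) | (j & 0xFFFFFFFFL);
    }

    void add(int i, int j, E entity) {
        List<E> list = cells.get(key(i, j));
        if (list == null) {
            list = new ArrayList<>();
            cells.put(key(i, j), list);
        }
        list.add(entity);
    }

    /** Removes the entity and drops the cell once it is empty, so empty cells cost nothing. */
    boolean remove(int i, int j, E entity) {
        List<E> list = cells.get(key(i, j));
        if (list == null || !list.remove(entity)) {
            return false;
        }
        if (list.isEmpty()) {
            cells.remove(key(i, j));
        }
        return true;
    }

    void move(int fromI, int fromJ, int toI, int toJ, E entity) {
        if (remove(fromI, fromJ, entity)) {
            add(toI, toJ, entity);
        }
    }

    /** Returns the entities in a cell, or an empty list if nothing is there. */
    List<E> at(int i, int j) {
        List<E> list = cells.get(key(i, j));
        return list != null ? list : Collections.<E>emptyList();
    }

    int occupiedCells() {
        return cells.size();
    }
}
```

Two enemies that collide simply end up in the same list, so neither overwrites the other.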

Optimal data structures for a tile-based RPG In java

The game is tile-based, but the tiles are really only for terrain and path-finding purposes. Sprite movement is free-form (ie, the player can be half way through a tile).
The maps in this game are very large. At normal zoom, tiles are 32*32 pixels, and map sizes can be up to 2000x2000 or larger (4 million tiles!). Currently, a map is an array of tiles, and the tile object looks like this:
public class Tile {
    public byte groundType;
    public byte featureType;
    public ArrayList<Sprite> entities;

    public Tile() {
        groundType = -1;
        featureType = -1;
        entities = null;
    }
}
Where groundType is the texture, and featureType is a map object that takes up an entire tile (such as a tree, or large rock). These types of features are quite common so I have opted to make them their own variable rather than store them in entities, which is a list of objects on the tile (items, creatures, etc). Entities are saved to a tile for performance reasons.
The problem I am having is that if entities is not initialized to null, Java runs out of heap space. But setting it to null and only initializing when something moves into the tile seems to me a bad solution. If a creature were moving across otherwise empty tiles, lists would constantly need to be initialized and set back to null. Is this not poor memory management? What would be a better solution?

Have a single structure (start with an ArrayList) containing all of your sprites.
If you're running a game loop and cycling through the sprites list, say, once every 30-50 seconds and there are up to, say, 200 sprites, you shouldn't have a performance hit from this structure per se.
Later on, for other purposes such as collision detection, you may well need to revise the structure of just a single ArrayList. I would suggest starting with the simple, noddyish solution to get your game logic sorted out, then optimising as necessary.
For your tiles, if space is a concern, then rather than having a special "Tile" object, consider packing the information for each tile into a single byte, short or int, if not much specific information is actually required per tile. Remember that every Java object you create has some overhead (for the sake of argument, let's say in the order of 24-32 bytes per object depending on the VM and on a 32-bit vs 64-bit processor). An array of 4 million bytes is "only" 4MB, and 4 million ints "only" 16MB.
Another solution for your tile data, if packing a tile's specification into a single primitive isn't practical, is to declare a large ByteBuffer, with each tile's data stored at index (say) tileNo * 16 if each tile needs 16 bytes of data.
You could consider not actually storing all of the tiles in memory. Whether this is appropriate will depend on your game. I would say that 2000x2000 is still within the realm that you could sensibly keep the whole data in memory if each individual tile does not need much data.
If you're thinking the last couple of points defeat the whole point of an object-oriented language, then yes you're right. So you need to weigh up at what point you opt for the "extreme" solution to save heap space, or whether you can "get away with" using more memory for the sake of a better programming paradigm. Having an object per tile might use (say) in the order of a few hundred megabytes. In some environments that will be ridiculous. In others where several gigabytes are available, it might be entirely reasonable.
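As an illustration of packing a tile into a primitive, the sketch below stores the question's groundType and featureType in one short per tile, which comes to about 8MB for a 2000x2000 map. The bit layout (high byte for ground, low byte for feature) and the class name `PackedTileGrid` are assumptions; per-tile entity lists are deliberately left out and would live in a separate structure.

```java
import java.util.Arrays;

/** Hypothetical sketch: two byte-sized tile fields packed into one short per tile. */
final class PackedTileGrid {
    private final int width;
    // Assumed layout: high byte = groundType, low byte = featureType.
    private final short[] data;

    PackedTileGrid(int width, int height) {
        this.width = width;
        this.data = new short[width * height];
        Arrays.fill(data, (short) -1); // all bits set: both fields read back as -1
    }

    byte groundType(int x, int y) {
        return (byte) (data[y * width + x] >> 8);
    }

    byte featureType(int x, int y) {
        return (byte) data[y * width + x];
    }

    void set(int x, int y, byte groundType, byte featureType) {
        // Mask the low byte so a negative featureType cannot overwrite the high byte.
        data[y * width + x] = (short) ((groundType << 8) | (featureType & 0xFF));
    }
}
```

Accessor methods keep the packing detail in one place, so the rest of the game can still read `groundType(x, y)` much as it would read a field on a Tile object.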

Techniques for keeping data in the cache, locality?

For ultra-fast code it is essential that we maintain locality of reference, i.e. keep as much as possible of the data that is used closely together in the CPU cache:
http://en.wikipedia.org/wiki/Locality_of_reference
What techniques are there to achieve this? Could people give examples?
I am interested in Java and C/C++ examples. It would be interesting to know of ways people use to stop lots of cache swapping.
Greetings

This is probably too generic to have a clear answer. The approaches in C or C++ compared to Java will differ quite a bit (the way the languages lay out objects differs).
The basic idea is to keep data that will be accessed in tight loops together. If your loop operates on type T, and it has members m1...mN, but only m1...m4 are used in the critical path, consider breaking T into T1 that contains m1...m4 and T2 that contains m5...mN. You might want to add to T1 a pointer that refers to T2. Try to avoid objects that are unaligned with respect to cache boundaries (very platform dependent).
Use contiguous containers (plain old array in C, vector in C++) and try to manage the iterations to go up or down, but not randomly jumping all over the container. Linked Lists are killers for locality, two consecutive nodes in a list might be at completely different random locations.
Object containers (and generics) in Java are also killers: while the references in a Vector are contiguous, the actual objects are not (there is an extra level of indirection). In Java there is also a lot of extra per-object data: if you new two objects one right after the other, they will probably end up in almost contiguous memory locations, although there will be some object management data (usually two or three pointers) in between. The GC will move objects around, but hopefully won't make things much worse than they were before it ran.
If you are focusing on Java, create compact data structures. If you have an object that has a position, and that is to be accessed in a tight loop, consider holding x and y primitives inside your object rather than creating a Point and holding a reference to it. Reference types need to be newed, and that means a separate allocation, an extra indirection and less locality.
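Taking the "hold primitives, not references" advice one step further, the fields of many objects can be kept in parallel primitive arrays (often called a structure-of-arrays layout). This is an extension of the advice above rather than something it states, and the class `Particles` is purely illustrative.

```java
/** Hypothetical sketch: positions and velocities stored as contiguous primitive arrays. */
final class Particles {
    final float[] x;
    final float[] y;
    final float[] vx;
    final float[] vy;

    Particles(int count) {
        x = new float[count];
        y = new float[count];
        vx = new float[count];
        vy = new float[count];
    }

    /** The hot loop walks each array front to back, with no pointer chasing. */
    void step(float dt) {
        for (int i = 0; i < x.length; i++) {
            x[i] += vx[i] * dt;
            y[i] += vy[i] * dt;
        }
    }
}
```

Compared with a list of particle objects that each reference a Point, every value touched by `step` sits next to the value the loop needs next.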

Two common techniques include:
Minimalism (of data size and/or code size/paths)
Use cache oblivious techniques
Example for minimalism: In ray tracing (a 3D graphics rendering paradigm), it is a common approach to use Kd-trees with 8-byte nodes to store static scene data. The traversal algorithm fits in just a few lines of code. Then, the Kd-tree is often built in a manner that minimizes the number of traversal steps by having large, empty nodes at the top of the tree ("Surface Area Heuristics" by Havran).
Mispredictions typically have a probability of 50%, but are of minor cost, because many nodes fit in a cache line (consider that you get 128 nodes per KiB!), and one of the two child nodes is always a direct neighbour in memory.
Example for cache oblivious techniques: Morton array indexing, also known as Z-order-curve indexing. This kind of indexing might be preferred if you usually access nearby array elements in an unpredictable direction. This might be valuable for large image or voxel data where you might have pixels of 32 or even 64 bytes, and then millions of them (the typical compact camera measure is megapixels, right?) or even thousands of billions for scientific simulations.
However, both techniques have one thing in common: keep the most frequently accessed data nearby; less frequently accessed data can be further away, spanning the whole range from L1 cache through main memory to hard disk, then other computers in the same room, next room, same country, worldwide, other planets.
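Morton indexing can be sketched in a few lines: the bits of x and y are interleaved so that elements close together in 2D tend to be close together in the 1D array. The version below assumes 16-bit coordinates and uses the well-known mask-and-shift bit trick; the class and method names are hypothetical.

```java
/** Hypothetical sketch of Morton (Z-order) indexing for 16-bit x and y coordinates. */
final class Morton {
    private Morton() {
    }

    /** Spreads the low 16 bits of v so that they occupy the even bit positions. */
    static int part1By1(int v) {
        v &= 0x0000FFFF;
        v = (v | (v << 8)) & 0x00FF00FF;
        v = (v | (v << 4)) & 0x0F0F0F0F;
        v = (v | (v << 2)) & 0x33333333;
        v = (v | (v << 1)) & 0x55555555;
        return v;
    }

    /** Inverse of part1By1: gathers the even bits back into the low 16 bits. */
    static int compact1By1(int v) {
        v &= 0x55555555;
        v = (v | (v >>> 1)) & 0x33333333;
        v = (v | (v >>> 2)) & 0x0F0F0F0F;
        v = (v | (v >>> 4)) & 0x00FF00FF;
        v = (v | (v >>> 8)) & 0x0000FFFF;
        return v;
    }

    /** Interleaves the bits: x goes to the even positions, y to the odd positions. */
    static int encode(int x, int y) {
        return part1By1(x) | (part1By1(y) << 1);
    }

    static int decodeX(int code) {
        return compact1By1(code);
    }

    static int decodeY(int code) {
        return compact1By1(code >>> 1);
    }
}
```

With this layout, the four cells (0,0), (1,0), (0,1) and (1,1) map to indices 0, 1, 2 and 3, so a small 2D neighbourhood occupies a short run of the array regardless of the direction of access.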

Some random tricks that come to mind, some of which I have used recently:
Rethink your algorithm. For example, you have an image with a shape and a processing algorithm that looks for the corners of the shape. Instead of operating on the image data directly, you can preprocess it, save all the shape's pixel coordinates in a list and then operate on the list. You avoid randomly jumping around the image.
Shrink data types. A regular int takes 4 bytes, and if you manage to use e.g. uint16_t you will cache 2x more stuff.
Sometimes you can use bitmaps. I used one for processing a binary image. I stored one pixel per bit, so I could fit 8*32 pixels in a single cache line. It really boosted the performance.
From Java, you can use JNI (it's not difficult) and implement your critical code in C to control the memory.
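The one-pixel-per-bit trick from the list above can also be done in plain Java without JNI. The sketch below packs a binary image into a long[] so that each 64-bit word holds 64 pixels; the class `BitImage` and its methods are hypothetical.

```java
/** Hypothetical sketch: a binary image stored at one bit per pixel in a long array. */
final class BitImage {
    private final int width;
    private final long[] words;

    BitImage(int width, int height) {
        this.width = width;
        // Round up so that a partially used last word is still allocated.
        this.words = new long[(width * height + 63) >>> 6];
    }

    boolean get(int x, int y) {
        int bit = y * width + x;
        return (words[bit >>> 6] & (1L << (bit & 63))) != 0;
    }

    void set(int x, int y, boolean on) {
        int bit = y * width + x;
        if (on) {
            words[bit >>> 6] |= 1L << (bit & 63);
        } else {
            words[bit >>> 6] &= ~(1L << (bit & 63));
        }
    }

    /** Counts set pixels a whole word at a time. */
    int count() {
        int total = 0;
        for (long w : words) {
            total += Long.bitCount(w);
        }
        return total;
    }

    int wordCount() {
        return words.length;
    }
}
```

Whole-word operations such as `Long.bitCount` process 64 pixels per step, which is where much of the benefit over a boolean[] or byte[] comes from.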

In the Java world the JIT is going to be working hard to achieve this, and trying to second guess this is likely to be counterproductive. This SO question addresses Java-specific issues more fully.
