Optimizing a grid-based particle system - Java

I've implemented a game somewhat similar to this one in Java and currently find that I'm hitting a ceiling number of particles of ~80k. My game board is a 2D array of references to 'Particle' objects, each of which must be updated every frame. Different kinds of 'Particle' have different behaviors and may move or change their state in response to environmental conditions such as wind or adjacent particles.
Some possible 'rules' that might be in effect:
If a Particle of type lava is adjacent to a Particle of type water, both disappear, and the lava is replaced by obsidian
If a gas Particle is adjacent to a Lava, Fire, Ember, etc. Particle, it will ignite, and produce fire and smoke
If a sufficient number of dust particles are stacked on top of one another, those at lower levels, as if under pressure, can become sedimentary rock
I've searched around and haven't been able to find any algorithms or data structures that seem particularly well-suited to speeding up the task. It seems that some kind of memoization might be useful? Would a quad tree be of any use here? I've seen them used in the somewhat similar Conway's Game of Life with the Hashlife algorithm. Or, is it the case that I'm not going to be able to do too much to increase the speed?

Hashlife will work in principle but there are two reasons why you might not get as much out of it as Conway Life.
Firstly it relies on recurring patterns. The more cell states you have and the less structured the plane the fewer cache hits you'll encounter and the more you'll be working with brute force.
Secondly, as another poster noted, rules that involve non-local effects mean your primitives (4x4 in Conway Life) will need to be bigger, so you will have to abandon divide and conquer at, say, 8x8 or 16x16, or whatever size guarantees you can correctly calculate the middle portion in n/2 time.
That's made worse by the diversity of states. In Conway Life it's common to pre-calculate all 4x4 grids, or at least to have nearly all relevant ones in cache.
With 2 states there are only 65536 4x4 grids (peanuts on modern platforms) but with only 3 there are 43046721.
If you have to have 8x8 primitives it gets very big very quickly and beyond any realistic storage.
So the larger the primitive and the more states you have, the more quickly memoization becomes unrealistic.
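To make the growth concrete: the number of distinct primitives is states^(side*side). Here is a small sketch that reproduces the figures above (the class and method names are made up for illustration):

```java
import java.math.BigInteger;

class PrimitiveCount {
    // Number of distinct side x side blocks when each cell can take `states` values.
    static BigInteger count(int states, int side) {
        return BigInteger.valueOf(states).pow(side * side);
    }

    public static void main(String[] args) {
        System.out.println(count(2, 4)); // 65536
        System.out.println(count(3, 4)); // 43046721
        System.out.println(count(2, 8)); // 18446744073709551616
    }
}
```

Even with only two states, an 8x8 primitive already has 2^64 possible configurations, so a full lookup table is out of the question.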
One way to address that primitive size is to have the rock rule propagate pressure. So a Rock+n (n representing pressure) becomes Rock+(n+1) in the next generation if it has Rock+m where m>=n above it. Up to some threshold k where it turns to sedimentary Rock.
That means cells are still only dependent on their immediate neighbours but again multiplies up the number of states.
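A minimal sketch of that pressure rule, assuming a plain int grid where row 0 is the top. The encoding (EMPTY, SEDIMENT, non-negative values as pressure) is my own assumption for illustration, not something from the question:

```java
class PressureRule {
    static final int EMPTY = -1;     // no particle in the cell
    static final int SEDIMENT = -2;  // compacted rock; never changes again
    // Any value >= 0 is a Rock cell carrying that much pressure (Rock+n).

    // Computes the next generation from the current one; row 0 is the top.
    // Each cell only looks at the cell directly above it.
    static int[][] step(int[][] cur, int threshold) {
        int rows = cur.length, cols = cur[0].length;
        int[][] next = new int[rows][cols];
        for (int y = 0; y < rows; y++) {
            for (int x = 0; x < cols; x++) {
                int n = cur[y][x];
                int above = (y > 0) ? cur[y - 1][x] : EMPTY;
                if (n >= 0 && above >= n) {
                    // Rock+n with Rock+m (m >= n) above becomes Rock+(n+1),
                    // or sedimentary rock once the threshold is reached.
                    n = (n + 1 >= threshold) ? SEDIMENT : n + 1;
                }
                next[y][x] = n;
            }
        }
        return next;
    }
}
```

With a column of four Rock+0 cells and a threshold of 3, three generations leave pressures 0, 1, 2 from the top and turn the bottom cell into sediment. In this simplified version, sediment above a rock cell exerts no pressure.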
If you have cell types like the 'Bird' in the example given and you have velocities that you don't keep to a minimum (say -1,0,1 in either direction) you'll totally collapse memoization. Even then the chaotic nature of such rules may make cache hits on those areas vanishingly small.
If your rules don't lead to steady states (or repeating cycles) like Conway Life often does the return on memoization will be limited unless your plane is mostly empty.

I don't understand your problem clearly, but I think CUDA or OpenGL (GPU programming in general) can easily handle the game in your reference link: https://dan-ball.jp/en/javagame/dust/

I'd use a fixed NxN grid for this mainly because there are too many points moving around each frame to benefit from the recursive subdividing nature of the quad-tree. This is a case where a straightforward data structure with the data representations and memory layouts tuned appropriately can make all the difference in the world.
The main thing I'd do for Java here is avoid modeling each particle as an object. It should be raw, plain old data like floats or ints. You want contiguity guarantees for spatial locality with sequential processing, and you don't want to pay for padding and the per-object header (typically 8 to 16 bytes, depending on the JVM). Split cold fields away from hot fields.
For example, you don't necessarily need to know a particle's color to move it around and apply physics. As a result, you don't want an AoS (array of structures) representation here, which has to load a particle's color into cache lines during the sequential physics pass only to evict it without using it. Cram as much memory that is used together into a cache line as you can by separating it from the memory that is irrelevant to a particular pass.
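As a rough sketch of what that hot/cold split could look like in Java, here is a structure-of-arrays store. The class name, the fields and the trivial integration step are all assumptions for illustration:

```java
class ParticleStore {
    // Hot data: touched every frame by the physics pass.
    final float[] x, y, vx, vy;
    // Cold data: only needed when drawing, so it lives in its own array.
    final int[] color;
    int count;

    ParticleStore(int capacity) {
        x = new float[capacity];
        y = new float[capacity];
        vx = new float[capacity];
        vy = new float[capacity];
        color = new int[capacity];
    }

    // Appends a particle and returns its index.
    int add(float px, float py, float pvx, float pvy, int rgb) {
        int i = count++;
        x[i] = px;
        y[i] = py;
        vx[i] = pvx;
        vy[i] = pvy;
        color[i] = rgb;
        return i;
    }

    // Sequential pass over the hot arrays only; color is never loaded here.
    void integrate(float dt) {
        for (int i = 0; i < count; i++) {
            x[i] += vx[i] * dt;
            y[i] += vy[i] * dt;
        }
    }
}
```

A particle is then just an int index into these arrays rather than an object reference.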
Each cell in the grid should just store an index into a particle, with each particle storing an index to the next particle in the cell (a singly-linked list, but an intrusive one which requires allocating no nodes and just uses indices into arrays). A -1 can be used to indicate the end of the list as well as empty cells.
To find collisions between particles of interest, look in the same cell as the particle you're testing, and you can do this in parallel where each thread handles one or more cells worth of particles.
The NxN grid should be very fine given the boatload of moving particles you can have per frame. Play with how many cells you create to find something optimal for your input sizes. You might even have multiple grids. If certain particles don't interact with each other, don't put them in the same grid. Don't worry about the memory usage of the grid here. If each grid cell just stores a 32-bit index to the first particle in the cell, then a 200x200 grid only takes 160 kilobytes with a 32-bit next index overhead per particle.
I made something similar to this some years back in C using the technique above (but not with as many interesting particle interactions as the demo game) which could handle about 10 mil particles before it started to go below 30 FPS and on older hardware with only 2 cores. It did use C as well as SIMD and multithreading, but I think you can get a very speedy solution in Java handling a boatload of particles at once if you do the above.
Data structure:
As particles move from one cell to the next, all you do is manipulate a couple of integers to move them from one cell to the other. Cells don't "own memory" or allocate any. They're just 32-bit indices.
To figure out which cell a particle occupies, just do:
cell_x = (int)(particle_x[particle_index] / cell_size)
cell_y = (int)(particle_y[particle_index] / cell_size)
cell_index = cell_y * num_cols + cell_x
... much cheaper constant-time operation than traversing a tree structure and having to rebalance it as particles move around.
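Here is a minimal Java sketch of the grid described above, assuming particle coordinates are non-negative and inside the grid. The names are hypothetical, and the remove step walks the cell's (short) list, which is one simple way to unlink from a singly-linked list and not necessarily what the original C code did:

```java
import java.util.Arrays;

class ParticleGrid {
    final int numCols, numRows;
    final float cellSize;
    final int[] cellHead; // index of the first particle in each cell, -1 if empty
    final int[] next;     // next[i] = next particle in the same cell, -1 at the end

    ParticleGrid(int numCols, int numRows, float cellSize, int maxParticles) {
        this.numCols = numCols;
        this.numRows = numRows;
        this.cellSize = cellSize;
        cellHead = new int[numCols * numRows];
        next = new int[maxParticles];
        Arrays.fill(cellHead, -1);
        Arrays.fill(next, -1);
    }

    // Maps a position to a cell index in constant time.
    int cellIndex(float px, float py) {
        int cx = (int) (px / cellSize);
        int cy = (int) (py / cellSize);
        return cy * numCols + cx;
    }

    // Pushes particle i onto the front of the cell's list.
    void insert(int i, int cell) {
        next[i] = cellHead[cell];
        cellHead[cell] = i;
    }

    // Unlinks particle i from the cell's list, if it is there.
    void remove(int i, int cell) {
        int prev = -1;
        int cur = cellHead[cell];
        while (cur != -1 && cur != i) {
            prev = cur;
            cur = next[cur];
        }
        if (cur == -1) return; // particle was not in this cell
        if (prev == -1) cellHead[cell] = next[i];
        else next[prev] = next[i];
        next[i] = -1;
    }

    // Moves particle i between cells by relinking indices; nothing is allocated.
    void move(int i, int fromCell, int toCell) {
        if (fromCell != toCell) {
            remove(i, fromCell);
            insert(i, toCell);
        }
    }
}
```

To visit everything in a cell, start at cellHead[cell] and follow next[] until you hit -1.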

Related

Efficiently gathering data from a game board

Say I have a Connect-4 board. It's a 7x6 board, and I want to store which piece occupies which spot on that board. Using a 2D array would be nice, in that I can quickly visualize it as a board, but I worry about the efficiency of looping through an array to gather data so often.
What would be the most efficient way of 1) Storing that game board and 2) Gathering the data from the said game board?
Thanks.
The trite answer is that at 7x6, it's not going to be a big deal: Unless you're on a microcontroller this might not make a practical difference. However, if you're thinking about this from an algorithm perspective, you can reason about the operations; "storing" and "gathering" are not quite specific enough. You'll need to think through exactly which operations you're trying to support, and how they would scale if you had thousands of columns and millions of pieces. Operations might be:
Read whether a piece exists, and what color it is, given its x and y coordinates.
When you add a piece to a column, it will "fall". Given a column, how far does it fall, or what would the new y value be for a piece added to column x? At most this will be height times whatever the cost of reading is, since you could just scan the column.
Add a piece at the given x and y coordinate.
Scan through all pieces, which is at most width times height times the cost of reading.
Of course, all of this has to fit on your computer as well, so you care about storage space as well as time.
Let's list out some options:
Array, such as game[x][y] or game[n] where n is something like x * height + y: Constant time (O(1)) to read/write given x and y, but O(width * height) to scan and count, and O(height) time to figure out how far a piece drops. Space is always O(width * height), no matter how many pieces are on the board. Perfectly reasonable for 7x6, might be a bad idea if you had a huge grid (e.g. 7 million x 6 million).
Array such as game[n] where each piece is added to the board and each piece contains its x and y coordinate: O(pieces) time to find/add/delete a piece given x and y, O(pieces) scan time, O(pieces) space. Probably good for an extremely sparse grid (e.g. 7 million x 6 million), but needlessly slow for 7x6.
HashMap as Grant suggests, where the key is a Point data object you write that contains x and y. O(1) to read/write, O(height) to see how far a piece drops, O(pieces) time to scan, O(pieces) space. Slightly better than an array, because you don't need an empty array slot per blank space on the board. There's a little extra memory per piece entry for the HashMap key object, but you could effectively make an enormous board with very little extra cost, which makes this slightly better than option 1 if you don't mind writing the extra Point class.
An array of resizable column arrays, e.g. List. This is similar to an array of fixed arrays, but because List stores its size and can allocate only as much memory as needed, you can store the state very efficiently including how far a piece needs to fall. Constant read/write/add, constant "fall" time, O(pieces) + O(width) scan time, O(pieces) + O(width) space because you don't need to scan/store the cells you know are empty.
Given those options, I think that an array of Lists (#4) is the most scalable solution, but unless I knew it needed to scale I would probably choose the array of arrays (#1) for ease of writing and understanding.
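For what it's worth, option 4 might look something like the sketch below. I've used a List of Lists rather than a raw array of Lists because generic arrays are awkward in Java; the class name and the use of int player ids are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;

class ColumnBoard {
    static final int EMPTY = 0;
    private final List<List<Integer>> columns = new ArrayList<>();
    private final int height;

    ColumnBoard(int width, int height) {
        this.height = height;
        for (int x = 0; x < width; x++) {
            columns.add(new ArrayList<>());
        }
    }

    // Drops a piece into column x. Returns the row it lands on
    // (0 is the bottom row), or -1 if the column is already full.
    int drop(int x, int player) {
        List<Integer> col = columns.get(x);
        if (col.size() >= height) return -1;
        col.add(player);
        return col.size() - 1;
    }

    // Returns the player at (x, y), or EMPTY if nothing is there.
    int get(int x, int y) {
        List<Integer> col = columns.get(x);
        return y < col.size() ? col.get(y) : EMPTY;
    }
}
```

The "fall" distance comes for free from the column's size, which is why that operation is constant time.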
I may be wrong, but I think you're looking for a HashMap (a form of hash table) if you want efficiency.
Here's the documentation:
https://docs.oracle.com/javase/8/docs/api/java/util/Hashtable.html
HashMap provides expected constant-time performance, O(1), for most operations like put(), get(), remove() and containsKey().
Since you're using a 7x6 board, you can simply name your keys and values A1 ... A6 for example.

Is there a way to use limitless lists in java?

I'm trying to make a randomly-generated 2-d game, which I plan to do with a list of terrain to the right of the spawn point and a list of terrain to the left of the spawn point. However, I need these lists to not have a length limit, as I want the world to be infinite. If I can't find a way I will make the world "round" but infinite would be preferable. Is this possible?
An ArrayList is infinite... until memory runs out. But I guess that was not the question.
Update: Right, this is limited even though I argue nobody will notice the world restarting after two billion units.
Thought about that again. What you need is a random function that creates the same value again and again when you give it seed and current position. So you do not store the world, you recalculate it on the fly.
So you need an infinite counter only for the position in your world. The only challenge will be the storage of event results such as eaten mushrooms and destroyed bridges.
Storing all the data in a list will have a lot of limitations.
If you use an ArrayList, you can't have infinite elements.
If you use a LinkedList, you lose random access, so speed is a lot slower.
And for any list, RAM is an issue.
You'd be better off by splitting generated areas into chunks, then storing those to the harddrive.
Now, you'd still want a list of loaded areas, but this will be limited by a scope. If you're 2 game-miles to the East of some town, no point keeping the town information in reference (I hope).
One very popular game that does this is Minecraft. Attempting to load the entire Minecraft world into your RAM won't happen - yet it still has the potential for infinite worlds.
If the world is going to be huge, I wouldn't store it in an ArrayList or a LinkedList. Instead you can make the whole world depend on a randomly selected long value seed. The terrain at position i can then be found using new Random(seed ^ i).nextInt() (or something). That way the world will be (effectively) infinite and you won't have to save the terrain in memory. Whenever you return to a previously visited part of the world it will be the same as it was before. The number of possible seeds is 2^64 (although java.util.Random only uses 48 bits of its seed internally), so you'd have to live a very long time before you saw the same world again.
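A minimal sketch of that idea, using the same new Random(seed ^ i) trick. The class name and the maxHeight parameter are assumptions; a real game would probably want to smooth the values, since neighbouring positions here are unrelated to each other:

```java
import java.util.Random;

class SeededTerrain {
    private final long seed;

    SeededTerrain(long seed) {
        this.seed = seed;
    }

    // Terrain height at position i, in the range [0, maxHeight).
    // Nothing is stored: the same seed and position always give the same value.
    int heightAt(long i, int maxHeight) {
        return new Random(seed ^ i).nextInt(maxHeight);
    }
}
```

Positions can be negative, so a single long covers both the left and right side of the spawn point. Only changes made by the player (the eaten mushrooms and destroyed bridges) would still need to be saved.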
An ArrayList can contain at most 2^31 - 1 values (because the length of an array is an int, which is a signed 4-byte type).
A LinkedList, however, has no such fixed limit; the only limit is the memory of the JVM.

Optimal way of drawing temporary object in Android OpenGL ES 1.x

I'm developing a small app for Android with OpenGL ES 1.x. There is no glBegin-glEnd-functionality so one has to define vertex (and color and texcoord) arrays for the objects to be drawn, and then use matrix operations to move, scale, and rotate them. This works nicely for large objects, nothing to complain here...
However, if one wants to draw small, "temporary" objects (e.g. just a line from point A to point B), things get a bit annoying. I have thus created some small utility functions such as:
DrawHelper.drawLine(starting point, ending point)
I have noticed two possible ways to do this. My question is which of these versions is preferred? Since we are dealing with such simple helper functions, and both methods are easy and understandable, one might as well write them as well as possible from the start, even if the potential speed gain would be very low. So please no "benchmark and identify bottlenecks first".. =)
Method 1:
The draw helper has FloatBuffer containing the points (0,0,0) and (1,0,0). I draw this line every time with the appropriate modelview matrix in place transforming the two points to the desired locations.
Method 2:
The draw helper has a dummy FloatBuffer and I use FloatBuffer.put to feed in the new points every time.
Method 1 is clearly (?) better for larger objects such as circles or other geometric shapes. How about a simple line or a simple triangle?
You always choose the method that involves less work. Applying a matrix multiplication takes a lot more computation than two vector assignments. Also, the matrix transformation approach sends roughly 2.7 times as much data to the GPU (a whole 4×4 matrix, 16 floats) as sending two 3-vectors (6 floats).
OTOH, Java adds the penalty of going through a FloatBuffer.
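For reference, the buffer side of Method 2 could be sketched like this in plain java.nio. The class name is made up, and the actual GL calls are left out because they need a live GL context:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

class LineBuffer {
    // Two vertices, three floats each, four bytes per float.
    // A direct buffer in native byte order, allocated once and reused.
    private final FloatBuffer vertices = ByteBuffer.allocateDirect(2 * 3 * 4)
            .order(ByteOrder.nativeOrder())
            .asFloatBuffer();

    // Overwrites both endpoints and leaves the buffer ready to be read from index 0.
    FloatBuffer set(float ax, float ay, float az, float bx, float by, float bz) {
        vertices.clear();
        vertices.put(ax).put(ay).put(az);
        vertices.put(bx).put(by).put(bz);
        vertices.flip();
        return vertices;
    }
}
```

On Android, the returned buffer would then be handed to glVertexPointer(3, GL10.GL_FLOAT, 0, buffer), followed by glDrawArrays(GL10.GL_LINES, 0, 2). No allocation happens per draw, only six float writes.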

Optimal data structures for a tile-based RPG In java

The game is tile-based, but the tiles are really only for terrain and path-finding purposes. Sprite movement is free-form (ie, the player can be half way through a tile).
The maps in this game are very large. At normal zoom tiles are 32*32 pixels, and map sizes can be up to 2000x2000 or larger (4 million tiles!). Currently, a map is an array of tiles, and the tile object looks like this:
public class Tile {
    public byte groundType;
    public byte featureType;
    public ArrayList<Sprite> entities;

    public Tile() {
        groundType = -1;
        featureType = -1;
        entities = null;
    }
}
Where groundType is the texture, and featureType is a map object that takes up an entire tile (such as a tree, or large rock). These types of features are quite common so I have opted to make them their own variable rather than store them in entities, which is a list of objects on the tile (items, creatures, etc). Entities are saved to a tile for performance reasons.
The problem I am having is that if entities is not initialized to null, Java runs out of heap space. But setting it to null and only initializing when something moves into the tile seems to me a bad solution. If a creature were moving across otherwise empty tiles, lists would constantly need to be initialized and set back to null. Is this not poor memory management? What would be a better solution?
Have a single structure (start with an ArrayList) containing all of your sprites.
If you're running a game loop and cycling through the sprites list, say, once every 30-50 milliseconds and there are up to, say, 200 sprites, you shouldn't have a performance hit from this structure per se.
Later on, for other purposes such as collision detection, you may well need to revise the structure of just a single ArrayList. I would suggest starting with the simple, noddyish solution to get your game logic sorted out, then optimise as necessary.
For your tiles, if space is a concern, then rather than having a special "Tile" object, consider packing the information for each tile into a single byte, short or int if not much specific information per tile is required. Remember that every Java object you create has some overhead (for the sake of argument, let's say in the order of 24-32 bytes per object depending on VM and 32 vs 64 bit processor). An array of 4 million bytes is "only" 4MB, 4 million ints "only" 16MB.
Another solution for your tile data, if packing a tile's specification into a single primitive isn't practical, is to declare a large ByteBuffer, with each tile's data stored at index (say) tileNo * 16 if each tile needs 16 bytes of data.
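A rough sketch of the ByteBuffer approach, assuming 16 bytes per tile as in the example above. The field offsets and the class name are hypothetical, and the remaining 14 bytes of each record are simply left unused here:

```java
import java.nio.ByteBuffer;

class TileStore {
    static final int BYTES_PER_TILE = 16; // record size, as assumed in the text
    static final int GROUND_OFFSET = 0;   // hypothetical field layout
    static final int FEATURE_OFFSET = 1;

    private final int width;
    private final ByteBuffer data;

    TileStore(int width, int height) {
        this.width = width;
        // One flat allocation; all bytes start out as 0, so a real map
        // would need its own convention for "no feature".
        this.data = ByteBuffer.allocate(width * height * BYTES_PER_TILE);
    }

    // Byte offset of the record for tile (x, y).
    private int base(int x, int y) {
        return (y * width + x) * BYTES_PER_TILE;
    }

    byte groundType(int x, int y) {
        return data.get(base(x, y) + GROUND_OFFSET);
    }

    void setGroundType(int x, int y, byte type) {
        data.put(base(x, y) + GROUND_OFFSET, type);
    }

    byte featureType(int x, int y) {
        return data.get(base(x, y) + FEATURE_OFFSET);
    }

    void setFeatureType(int x, int y, byte type) {
        data.put(base(x, y) + FEATURE_OFFSET, type);
    }
}
```

At 2000x2000 tiles this is a single 64 MB buffer with no per-tile object overhead. Entities would live in a separate structure, as suggested above.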
You could consider not actually storing all of the tiles in memory. Whether this is appropriate will depend on your game. I would say that 2000x2000 is still within the realm that you could sensibly keep the whole data in memory if each individual tile does not need much data.
If you're thinking the last couple of points defeat the whole point of an object-oriented language, then yes you're right. So you need to weigh up at what point you opt for the "extreme" solution to save heap space, or whether you can "get away with" using more memory for the sake of a better programming paradigm. Having an object per tile might use (say) in the order of a few hundred megabytes. In some environments that will be ridiculous. In others where several gigabytes are available, it might be entirely reasonable.

Techniques for keeping data in the cache, locality?

For ultra-fast code it is essential that we keep locality of reference: keep as much as possible of the data that is used closely together in the CPU cache:
http://en.wikipedia.org/wiki/Locality_of_reference
What techniques are there to achieve this? Could people give examples?
I'm interested in Java and C/C++ examples. It would be interesting to know what ways people use to stop lots of cache swapping.
Greetings
This is probably too generic to have clear answer. The approaches in C or C++ compared to Java will differ quite a bit (the way the language lays out objects differ).
The basic idea would be to keep data that will be accessed in tight loops together. If your loop operates on type T, and it has members m1...mN, but only m1...m4 are used in the critical path, consider breaking T into T1 that contains m1...m4 and T2 that contains m5...mN. You might want to add to T1 a pointer that refers to T2. Try to avoid objects that are unaligned with respect to cache boundaries (very platform dependent).
Use contiguous containers (plain old array in C, vector in C++) and try to manage the iterations to go up or down, but not randomly jumping all over the container. Linked Lists are killers for locality, two consecutive nodes in a list might be at completely different random locations.
Object containers (and generics) in Java are also a killer: while in a Vector the references are contiguous, the actual objects are not (there is an extra level of indirection). In Java there are a lot of extra variables: if you new two objects one right after the other, the objects will probably end up in almost contiguous memory locations, even though there will be some extra information (usually two or three pointers) of object management data in between. The GC will move objects around, but hopefully won't make things much worse than they were before it ran.
If you are focusing on Java, create compact data structures. If you have an object that has a position, and that is to be accessed in a tight loop, consider holding primitive x and y fields inside your object rather than creating a Point and holding a reference to it. Reference types need to be newed, and that means a different allocation, an extra indirection and less locality.
Two common techniques include:
Minimalism (of data size and/or code size/paths)
Use cache oblivious techniques
Example for minimalism: In ray tracing (a 3D graphics rendering paradigm), it is a common approach to use Kd-trees with 8-byte nodes to store static scene data. The traversal algorithm fits in just a few lines of code. Then, the Kd-tree is often compiled in a manner that minimizes the number of traversal steps by having large, empty nodes at the top of the tree ("Surface Area Heuristics" by Havran).
Mispredictions typically have a probability of 50%, but their cost is minor, because a great many nodes fit in a cache line (consider that you get 128 nodes per KiB!), and one of the two child nodes is always a direct neighbour in memory.
Example for cache oblivious techniques: Morton array indexing, also known as Z-order-curve indexing. This kind of indexing might be preferred if you usually access nearby array elements in an unpredictable direction. This might be valuable for large image or voxel data where you might have pixels that are 32 or even 64 bytes big, and then millions of them (the typical compact camera measure is megapixels, right?) or even thousands of billions for scientific simulations.
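As an illustration of Morton indexing, here is a small Java sketch that interleaves the bits of x and y using the well-known shift-and-mask approach. The class and method names are made up for this example:

```java
class Morton {
    // Spreads the low 16 bits of v so that there is a zero bit
    // between each pair of original bits.
    static int spread(int v) {
        v &= 0x0000FFFF;
        v = (v | (v << 8)) & 0x00FF00FF;
        v = (v | (v << 4)) & 0x0F0F0F0F;
        v = (v | (v << 2)) & 0x33333333;
        v = (v | (v << 1)) & 0x55555555;
        return v;
    }

    // Z-order index of (x, y): x occupies the even bits, y the odd bits.
    // Intended for 0 <= x, y < 32768 so the result stays a non-negative int.
    static int index(int x, int y) {
        return spread(x) | (spread(y) << 1);
    }
}
```

With this, the four cells of any aligned 2x2 block get consecutive indices (for example (0,0), (1,0), (0,1), (1,1) map to 0, 1, 2, 3), so neighbours in either direction tend to stay close in memory. Note that the array should have a power-of-two side length for the indices to be dense.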
However, both techniques have one thing in common: keep the most frequently accessed stuff nearby; less frequently accessed things can be further away, spanning the whole range from L1 cache over main memory to hard disk, then other computers in the same room, next room, same country, worldwide, other planets.
Some random tricks that come to mind, some of which I used recently:
Rethink your algorithm. For example, you have an image with a shape and a processing algorithm that looks for corners of the shape. Instead of operating on the image data directly, you can preprocess it, save all the shape's pixel coordinates in a list and then operate on the list. You avoid randomly jumping around the image.
Shrink data types. A regular int takes 4 bytes, and if you manage to use e.g. uint16_t you will cache 2x more stuff.
Sometimes you can use bitmaps. I used one for processing a binary image: I stored one pixel per bit, so I could fit 8*32 pixels in a single cache line. It really boosted the performance.
From Java, you can use JNI (it's not difficult) and implement your critical code in C to control the memory.
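The bitmap trick from the list above also works in pure Java. Here is a minimal sketch that packs a binary image into a long[] (64 pixels per word); the class name is an assumption, and java.util.BitSet would be a ready-made alternative:

```java
class BitImage {
    private final int width;
    private final long[] bits; // 64 pixels per long, row-major order

    BitImage(int width, int height) {
        this.width = width;
        this.bits = new long[(width * height + 63) >>> 6];
    }

    boolean get(int x, int y) {
        int i = y * width + x;
        return (bits[i >>> 6] & (1L << (i & 63))) != 0;
    }

    void set(int x, int y, boolean on) {
        int i = y * width + x;
        long mask = 1L << (i & 63);
        if (on) {
            bits[i >>> 6] |= mask;
        } else {
            bits[i >>> 6] &= ~mask;
        }
    }

    // Counts the set pixels one 64-pixel word at a time.
    int count() {
        int n = 0;
        for (long w : bits) {
            n += Long.bitCount(w);
        }
        return n;
    }
}
```

Compared with a boolean[] (one byte per pixel in typical JVMs), this is 8x denser, so 8x as many pixels fit in each cache line.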
In the Java world the JIT is going to be working hard to achieve this, and trying to second guess this is likely to be counterproductive. This SO question addresses Java-specific issues more fully.
