Is there a way to use limitless lists in Java?

I'm trying to make a randomly-generated 2-d game, which I plan to do with a list of terrain to the right of the spawn point and a list of terrain to the left of the spawn point. However, I need these lists to not have a length limit, as I want the world to be infinite. If I can't find a way I will make the world "round" but infinite would be preferable. Is this possible?

An ArrayList is infinite... until memory runs out. But I guess that was not the question.
Update: Right, this is limited even though I argue nobody will notice the world restarting after two billion units.
Thought about that again. What you need is a random function that produces the same value again and again when you give it the seed and the current position. Then you don't store the world; you recalculate it on the fly.
So you only need an unbounded counter for your position in the world. The only challenge will be storing the results of events, such as eaten mushrooms and destroyed bridges.
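A minimal sketch of that idea (all names hypothetical; seeding a java.util.Random with seed ^ position is just one simple choice of deterministic function): terrain is recomputed on demand, and only player-made changes are stored.

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

class World {
    private final long seed;
    // Only deviations from the generated world are stored (eaten mushrooms etc.).
    private final Map<Long, Integer> changes = new HashMap<>();

    World(long seed) { this.seed = seed; }

    // The same position always yields the same terrain, so nothing is stored.
    int terrainAt(long position) {
        Integer changed = changes.get(position);
        if (changed != null) {
            return changed;
        }
        return new Random(seed ^ position).nextInt(4); // 4 hypothetical terrain types
    }

    void recordChange(long position, int newTerrain) {
        changes.put(position, newTerrain);
    }
}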

Storing all the data in a list will have a lot of limitations.
If you use an ArrayList, you can't have infinite elements.
If you use a LinkedList, random access becomes linear-time, so it's a lot slower.
And for any list, RAM is an issue.
You'd be better off by splitting generated areas into chunks, then storing those to the harddrive.
Now, you'd still want a list of loaded areas, but it will be limited in scope. If you're two game-miles to the east of some town, there's no point keeping that town's data loaded (I hope).
One very popular game that does this is Minecraft. Loading the entire Minecraft world into RAM is never going to happen, yet it still supports effectively infinite worlds.
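A hedged sketch of that chunking scheme (class and method names are invented for illustration): only chunks near the player stay in memory, keyed by chunk coordinate, and evicted chunks are persisted first.

import java.util.HashMap;
import java.util.Map;

class ChunkManager {
    static final long CHUNK_SIZE = 64; // terrain units per chunk (arbitrary)
    private final Map<Long, int[]> loaded = new HashMap<>();

    int[] chunkFor(long worldPos) {
        long chunkIndex = Math.floorDiv(worldPos, CHUNK_SIZE);
        return loaded.computeIfAbsent(chunkIndex, this::loadOrGenerate);
    }

    void unloadFarChunks(long playerPos, long keepRadiusChunks) {
        long playerChunk = Math.floorDiv(playerPos, CHUNK_SIZE);
        loaded.entrySet().removeIf(e -> {
            if (Math.abs(e.getKey() - playerChunk) > keepRadiusChunks) {
                saveToDisk(e.getKey(), e.getValue()); // persist before evicting
                return true;
            }
            return false;
        });
    }

    private int[] loadOrGenerate(long chunkIndex) {
        // Placeholder: read the chunk from disk if it exists, else generate it.
        return new int[(int) CHUNK_SIZE];
    }

    private void saveToDisk(long chunkIndex, int[] chunk) {
        // Placeholder: write the chunk to a file named after its index.
    }
}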

If the world is going to be huge, I wouldn't store it in an ArrayList or a LinkedList. Instead you can make the whole world depend on a randomly selected long value seed. The terrain at position i can then be found using new Random(seed ^ i).nextInt() (or something). That way the world will be (effectively) infinite and you won't have to save the terrain in memory. Whenever you return to a previously visited part of the world it will be the same as it was before. The number of different worlds is 2^64 so you'd have to live a very long time before you saw the same world again.
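A quick illustration of that determinism (the seed ^ i mixing is the answer's own suggestion; in practice you might want a stronger hash, since nearby positions produce correlated seeds):

import java.util.Random;

public class TerrainDemo {
    public static void main(String[] args) {
        long seed = new Random().nextLong(); // world seed, chosen once
        long i = 123_456_789L;               // some position in the world

        int first = new Random(seed ^ i).nextInt();
        int second = new Random(seed ^ i).nextInt();

        // The same position always produces the same terrain value.
        System.out.println(first == second); // prints true
    }
}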

An ArrayList can contain at most 2^31 - 1 elements, because an array's length is an int, a signed 4-byte value.
A LinkedList has no such fixed cap; its only limit is the memory available to the JVM.

Related

optimizing a grid-based particle system

I've implemented a game somewhat similar to this one in Java and currently find that I'm hitting a ceiling of ~80k particles. My game board is a 2D array of references to 'Particle' objects, each of which must be updated every frame. Different kinds of 'Particle' have different behaviors and may move or change their state in response to environmental conditions such as wind or adjacent particles.
Some possible 'rules' that might be in effect:
If a Particle of type lava is adjacent to a Particle of type water, both disappear, and the lava is replaced by obsidian
If a gas Particle is adjacent to a Lava, Fire, Ember, etc. Particle, it will ignite, and produce fire and smoke
If a sufficient number of dust particles are stacked on top of one another, those at lower levels, as if under pressure, can become sedimentary rock
I've searched around and haven't been able to find any algorithms or data structures that seem particularly well-suited to speeding up the task. It seems that some kind of memoization might be useful? Would a quad tree be of any use here? I've seen them used in the somewhat similar Conway's Game of Life with the Hashlife algorithm. Or, is it the case that I'm not going to be able to do too much to increase the speed?
Hashlife will work in principle but there are two reasons why you might not get as much out of it as Conway Life.
Firstly, it relies on recurring patterns. The more cell states you have and the less structured the plane, the fewer cache hits you'll get and the more you'll be falling back on brute force.
Secondly, as another poster noted, rules with non-local effects mean your primitives (4x4 in Conway Life) will need to be bigger, so you will have to abandon divide and conquer at, say, 8x8 or 16x16 or whatever size guarantees you can correctly calculate the middle portion in n/2 time.
That's made worse by the diversity of states. In Conway Life it's common to pre-calculate all 4x4 grids, or at least to have nearly all the relevant ones in cache.
With 2 states there are only 65536 (2^16) possible 4x4 grids (peanuts on modern platforms), but with just 3 states there are 43046721 (3^16).
If you have to use 8x8 primitives, the table gets very big very quickly, well beyond any realistic storage.
So the larger the primitive and the more states you have, the faster memoization becomes unrealistic.
One way to address that primitive size is to have the rock rule propagate pressure. So a Rock+n (n representing pressure) becomes Rock+(n+1) in the next generation if it has Rock+m where m>=n above it. Up to some threshold k where it turns to sedimentary Rock.
That means cells are still only dependent on their immediate neighbours but again multiplies up the number of states.
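A rough sketch of that pressure rule as a per-cell update (the state encoding below is invented for illustration):

class RockRules {
    // Hypothetical encoding: 0 = empty, 1..K = rock under pressure n,
    // K + 1 = sedimentary rock.
    static final int K = 8; // pressure threshold before rock turns sedimentary

    // Next state of the cell at (x, y); depends only on the cell directly above.
    static int updateRock(int[][] grid, int x, int y) {
        int n = grid[y][x];
        if (n < 1 || n > K) return n;           // not plain rock: unchanged
        int above = grid[y - 1][x];             // caller guarantees y > 0
        if (above >= n && above <= K) {         // rock with pressure m >= n above
            return (n + 1 > K) ? K + 1 : n + 1; // crossing K makes it sedimentary
        }
        return n;
    }
}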
If you have cell types like the 'Bird' in the example given and you have velocities that you don't keep to a minimum (say -1,0,1 in either direction) you'll totally collapse memoization. Even then the chaotic nature of such rules may make cache hits on those areas vanishingly small.
If your rules don't lead to steady states (or repeating cycles) like Conway Life often does the return on memoization will be limited unless your plane is mostly empty.
I don't understand your problem completely, but I think CUDA or OpenGL (GPU programming in general) can easily handle it. Reference link: https://dan-ball.jp/en/javagame/dust/
I'd use a fixed NxN grid for this mainly because there are too many points moving around each frame to benefit from the recursive subdividing nature of the quad-tree. This is a case where a straightforward data structure with the data representations and memory layouts tuned appropriately can make all the difference in the world.
The main thing I'd do for Java here is avoid modeling each particle as an object. Each particle should be plain old data: just floats or ints. You want contiguity guarantees for spatial locality with sequential processing, and you don't want to pay for padding and the per-instance object header on every particle. Split cold fields away from hot fields.
For example, you don't necessarily need to know a particle's color to move it around and apply physics. As a result, you don't want an AoS representation here which has to load in a particle's color into cache lines during the sequential physics pass only to evict it and not use it. Cram as much relevant memory used together into a cache line as you can by separating it away from the irrelevant memory for a particular pass.
Each cell in the grid should just store an index into a particle, with each particle storing an index to the next particle in the cell (a singly-linked list, but an intrusive one which requires allocating no nodes and just uses indices into arrays). A -1 can be used to indicate the end of the list as well as empty cells.
To find collisions between particles of interest, look in the same cell as the particle you're testing, and you can do this in parallel where each thread handles one or more cells worth of particles.
The NxN grid should be very fine given the boatload of moving particles you can have per frame. Play with how many cells you create to find something optimal for your input sizes. You might even have multiple grids. If certain particles don't interact with each other, don't put them in the same grid. Don't worry about the memory usage of the grid here. If each grid cell just stores a 32-bit index to the first particle in the cell, then a 200x200 grid only takes 160 kilobytes with a 32-bit next index overhead per particle.
I made something similar to this some years back in C using the technique above (though not with as many interesting particle interactions as the demo game); it could handle about 10 million particles before it dropped below 30 FPS, and that was on older hardware with only 2 cores. It did use C as well as SIMD and multithreading, but I think you can get a very speedy solution in Java handling a boatload of particles at once if you do the above.
Data structure:
As particles move from one cell to the next, all you do is manipulate a couple of integers to move them from one cell to the other. Cells don't "own memory" or allocate any. They're just 32-bit indices.
To figure out which cell a particle occupies, just do:
int cellX = (int) (particleX[particleIndex] / cellSize);
int cellY = (int) (particleY[particleIndex] / cellSize);
int cellIndex = cellY * numCols + cellX;
... a much cheaper constant-time operation than traversing a tree structure and rebalancing it as particles move around.
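A condensed Java sketch of that layout (all field names hypothetical): structure-of-arrays particle data plus an intrusive per-cell list threaded through an int array.

// -1 marks both an empty cell and the end of a cell's particle list.
class ParticleGrid {
    final float[] particleX, particleY; // hot data: positions only
    final int[] next;                   // next particle in the same cell, or -1
    final int[] cellHead;               // first particle in each cell, or -1
    final int numCols, numRows;
    final float cellSize;

    ParticleGrid(int maxParticles, int numCols, int numRows, float cellSize) {
        this.particleX = new float[maxParticles];
        this.particleY = new float[maxParticles];
        this.next = new int[maxParticles];
        this.cellHead = new int[numCols * numRows];
        this.numCols = numCols;
        this.numRows = numRows;
        this.cellSize = cellSize;
        java.util.Arrays.fill(cellHead, -1);
    }

    int cellIndexOf(int p) {
        int cellX = (int) (particleX[p] / cellSize);
        int cellY = (int) (particleY[p] / cellSize);
        return cellY * numCols + cellX;
    }

    // Moving a particle between cells is just a couple of integer writes.
    void insert(int p) {
        int c = cellIndexOf(p);
        next[p] = cellHead[c];
        cellHead[c] = p;
    }
}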

Storing enormously large arrays

I have a problem. I work in Java, in Eclipse. My program calculates some mathematical physics, and I need to draw an animation (Java SWT package) of the process (some hydrodynamics). The problem is 2D, so each iteration returns a two-dimensional array of numbers. One iteration takes rather a long time, and that time varies from one iteration to the next, so showing pictures dynamically as the program works seems like a bad idea. My idea instead was to store a three-dimensional array, where the third index represents time, and build the animation when the calculations are over. But since I want accuracy from my program, I need a lot of iterations, so the program easily reaches the maximal array size. So the question is: how do I avoid creating such an enormous array, or how do I avoid the limitations on array size? I thought about creating a special file to store the data and then reading from it, but I'm not sure about this. Do you have any ideas?
When I was working on a procedural architecture generation system at university for my dissertation, I created small binary files for the calculated data that were extremely easy to read and parse. This meant the data could be read back in an acceptable amount of time despite being quite large...
I would suggest doing the same for your animations... It might be worth storing maybe five seconds of animation per file and then caching each of these just before it is required...
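A hedged sketch of that approach using DataOutputStream (the file name and frame layout here are invented for illustration):

import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class FrameWriter {
    // Appends one 2-D frame to a simple binary file: rows, cols, then raw doubles.
    static void writeFrame(DataOutputStream out, double[][] frame) throws IOException {
        out.writeInt(frame.length);
        out.writeInt(frame[0].length);
        for (double[] row : frame) {
            for (double v : row) {
                out.writeDouble(v);
            }
        }
    }

    // Stream frames to disk as they are computed, instead of holding
    // the whole 3-D array in memory.
    static void run(int iterations) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream("frames-000.bin")))) {
            for (int t = 0; t < iterations; t++) {
                writeFrame(out, computeNextFrame()); // hypothetical solver step
            }
        }
    }

    static double[][] computeNextFrame() {
        return new double[4][4]; // placeholder for the real physics iteration
    }
}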
Also, how large are your arrays? If it's the JVM's memory limit rather than the maximum array size you're hitting, you could increase the amount of memory the JVM is allowed to allocate (the -Xmx flag).
I hope this helps and isn't just my ramblings...

Efficiency of ArrayList

I am making a program in Java in which a ball bounces around on the screen. The user can add other balls, and they all bounce off of each other. My question lies in the storage of the added balls. At the moment, I am using an ArrayList to store them, and every time the space bar is pressed, a new Ball object is created and added to the ArrayList. Is this the most efficient way of doing things? I don't specify the size of the ArrayList at the beginning, so is it inefficient to have to allocate new space on the array every time the user wants a new ball, even if the ball count gets up into the hundreds? Is there another class I could use to handle this in a more efficient manner?
Thanks!
EDIT:
Sorry, I should have been more clear. I iterate through the balls every 30 milliseconds, using nested for loops to see if they are intersecting with each other. I do access one ball the most often (the ball which the user can control with the arrow keys, another feature of the game), but the user can choose to switch control balls. Balls are never removed. So, I am performing some fairly complex calculations (I use my own vector class to move them off of each other every time there is a collision) on the balls very often.
Measure it and find out! In all seriousness, often times the best way to get answers to these questions is to set up a benchmark and swap in different collection types.
I can tell you that it won't allocate new space every time you add a new item to the ArrayList. Extra space is allocated so that it has room to grow.
LinkedList is another List option. It is super cheap to add items, but random access (list.get(10)) is expensive. Sets could also be good if you don't need ordered access (though there are ordered sets, too), and you want a Map implementation if you're accessing them by some sort of key/id. It really all depends on how you're using the collection.
Update based on added details
It sounds like you are mostly doing sequential reads through the entire list. In that scenario, a LinkedList is probably your best choice. Though again, if you only expose the List interface to the rest of your code (or even a more general Collection), you can easily swap in different implementations and actually measure the difference.
ArrayList is a highly optimized and very efficient wrapper on top of a plain Java array. A small timing overhead comes from copying array elements, which happens whenever the allocated size is smaller than the required number of elements. As you grow the array into a few hundred items, the copying will happen fewer than ten times, so the price you pay for not knowing the size in advance is very small. You can further reduce that overhead by suggesting an initial size for your ArrayList.
Removing from the middle of the ArrayList does take linear time. If you plan to remove items and/or insert them in the middle of the list frequently, this may become an issue. Note, however, that the overhead is not going to be worse than that for a plain array.
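For instance, a minimal illustration of presizing (the capacity of 512 is an arbitrary guess; Ball stands in for the asker's class):

import java.util.ArrayList;
import java.util.List;

class Ball { /* the asker's ball class */ }

class Demo {
    public static void main(String[] args) {
        // Presizing skips most of the internal array copies while the list grows.
        List<Ball> balls = new ArrayList<>(512);
        balls.add(new Ball());
    }
}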
I iterate through the balls every 30 milliseconds, using nested for loops to see if they are intersecting with each other.
This does not have much to do with the collection in which the balls are stored. You could use a spatial index to improve the speed of finding intersections.
Regarding ArrayList in Java: removing from the end and appending an element are amortized O(1). In other words, it's efficient in most cases (in some rare cases a resize makes a single operation expensive).
But you should think more carefully about your design before choosing your data structure.
How many objects will typically be in your collection? If the number is small, you're free to choose any data structure you find easy to work with; it will cost you almost nothing in performance.
If you often need to find one particular ball among all of your balls, another data structure such as HashMap or HashSet would be better.
And if you often delete from the middle of your list, maybe LinkedList will be the appropriate choice :)
I'd recommend working out the way in which you need to access the balls, and picking an appropriate interface (not implementation), e.g. if you're only accessing them sequentially, use a List; if you need to look a ball up by ID, think of a Map. The interface should match your requirements in terms of functionality, not in terms of speed/efficiency.
Then pick an implementation, e.g. HashMap or TreeMap, and write your code.
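For example (the types here are chosen purely for illustration; Ball stands in for the asker's class):

import java.util.HashMap;
import java.util.Map;

class Ball { /* the asker's ball class */ }

class BallRegistry {
    // Declared as the interface, so the implementation can be swapped in one place.
    private final Map<Integer, Ball> ballsById = new HashMap<>();
    // If profiling later shows ordered iteration matters:
    // private final Map<Integer, Ball> ballsById = new TreeMap<>();

    void add(int id, Ball ball) {
        ballsById.put(id, ball);
    }

    Ball find(int id) {
        return ballsById.get(id);
    }
}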
Afterwards, profile it. Is your code inefficient in the ball-access code? If so, try to optimise by switching to an alternate implementation that's more appropriate to your needs.

Techniques for keeping data in the cache, locality?

For ultra-fast code it is essential that we maintain locality of reference: keep as much of the data that is used together close together in the CPU cache:
http://en.wikipedia.org/wiki/Locality_of_reference
What techniques are there to achieve this? Could people give examples?
I'm interested in Java and C/C++ examples. It would be interesting to know the ways people use to prevent lots of cache misses.
Greetings
This is probably too generic to have a clear answer. The approaches in C or C++ compared to Java will differ quite a bit (the way the languages lay out objects differs).
The basic rule is: keep data that will be accessed in tight loops together. If your loop operates on type T, and T has members m1...mN, but only m1...m4 are used in the critical path, consider breaking T into a T1 that contains m1...m4 and a T2 that contains m5...mN. You might want to add to T1 a pointer that refers to T2. Try to avoid objects that are unaligned with respect to cache boundaries (very platform dependent).
Use contiguous containers (a plain old array in C, vector in C++) and try to manage the iteration so it goes up or down, not jumping randomly all over the container. Linked lists are killers for locality: two consecutive nodes in a list might be at completely different random locations.
Object containers (and generics) in Java are also a killer: while in a Vector the references are contiguous, the actual objects are not (there is an extra level of indirection). In Java there is also a lot of extra data: if you new two objects one right after the other, they will probably end up in almost contiguous memory locations, but there will be some extra object-management information (usually two or three pointers) in between. The GC will move objects around, but hopefully won't make things much worse than they were before it ran.
If you are focusing on Java, create compact data structures. If you have an object that has a position, and it is to be accessed in a tight loop, consider holding x and y as primitive fields inside your object rather than creating a Point and holding a reference to it. Reference types need to be newed, and that means a separate allocation, an extra indirection, and less locality.
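A small sketch of that last point (class names hypothetical; java.awt.Point is just a convenient reference type to contrast against):

// Poor locality: each particle drags along a separately allocated Point,
// so every coordinate access chases an extra reference.
class PointParticle {
    java.awt.Point position = new java.awt.Point();
}

// Better locality: the coordinates live inline inside the object itself.
class CompactParticle {
    int x, y;
}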
Two common techniques include:
Minimalism (of data size and/or code size/paths)
Use cache-oblivious techniques
Example of minimalism: in ray tracing (a 3D graphics rendering paradigm), it is a common approach to use Kd-trees with 8-byte nodes to store the static scene data. The traversal algorithm fits in just a few lines of code. The Kd-tree is then often compiled in a manner that minimizes the number of traversal steps by having large, empty nodes at the top of the tree ("Surface Area Heuristics" by Havran).
Mispredictions typically have a probability of 50%, but cost little, because a great many nodes fit in a cache line (consider that you get 128 nodes per KiB!), and one of the two child nodes is always a direct neighbour in memory.
Example of a cache-oblivious technique: Morton array indexing, also known as Z-order-curve indexing. This kind of indexing might be preferred if you usually access nearby array elements from unpredictable directions. It can be valuable for large image or voxel data where you might have pixels as big as 32 or even 64 bytes, and then millions of them (a typical compact camera measures in megapixels, right?) or even thousands of billions for scientific simulations.
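A minimal sketch of Morton indexing for a 2-D array (coordinates assumed to fit in 16 bits):

class Morton {
    // Z-order index: interleave the bits of x and y, so that elements which
    // are close in 2-D tend to stay close in the 1-D array layout.
    static long index(int x, int y) {
        return spread(x) | (spread(y) << 1);
    }

    // Spreads the low 16 bits of v apart, leaving a zero between adjacent bits.
    static long spread(int v) {
        long x = v & 0xFFFFL;
        x = (x | (x << 8)) & 0x00FF00FFL;
        x = (x | (x << 4)) & 0x0F0F0F0FL;
        x = (x | (x << 2)) & 0x33333333L;
        x = (x | (x << 1)) & 0x55555555L;
        return x;
    }
}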
However, both techniques have one thing in common: keep the most frequently accessed stuff nearby; the less frequently used things can live further away, spanning the whole range from L1 cache over main memory to hard disk, then other computers in the same room, the next room, the same country, worldwide, other planets.
Some random tricks that come to my mind, and which some of them I used recently:
Rethink your algorithm. For example, you have an image with a shape, and the processing algorithm looks for corners of the shape. Instead of operating on the image data directly, you can preprocess it, save all the shape's pixel coordinates in a list, and then operate on the list. You avoid randomly jumping around the image.
Shrink data types. A regular int takes 4 bytes; if you manage to use e.g. uint16_t instead, you fit 2x more stuff in the cache.
Sometimes you can use bitmaps. I used this for processing a binary image: I stored one pixel per bit, so I could fit 8*32 pixels in a single cache line. It really boosted the performance (see the sketch after this list).
From Java, you can use JNI (it's not difficult) and implement your critical code in C to control the memory layout.
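A hedged sketch of that bit-packing trick in Java (dimensions and word size arbitrary):

// A binary image stored one bit per pixel: a 64-bit word holds 64 pixels,
// so a single 64-byte cache line covers 512 of them.
class BitImage {
    final long[] bits;
    final int width;

    BitImage(int width, int height) {
        this.width = width;
        this.bits = new long[(width * height + 63) / 64];
    }

    boolean get(int x, int y) {
        int i = y * width + x;
        return (bits[i >> 6] & (1L << (i & 63))) != 0;
    }

    void set(int x, int y) {
        int i = y * width + x;
        bits[i >> 6] |= 1L << (i & 63);
    }
}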
In the Java world the JIT is going to be working hard to achieve this, and trying to second guess this is likely to be counterproductive. This SO question addresses Java-specific issues more fully.

Is there a difference in performance for calling .length on an array versus saving a size variable?

I am creating a simulation program, and I want the code to be very optimized. Right now I have an array that gets cycled through a lot and in the various for loops I use
for (int i = 0; i < array.length; i++) {
    // do stuff with the array
}
I was wondering if it would be faster if I saved a variable in the class to specify this array length, and used that instead. Or if it matters at all.
Accessing the length attribute on an array is as fast as it gets.
You'll see people recommending that you save a data structure's size before entering the loop, because otherwise it means a method call on each and every iteration.
But this is the kind of micro-optimization that seldom matters. Don't worry much about this kind of thing until you have data that tells you it's the reason for a performance issue.
You should be spending more time thinking about the algorithms you're embedding in that loop, possible parallelism, etc. That'll be far more meaningful in your quest for an optimized solution.
It really doesn't matter. When you create an array with [#], that number is stored and simply returned.
If you use []{object1, object2, object3}, the size is calculated, but in either case the value is calculated once only and stored.
length is a field, not a method: when you access it, the stored value is simply retrieved; the array's length is not recalculated each time you use the attribute. array.length is no slower; in fact, having to set an extra variable would actually take more time than just using length.
array.length is faster, if only by fractions of a millisecond, but in loops and such that can add up when cycling through, say, 1,000,000 items (it might be 0.1 seconds).
It's unlikely to make a difference.
The broader issue is you want your simulation to run as fast as possible.
This SourceForge project shows how you can use the running program to tell you what to optimize, as opposed to wondering about little things like .length.
Most code, as first written (other than little toy programs), has huge speedup potential, in ways that elude being guessed.
The biggest opportunities for speedup are usually diffuse, like your choice of data structure.
They aren't localized to particular routines, and measuring tools don't point them out.
The method demonstrated in that project, random pausing, does identify them.
The changes you need to make may not be easy to do, but they will yield speedup.
