If I have to store 3 integer values and only need to retrieve them later (no calculation required), which one of the following would be the better option?
int i,j,k;
or
int [] arr = new int[3];
An array would allocate 3 contiguous blocks of memory (after the JVM allocates the space), whereas separate variables could be assigned to arbitrary memory locations (which I guess would take the JVM less time than allocating an array).
Apologies if the question is too trivial.
The answer is: It depends.
You shouldn't think too much about the performance implications in this case. The performance difference between the two is not big enough to notice.
What you really need to be on the lookout for is readability and maintainability.
If i, j, and k all essentially mean the same thing, you're going to be using them the same way, and you feel you might want to iterate over them, then it makes sense to use an array so that you can iterate over them more easily.
If they're different values with different meanings, and you're going to be using them differently, then it does not make sense to put them in an array. They should each have their own identity and their own descriptive variable name.
Choose whichever makes most sense semantically:
If these variables are three for a fundamental reason (maybe they are coordinates in the 3D space of a 3D game engine), then use three separate variables (because making, say, a 4D game engine is not a trivial change).
If these variables are three now but they could be trivially changed to be four tomorrow, it's reasonable to consider an array (or, better yet, a new type that contains them).
In terms of performance, traditionally local variables are faster than arrays. Under specific circumstances, the array may be allocated on the stack. Under specific circumstances, bound checks can be removed.
But don't make decisions based on performance unless you have done everything else correctly first, you have thorough tests, this particular piece of code is a performance-critical hot spot, and you're sure it is the bottleneck of your application at the moment.
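For instance, the two semantic situations above might look like this (Vec3, Scores, and the field names are hypothetical, just for illustration):

```java
// Three values with a fundamental, distinct meaning: give them names (or a type).
class Vec3 {
    final int x, y, z;
    Vec3(int x, int y, int z) { this.x = x; this.y = y; this.z = z; }
}

// Three values that are really "the same kind of thing": an array reads better,
// and it is trivial to iterate over them or grow to a fourth value tomorrow.
class Scores {
    static int total(int[] scores) {
        int sum = 0;
        for (int s : scores) {
            sum += s;
        }
        return sum;
    }
}
```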
It depends on how you would access them. An array adds overhead, because you first compute a reference to a value and then fetch it. So if these values are totally unrelated, an array is a bad fit, and it may even count as code obfuscation. But naming variables i, j, k is a sort of obfuscation, too. Obfuscation is better done automatically at the build stage; there are tools like ProGuard that can do it.
The two are not the same at all and serve different purposes.
In the first example you gave, int i,j,k;, the values live on the stack.
The stack is for short-term use and small data sizes, i.e. function call arguments and iterator state.
In the second example you gave, int [] arr = new int[3];, the new keyword allocates actual memory on the heap that was given to the process by the operating system.
The stack is optimized for short-term use, and (almost) all CPUs have registers dedicated to pointing at the stack top and base, making the stack a great place for small, short-lived variables. The stack is also limited in size; by default it is only a modest amount per thread (typically some KB to a few MB).
The heap, on the other hand, is proper memory allocation for large data and proper memory management.
So the two may be used for the same thing, but that does not mean it's right.
Arrays/Objects/Dicts go in memory allocated from the heap; function arguments (and usually iterator indexes) go on the stack.
It depends, but most probably, using distinct variables is the way to go.
In general, don't do micro-optimizations. Nobody will ever notice any difference in performance. Readable and maintainable code is what really matters in high-level languages.
See this article on micro-optimizations.
Related
Let's assume I want to store (integer) x/y values. What is considered more efficient: storing them in a primitive value like long (which fits perfectly, since sizeof(long) = 2*sizeof(int)) using bit operations like shift, or, and a mask, or creating a Point class?
Keep in mind that I want to create and store many(!) of these points (in a loop). Would there be a performance issue when using classes? The only reason I would prefer storing in primitives over storing in a class is the garbage collector. I guess generating new objects in a loop would trigger the GC way too much, is that correct?
Of course packing those as long[] is going to take less memory (and it is going to be contiguous). For each Object (a Point) you will pay at least 12 bytes more for the two header words.
On the other hand, if you are creating them in a loop and escape analysis can prove they don't escape, the JIT can apply an optimization called "scalar replacement" (though it is very fragile), where your objects will not be allocated at all; instead those objects will be "desugared" into their fields.
The general rule is that you should code in the way that makes the code easiest to maintain and read. If and only if you see performance issues (via a profiler, say, or too many GC pauses), only then should you look at GC logs and potentially optimize the code.
As an addendum, JDK code itself is full of such longs where each bit means something different, so they do pack them. But then, neither I nor (I suspect) you are JDK developers. There, such things matter; for us, I have serious doubts.
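A sketch of the bit-packing option from the question (the class and method names are mine, not from any library): x goes in the high 32 bits, y in the low 32, and the mask on y prevents sign extension from clobbering x.

```java
class PackedPoint {
    // Pack x into the high 32 bits and y into the low 32 bits of one long.
    static long pack(int x, int y) {
        return ((long) x << 32) | (y & 0xFFFFFFFFL);
    }

    static int unpackX(long p) {
        return (int) (p >> 32);   // arithmetic shift recovers the sign of x
    }

    static int unpackY(long p) {
        return (int) p;           // truncation recovers the sign of y
    }
}
```

A long[] of such values is one contiguous allocation with no per-point object headers, which is what the answer above is comparing against.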
I'm programming something in Java, for context see this question: Markov Model descision process in Java
I have two options:
byte[][] mypatterns = new byte[MAX][4];
or
ArrayList<byte[]> mypatterns
I can use a Java ArrayList and append new arrays whenever I create them, or use a static array by calculating all possible data combinations, then looping through to see which indexes are 'on or off'.
Essentially, I'm wondering if I should allocate a large block that may contain uninitialized values, or use the dynamic array.
I'm running at a frame-rate budget, so looping through 200 elements every frame could be very slow, especially because I will have multiple instances of this loop.
Based on theory and what I have heard, dynamic arrays are very inefficient.
My question is: Would looping through an array of say, 200 elements be faster than appending an object to a dynamic array?
Edit>>>
More information:
I will know the maxlength of the array, if it is static.
The items in the array will frequently change, but their sizes are constant, so I can easily overwrite them.
Allocating it statically would behave like a memory pool
Other instances may have more or less of the data initialized than others
You're right, really; I should use a profiler first, but I'm also just curious about the question 'in theory'.
The "theory" is too complicated. There are too many alternatives (different ways to implement this) to analyse. On top of that, the actual performance of each alternative will depend on the hardware, the JIT compiler, the dimensions of the data structure, and the access and update patterns in your (real) application on (real) inputs.
And the chances are that it really doesn't matter.
In short, nobody can give you an answer that is well founded in theory. The best we can give is recommendations that are based on intuition about performance, and / or based on software engineering common sense:
simpler code is easier to write and to maintain,
a compiler is a more consistent1 optimizer than a human being,
time spent on optimizing code that doesn't need to be optimized is wasted time.
1 - Certainly over a large code-base. Given enough time and patience, a human can do a better job for some problems, but that is not sustainable over a large code-base, and it doesn't take account of the facts that 1) compilers are always being improved, 2) optimal code can depend on things that a human cannot take into account, and 3) a compiler doesn't get tired and make mistakes.
The fastest way to iterate over bytes is as a single array. It can be faster still to process them as int or long values, since handling 4-8 bytes at a time beats handling one byte at a time, though it depends on what you are doing. Note: a byte[4] is actually 24 bytes on a 64-bit JVM, which means you are not making efficient use of your CPU cache. If you don't know the exact size you need, you might be better off creating a buffer larger than necessary, even if you don't use all of it; i.e. in the case of the byte[][] you are already using 6x the memory you really need.
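One way to process 8 bytes at a time is to view the array through java.nio.ByteBuffer; the XOR below is just a toy operation to show the shape (the class name is mine, and trailing bytes beyond a multiple of 8 are simply ignored here):

```java
import java.nio.ByteBuffer;

class WideXor {
    // XOR all bytes of the array, consuming 8 bytes per getLong() call,
    // then fold the 8 lanes of the accumulator down to a single byte.
    static long xorAll(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        long acc = 0;
        while (buf.remaining() >= 8) {
            acc ^= buf.getLong();
        }
        long folded = acc ^ (acc >>> 32);
        folded ^= folded >>> 16;
        folded ^= folded >>> 8;
        return folded & 0xFF;
    }
}
```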
Any performance difference will not be visible when you set initialCapacity on the ArrayList. You say that your collection's size will never change, but what if that logic changes?
Using ArrayList you get access to a lot of methods such as contains.
As other people have said already, use ArrayList unless performance benchmarks say it is a bottleneck.
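Both options from the question might be sketched like this (MAX = 200 is taken from the "200 elements" mentioned above; the class name is mine):

```java
import java.util.ArrayList;
import java.util.List;

class Patterns {
    static final int MAX = 200;  // placeholder capacity from the question

    // Option 1: preallocated 2D array; unused rows stay zero-filled.
    byte[][] fixed = new byte[MAX][4];

    // Option 2: dynamic list; initialCapacity avoids repeated resizing,
    // and append stays amortized O(1).
    List<byte[]> dynamic = new ArrayList<>(MAX);

    void add(byte[] pattern) {
        dynamic.add(pattern);
    }
}
```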
If one took, say, 1000 lines of computer code and the variables, instead of being declared independently, were grouped together into classes and structs (obviously this depends on the variable sizes being used), would this directly increase cache spatial locality (and therefore reduce the cache miss rate)?
I was under the impression that by associating the variables within a class/struct they would be assigned contiguous memory addresses?
If you are talking about method-local variables, they are already contiguous on the stack, or strictly speaking in activation records which are all but invariably on the stack. If you are talking about references to Java objects, or pointers to dynamically allocated C++ objects, putting them into containing classes won't make any difference for another reason: the objects concerned will still be in arbitrary positions in the heap.
Answering this question is not possible without making some quite unreasonable assumptions. Spatial locality is as much about algorithms as it is about data structures, so grouping logically related data elements together may be of no consequence or even worse based on an algorithm that you use.
For example, consider a representation of 100 points in 3D space. You could put them in three separate arrays, or create a 3-tuple struct/class, and make an array of these.
If your algorithm must get all three coordinates of each point at once on each step, the tuple representation wins. However, think what would happen if you wanted to build an algorithm that operates on each dimension independently, and parallelize it three ways across three independent threads. In this case three separate arrays would win hands down, because that layout avoids false sharing and improves spatial locality as far as the one-dimension-at-a-time algorithm is concerned.
This example shows that there is no "one size fits all" solution. Spatial locality should always be considered in the context of a specific algorithm; a good solution in one case could turn bad in other seemingly similar cases.
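The two layouts from the 100-points example might be sketched like this (names are mine, for illustration):

```java
class Layouts {
    static final int N = 100;

    // Array-of-structures: wins when each step reads x, y, z together.
    static class Point3 {
        double x, y, z;
    }
    Point3[] aos = new Point3[N];

    // Structure-of-arrays: wins for one-dimension-at-a-time work split
    // across threads, since each thread touches its own contiguous array
    // and no cache line is shared between dimensions (no false sharing).
    double[] xs = new double[N];
    double[] ys = new double[N];
    double[] zs = new double[N];

    // A per-dimension pass only streams through a single array.
    static double sum(double[] dim) {
        double s = 0;
        for (double v : dim) s += v;
        return s;
    }
}
```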
If you are asking whether to group local variables into explicitly defined structures, there is not going to be an advantage. Local variables are implemented in terms of activation records, which are usually closely related to the implementation of class structures, for any language that has both.
So, local variables should already have good spatial locality, unless the language implementation is doing something weird to screw it up.
You might improve locality by isolating large chunks of local state which isn't used during recursion into separate non-recursing functions. This would be a micro-optimization, so you need to inspect the machine code first to be sure it's not a waste of time. Anyway, it's unrelated to moving locals into a class.
For ultra-fast code it is essential to maintain locality of reference: keep as much of the data that is used closely together in the CPU cache:
http://en.wikipedia.org/wiki/Locality_of_reference
What techniques are there to achieve this? Could people give examples?
I'm interested in Java and C/C++ examples. It would be interesting to know of ways people use to prevent lots of cache swapping.
Greetings
This is probably too generic to have a clear answer. The approaches in C or C++ compared to Java will differ quite a bit (the way the languages lay out objects differs).
The basic principle is: keep data that will be accessed in tight loops together. If your loop operates on type T, which has members m1...mN but only m1...m4 are used in the critical path, consider splitting T into T1 containing m1...m4 and T2 containing m5...mN. You might want to add to T1 a pointer that refers to T2. Try to avoid objects that are misaligned with respect to cache boundaries (very platform dependent).
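That hot/cold split might look like this (the T1/T2 names follow the text above; the fields are placeholders):

```java
// Cold fields, rarely touched, kept out of the hot cache lines.
class T2 {
    int m5, m6, m7;
}

// Hot fields used in the critical path, packed together so a single
// cache line covers them; the cold remainder is reached via a reference.
class T1 {
    int m1, m2, m3, m4;
    T2 cold = new T2();
}
```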
Use contiguous containers (a plain old array in C, vector in C++) and try to manage the iterations to go up or down, not randomly jumping all over the container. Linked lists are killers for locality; two consecutive nodes in a list might be at completely different random locations.
Object containers (and generics) in Java are also a killer: while in a Vector the references are contiguous, the actual objects are not (there is an extra level of indirection). In Java there is a lot of extra overhead: if you new two objects one right after the other, they will probably end up in almost contiguous memory locations, but with some object-management data (usually two or three pointer-sized words) in between. The GC will move objects around, but hopefully won't make things much worse than they were before it ran.
If you are focusing on Java, create compact data structures: if you have an object that has a position, and that is to be accessed in a tight loop, consider holding x and y as primitive fields inside your object rather than creating a Point and holding a reference to it. Reference types need to be allocated with new, and that means a separate allocation, an extra indirection, and less locality.
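The contrast might look like this (Sprite and Point are hypothetical names for illustration):

```java
// Indirect: each read of pos.x chases a reference into a separately
// allocated Point, with its own header, somewhere else on the heap.
class Point {
    int x, y;
}
class SpriteIndirect {
    Point pos = new Point();
}

// Compact: the coordinates live inside the object itself, so the hot
// loop reads them from the same cache line as the rest of the object.
class SpriteCompact {
    int x, y;
}
```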
Two common techniques include:
Minimalism (of data size and/or code size/paths)
Use cache oblivious techniques
Example of minimalism: In ray tracing (a 3D graphics rendering paradigm), it is a common approach to use 8-byte Kd-tree nodes to store static scene data. The traversal algorithm fits in just a few lines of code. Then, the Kd-tree is often built in a manner that minimizes the number of traversal steps by having large, empty nodes at the top of the tree ("Surface Area Heuristics" by Havran).
Mispredictions typically have a probability of 50%, but incur only minor costs, because a great many nodes fit in a cache line (consider that you get 128 nodes per KiB!), and one of the two child nodes is always a direct neighbour in memory.
Example of a cache-oblivious technique: Morton array indexing, also known as Z-order-curve indexing. This kind of indexing might be preferred if you usually access nearby array elements in unpredictable directions. It can be valuable for large image or voxel data where you might have 32- or even 64-byte pixels, and then millions of them (a typical compact camera measures in megapixels, right?) or even thousands of billions for scientific simulations.
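A common 2D Morton (Z-order) encoding sketch, interleaving the low 16 bits of two coordinates so that x occupies the even bit positions and y the odd ones (class and method names are mine):

```java
class Morton {
    // Z-order code for a 2D coordinate: neighbouring (x, y) cells map to
    // nearby indices, improving locality for unpredictable access patterns.
    static int encode(int x, int y) {
        return part1By1(x) | (part1By1(y) << 1);
    }

    // Spread the low 16 bits of v out to the even bit positions.
    static int part1By1(int v) {
        v &= 0x0000FFFF;
        v = (v | (v << 8)) & 0x00FF00FF;
        v = (v | (v << 4)) & 0x0F0F0F0F;
        v = (v | (v << 2)) & 0x33333333;
        v = (v | (v << 1)) & 0x55555555;
        return v;
    }
}
```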
However, both techniques have one thing in common: keep the most frequently accessed stuff nearby; less frequently used things can be further away, spanning the whole range from L1 cache over main memory to hard disk, then other computers in the same room, the next room, the same country, worldwide, other planets.
Some random tricks that come to mind, some of which I used recently:
Rethink your algorithm. For example, you have an image with a shape and a processing algorithm that looks for corners of the shape. Instead of operating on the image data directly, you can preprocess it, save all the shape's pixel coordinates in a list, and then operate on the list. You avoid randomly jumping around the image
Shrink data types. A regular int will take 4 bytes, and if you manage to use e.g. uint16_t you will cache 2x more stuff
Sometimes you can use bitmaps. I used one for processing a binary image: I stored one pixel per bit, so I could fit 8*32 pixels in a single cache line. It really boosted the performance
From Java, you can use JNI (it's not difficult) and implement your critical code in C to control the memory layout
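The bitmap trick from the list might look like this in Java, using a long[] as the bit store (the class name is mine; the 8*32-pixels-per-line figure above assumes a 32-byte cache line):

```java
class BitImage {
    final int width;
    final long[] bits;   // one pixel per bit, 64 pixels per long

    BitImage(int width, int height) {
        this.width = width;
        this.bits = new long[(width * height + 63) / 64];
    }

    void set(int x, int y) {
        int i = y * width + x;
        bits[i >>> 6] |= 1L << (i & 63);     // i / 64 selects the word
    }

    boolean get(int x, int y) {
        int i = y * width + x;
        return (bits[i >>> 6] & (1L << (i & 63))) != 0;
    }
}
```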
In the Java world the JIT is going to be working hard to achieve this, and trying to second guess this is likely to be counterproductive. This SO question addresses Java-specific issues more fully.
I am creating a simulation program, and I want the code to be very optimized. Right now I have an array that gets cycled through a lot and in the various for loops I use
for(int i = 0; i<array.length; i++){
//do stuff with the array
}
I was wondering if it would be faster if I saved a variable in the class to specify this array length, and used that instead. Or if it matters at all.
Accessing the length attribute on an array is as fast as it gets.
You'll see people recommending that you save a data structure's size before entering the loop because otherwise it means a method call on each and every iteration.
But this is the kind of micro-optimization that seldom matters. Don't worry much about this kind of thing until you have data that tells you it's the reason for a performance issue.
You should be spending more time thinking about the algorithms you're embedding in that loop, possible parallelism, etc. That'll be far more meaningful in your quest for an optimized solution.
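Concretely, the two forms from the question look like this; both are fine, and the JIT typically produces equivalent code for them (the class and method names are mine):

```java
class Loops {
    // length is a plain field read on the array object, not a method call.
    static long sumDirect(int[] array) {
        long sum = 0;
        for (int i = 0; i < array.length; i++) {
            sum += array[i];
        }
        return sum;
    }

    // Caching the length in a local is harmless but yields no measurable gain.
    static long sumCached(int[] array) {
        long sum = 0;
        int n = array.length;
        for (int i = 0; i < n; i++) {
            sum += array[i];
        }
        return sum;
    }
}
```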
It really doesn't matter: when you make an array with new int[n], it stores that number and just returns it.
If you're using new int[]{object1, object2, object3}, the size is calculated at creation; in either case, the value is calculated only once and stored.
length is a variable, not a method; when you access it, it just retrieves the stored value. It's not going to calculate the length of the array each time you use the length attribute. array.length is no slower; in fact, having to set an extra variable will actually take more time than just using length.
array.length is faster, although only by fractions of a millisecond, but in loops and such you could save time cycling through, say, 1,000,000 items (maybe 0.1 seconds).
It's unlikely to make a difference.
The broader issue is you want your simulation to run as fast as possible.
This SourceForge project shows how you can use the running program to tell you what to optimize, as opposed to wondering about little things like .length.
Most code, as first written (other than little toy programs), has huge speedup potential, in ways that elude being guessed.
The biggest opportunities for speedup are usually diffuse, like your choice of data structure.
They aren't localized to particular routines, and measuring tools don't point them out.
The method demonstrated in that project, random pausing, does identify them.
The changes you need to make may not be easy to do, but they will yield speedup.