The first version is:
int[] a = new int[1000];
int[] b = new int[1000];
The second version is:
class Helper{
int a;
int b;
}
Helper[] c = new Helper[1000];
My intuition tells me that the second one is better, but I can't convince myself with reasoning....
Can anyone compare the time complexity and space complexity of these two structures for me? For example, do these two versions cost the same space, or does the second one cost less?
Thank you!
The real question you should be asking is what the relation between a[i] and b[i] is. If a[i] and b[i] are properties of the same object (one with a more meaningful description than "Helper"), you should definitely put them in some class instead of using multiple primitive arrays. After all, Java is an object oriented language.
You shouldn't care about performance differences. Those will be insignificant. The important thing is that your code makes sense to whoever reads it.
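For illustration, a minimal sketch of the object-oriented version. The name Rectangle and the idea that the two values are a width and a height are made up here, purely to show what a "more meaningful description than Helper" could look like:

```java
// Hypothetical example: if a[i] and b[i] describe the same thing,
// give that thing a class with a meaningful name.
class Rectangle {
    final int width;
    final int height;

    Rectangle(int width, int height) {
        this.width = width;
        this.height = height;
    }

    int area() {
        return width * height;
    }
}

public class Demo {
    public static void main(String[] args) {
        Rectangle[] rects = new Rectangle[1000];
        for (int i = 0; i < rects.length; i++) {
            rects[i] = new Rectangle(i, i + 1);
        }
        System.out.println(rects[2].area()); // prints 6
    }
}
```

Now code that works on an element deals with one self-describing object instead of two loosely coupled array slots.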
While Eran is right, I'll add a few more points.
Sure, chances are you're not going to bother about the performance - the difference is insignificant for a general purpose application. Readability is what matters.
Still, in terms of technical details:
Space complexity is the same (it's linear in the number of elements either way), but the absolute size in bytes is different.
The array of pairs will cost you more - each Java object has an overhead of several bytes. In the case of two arrays, it's only a few bytes of overhead per array.
Also, in the case of two arrays, the values of each array reside next to each other in memory - reading all values of one of those arrays will be more efficient in terms of CPU cache and memory layout. In the case of an array of objects, you instead have an array of references to those objects, and you first need to read the address of an object before you can access the actual value of a field.
These are just general points so you can get a feeling of what's going on. In practice it all depends on how you want to work with those structures.
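To make the two layouts concrete, here is a small sketch. The results are of course identical; only the memory layout (and thus the cache behaviour) differs between the two loops:

```java
public class LayoutDemo {
    static class Helper {
        int a;
        int b;
    }

    // Layout 1: two primitive arrays - the values of each field are
    // contiguous in memory.
    static long sumTwoArrays(int[] a, int[] b) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] + b[i];      // sequential, cache-friendly reads
        }
        return sum;
    }

    // Layout 2: an array of references, each pointing to a separate object.
    static long sumObjects(Helper[] c) {
        long sum = 0;
        for (int i = 0; i < c.length; i++) {
            sum += c[i].a + c[i].b;  // one extra indirection per element
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] a = new int[n];
        int[] b = new int[n];
        Helper[] c = new Helper[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = 2 * i;
            c[i] = new Helper();
            c[i].a = i;
            c[i].b = 2 * i;
        }
        System.out.println(sumTwoArrays(a, b) == sumObjects(c)); // prints true
    }
}
```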
Related
I wrote a program in Java that recursively performs path traversal and carries the path so far. Each step in the path is a String, and therefore I initially thought that, to cut down on the memory use of the algorithm, it could be better to map the Strings that represent each place on the map to ints and carry those instead of the Strings.
However, when I thought about it a bit more, I wasn't sure anymore, since what's being carried is the reference, not the actual String (I always pass the reference). So would mapping the String values to ints:
1) only increase the overall memory used by introducing an additional int array?
2) decrease the memory used when the recursive method is handling the current int array instead of a String array?
I would definitely like to read up more on recursive algorithms and their implementation issues in Java, if anyone has any good links.
Given the info we have so far I'd say you will only use more memory if you introduce your Map idea. You only pass references to the recursive methods and it costs the same regardless of type.
But if you provide some sample code we might be able to give you a better answer. Sample code is generally a key success factor if you want good answers here on SO.
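A small sketch of why the recursion itself costs the same either way. The path string and helper below are made up for illustration; the point is that each stack frame holds only a reference, never a copy of the String's characters:

```java
public class PathDemo {
    // Each recursive frame carries only a reference to the same String
    // object; the String's contents are never copied per call.
    static int depth(String path, int steps) {
        if (steps == 0) {
            return path.length();
        }
        return depth(path, steps - 1); // passes a reference, not the data
    }

    public static void main(String[] args) {
        System.out.println(depth("A/B/C", 100)); // prints 5
    }
}
```

Swapping the String for an int would shrink each frame by at most the difference between a reference and an int, while the extra mapping structure adds its own memory.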
Iterating over consecutive elements of an array is generally considered to be more efficient than iterating over consecutive linked list elements because of caching.
This is undoubtedly true as long as elements have elementary data types. But if the elements are objects, my understanding is that only the references to the objects will be stored in the contiguous memory area of the array (which is likely to be cached) while the actual object data will be stored anywhere in main memory and cannot be cached effectively.
As you normally not only iterate over the container but also need to access object data in each iteration, doesn't this more or less kill the performance benefit of the array over the list?
Edit: The comment about different scenarios varying greatly is probably correct. So let's consider a specific one: You search for a specific object in the container. In order to find it you need to compare a given string to another string that is a class variable of the object.
No - for objects ("pointers") there is an indirection in both cases. A linked list additionally needs to step from every node to the next one, so it still has extra overhead.
But yes, relatively speaking the gain concerns only part of the work - very roughly half of the pure walk-through, counting indirection steps.
And of course, every indirection makes access more scattered and therefore slower.
BTW, there is also ArrayList, which is similarly fast as plain arrays.
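The search scenario from the edit can be sketched like this (the Item class and field name are invented for illustration). In the array version each element costs one hop to the object plus one to its String; the linked list adds one more hop per element to reach the next node:

```java
import java.util.LinkedList;

public class SearchDemo {
    static class Item {
        final String name; // the string field we compare against

        Item(String name) {
            this.name = name;
        }
    }

    // Array: arr[i] -> Item -> String, per element.
    static int findInArray(Item[] items, String target) {
        for (int i = 0; i < items.length; i++) {
            if (items[i].name.equals(target)) {
                return i;
            }
        }
        return -1;
    }

    // Linked list: node -> next node, node -> Item -> String, per element.
    static int findInList(LinkedList<Item> items, String target) {
        int i = 0;
        for (Item item : items) {
            if (item.name.equals(target)) {
                return i;
            }
            i++;
        }
        return -1;
    }

    public static void main(String[] args) {
        Item[] arr = { new Item("foo"), new Item("bar"), new Item("baz") };
        LinkedList<Item> list = new LinkedList<>();
        for (Item it : arr) {
            list.add(it);
        }
        System.out.println(findInArray(arr, "bar")); // prints 1
        System.out.println(findInList(list, "bar")); // prints 1
    }
}
```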
Out of interest: Recently, I encountered a situation in one of my Java projects where I could either store some data in a two-dimensional array or make a dedicated class for it whose instances I would put into a one-dimensional array. So I wonder whether there exists any canonical design advice on this topic in terms of performance (runtime, memory consumption)?
Without regard of design patterns (extremely simplified situation), let's say I could store data like
class MyContainer {
public double a;
public double b;
...
}
and then
MyContainer[] myArray = new MyContainer[10000];
for(int i = myArray.length; (--i) >= 0;) {
myArray[i] = new MyContainer();
}
...
versus
double[][] myData = new double[10000][2];
...
I somehow think that the array-based approach should be more compact (memory) and faster (access). Then again, maybe it is not: arrays are objects too, and array access needs to check indexes while object member access does not(?). The allocation of the object array would probably(?) take longer, as I need to iteratively create the instances, and my code would be bigger due to the additional class.
Thus, I wonder whether the designs of the common JVMs provide advantages for one approach over the other, in terms of access speed and memory consumption?
Many thanks.
Then again, maybe it is not, arrays are objects too
That's right. So I think this approach will not buy you anything.
If you want to go down that route, you could flatten this out into a one-dimensional array (each of your "objects" then takes two slots). That would give you immediate access to all fields in all objects, without having to follow pointers, and the whole thing is just one big memory allocation: since your component type is primitive, there is just one object as far as memory allocation is concerned (the container array itself).
This is one of the motivations for people wanting to have structs and value types in Java, and similar considerations drive the development of specialized high-performance data structure libraries (that get rid of unnecessary object wrappers).
I would not worry about it, until you really have a huge datastructure, though. Only then will the overhead of the object-oriented way matter.
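A minimal sketch of the flattened layout described above, assuming two double "fields" per logical element. Logical element i occupies slots 2*i and 2*i+1 of a single primitive array, so the whole structure is one allocation with no per-element objects:

```java
public class FlatDemo {
    // Accessors for the flattened layout: element i's "a" field lives at
    // index 2*i, its "b" field at index 2*i + 1.
    static double getA(double[] flat, int i) {
        return flat[2 * i];
    }

    static double getB(double[] flat, int i) {
        return flat[2 * i + 1];
    }

    static void set(double[] flat, int i, double a, double b) {
        flat[2 * i] = a;
        flat[2 * i + 1] = b;
    }

    public static void main(String[] args) {
        int n = 10000;
        double[] flat = new double[2 * n]; // one contiguous block, one allocation
        for (int i = 0; i < n; i++) {
            set(flat, i, i, -i);
        }
        System.out.println(getA(flat, 7)); // prints 7.0
        System.out.println(getB(flat, 7)); // prints -7.0
    }
}
```

The trade-off is readability: index arithmetic replaces named fields, which is exactly why this is usually reserved for genuinely large data structures.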
I somehow think that the array-based approach should be more compact (memory) and faster (access)
It won't. You can easily confirm this by using the Java management interfaces:
import java.lang.management.ManagementFactory;

// The com.sun.management extension requires a HotSpot-based JVM:
com.sun.management.ThreadMXBean b =
        (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
long selfId = Thread.currentThread().getId();
long memoryBefore = b.getThreadAllocatedBytes(selfId);
// <-- Put measured code here
long memoryAfter = b.getThreadAllocatedBytes(selfId);
System.out.println(memoryAfter - memoryBefore);
Under measured code, put new double[0] and new Object() in turn, and you will see that those allocations require exactly the same amount of memory.
It might be that the JVM/JIT treats arrays in a special way which could make them faster to access in one way or another.
The JIT does some vectorization of array operations in for-loops, but that is more about the speed of arithmetic operations than the speed of access. Besides that, I can't think of any.
The canonical advice that I've seen in this situation is that premature optimisation is the root of all evil. Following that means that you should stick with the code that is easiest to write / maintain / get past your code quality regime, and then look at optimisation if you have a measurable performance issue.
In your examples the memory consumption is similar because in the object case you have 10,000 references plus two doubles per reference, and in the 2D array case you have 10,000 references (the first dimension) to little arrays containing two doubles each. So both are one base reference plus 10,000 references plus 20,000 doubles.
A more efficient representation would be two arrays, where you'd have two base references plus 20,000 doubles.
double[] a = new double[10000];
double[] b = new double[10000];
I have to store millions of X/Y double pairs for reference in my Java program. I'd like to keep memory consumption as low as possible, as well as the number of object references. So after some thinking, I decided holding the two points in a tiny double array might be a good idea; its setup looks like so:
double[] node = new double[2];
node[0] = x;
node[1] = y;
I figured using the array would avoid the indirection between a class and the X and Y variables it holds, were I to use a class like the following:
class Node {
public double x, y;
}
However, after reading into the way public fields in classes are stored, it dawned on me that fields may not actually be stored behind pointer-like structures; perhaps the JVM simply stores these values in contiguous memory and knows how to find them without an address, thus making the class representation of my point smaller than the array.
So the question is, which has a smaller memory footprint? And why?
I'm particularly interested in whether or not class fields use a pointer, and thus have a 32-bit overhead, or not.
The latter has the smaller footprint.
Primitive types are stored inline in the containing class. So your Node requires one object header and two 64-bit slots. The array you specify uses one array header (>= an object header) plus two 64-bit slots.
If you're going to allocate 100 variables this way, then it doesn't matter so much, as it is just the header sizes which are different.
Caveat: all of this is somewhat speculative as you did not specify the JVM - some of these details may vary by JVM.
I don't think your biggest problem is going to be storing the data, I think it's going to be retrieving, indexing, and manipulating it.
However, an array, fundamentally, is the way to go. If you want to save on pointers, use a one dimensional array. (Someone has already said that).
First, it must be stated that the actual space usage depends on the JVM you are using. It is strictly implementation specific. The following is for a typical mainstream JVM.
So the question is, which has a smaller memory footprint? And why?
The 2nd version is smaller. An array has the overhead of the 32 bit field in the object header that holds the array's length. In the case of a non-array object, the size is implicit in the class and does not need to be represented separately.
But note that this is a fixed overhead per array object. The larger the array is, the less important the overhead is in practical terms. And the flipside of using a class rather than an array is that indexing won't work and your code may be more complicated (and slower) as a result.
A Java 2D array is actually an array of 1D arrays (etcetera), so you can apply the same analysis to arrays with higher dimensionality. The larger the size an array has in any dimension, the less impact the overhead has. The overhead in a 2x10 array will be less than in a 10x2 array. (Think it through ... 1 array of length 2 + 2 of length 10 versus 1 array of length 10 + 10 of length 2. The overhead is proportional to the number of arrays.)
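The counting can be made concrete with a small sketch. Both shapes hold the same 20 doubles; only the number of array objects (each carrying its own header) differs:

```java
public class HeaderCountDemo {
    public static void main(String[] args) {
        // A 2D array is an array of 1D arrays, so each shape allocates a
        // different number of array objects:
        double[][] wide = new double[2][10]; // 1 outer + 2 inner  = 3 arrays
        double[][] tall = new double[10][2]; // 1 outer + 10 inner = 11 arrays
        System.out.println(1 + wide.length); // prints 3
        System.out.println(1 + tall.length); // prints 11
    }
}
```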
I'm particularly interested in whether or not class fields use a pointer, and thus have a 32-bit overhead, or not.
(You are actually talking about instance fields, not class fields. These fields are not static ...)
Fields whose type is a primitive type are stored directly in the heap node of the object without any references. There is no pointer overhead in this case.
However, if the field types were wrapper types (e.g. Double rather than double) then there could be the overhead of a reference AND the overheads of the object header for the Double object.
I need to store a very large number of instances of my class, and since I have a pretty terrible computer with only 2GB of RAM, I need it to run with as little memory usage as possible. So can anyone tell me: is it more efficient to have a ton of fields or an array? I don't care about the "best way" to do it; I need the way that uses the least RAM. So yeah, an array or many fields?
Your question is a little unclear, but basically the class
public class SomeClass {
int var1;
int var2;
...
int var100;
}
will take as much space as an int[100] array. There might be a slight difference, depending on the platform, but no more than 16 bytes total, and it could go either way. (And you can substitute any other data type in place of int and the same thing will be true.)
But, just to be clear, either of the above takes up much less space than 100 objects, each containing one int.
An array doesn't condense the objects in any way; it just orders them. So fields or an array would have the same memory overhead.
That said, having an array of objects (or a List) would be better to keep your objects collected and together.