To get better performance I'm fine-tuning the code, looking through the DDMS tracer. One finding is that Array.get(x) is more expensive than Array.items[x].
We can access the items directly provided the array type is Object, or if we specify the array type in the constructor, like so:
Array<MyClass> foo = new Array<MyClass>(MyClass.class)
This works fine, however, how do I specify the last MyClass.class in a for loop? I have this at the moment:
for (Array<MyClass> listOfObjects : allObjects) {
for (int i=0; i<listOfObjects.size; i++) {
MyClass myObj = listOfObjects.get(i);
//MyClass myObj = listOfObjects.items[i];
The commented line works fine, but to get rid of the overhead I want to supply the (MyClass.class) argument mentioned above. Where can I do this in that for loop?
Many thanks
J
I think that what you're trying to do is pointless. Please read this great article: http://blog.codinghorror.com/the-sad-tragedy-of-micro-optimization-theater/
You are trying to gain a minimal optimization while greatly reducing readability and maintainability.
If you want less overhead, it would probably be wiser to look at a language like C++, rather than trying to hack basic java for loops.
Another thing you may want to look into is Java 8, which has added functionality for executing loops concurrently with Streams.
Array<MyClass> foo = new Array<MyClass>(MyClass.class)
Note that you are creating a NEW array with this line, passing it a class argument. From http://libgdx.badlogicgames.com/nightlies/docs/api/com/badlogic/gdx/utils/Array.html
Array(java.lang.Class arrayType)
Creates an ordered array with items of the specified type and a capacity of 16.
I don't see you trying to create new Arrays in the other code you posted. Are you trying to populate each listOfObjects in allObjects?
If so, you would want to do something like:
for (int i = 0; i < allObjects.size; i++)
{
allObjects.items[i] = new Array<MyClass>(MyClass.class);
}
If you are simply trying to loop through these arrays, there is no class argument needed. I would suggest comparing the Array class to other Gdx or Java collections if the speed of iteration is too slow.
This quote from above link may also be notable if you do a lot of removing from the arrays.
A resizable, ordered or unordered array of objects. If unordered, this class avoids a memory copy when removing elements (the last element is moved to the removed element's position).
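The point about construction versus iteration can be sketched in plain Java. TypedBag below is a hypothetical stand-in written for this illustration, not libGDX's Array; it only assumes the same idea of an exposed backing array that is either an Object[] or an array of the real element type, depending on the constructor used. Under that assumption, the class token belongs where each inner array is created, and the loop that reads the items needs nothing extra.

```java
import java.lang.reflect.Array;
import java.util.Arrays;

// Hypothetical stand-in for a container with a public backing array.
class TypedBag<T> {
    T[] items;   // exposed backing array
    int size;

    // No class token: the backing array is really an Object[].
    @SuppressWarnings("unchecked")
    TypedBag() {
        items = (T[]) new Object[16];
    }

    // Class token supplied: the backing array has the real component type.
    @SuppressWarnings("unchecked")
    TypedBag(Class<T> type) {
        items = (T[]) Array.newInstance(type, 16);
    }

    void add(T value) {
        if (size == items.length) {
            // copyOf keeps the runtime component type of the old array
            items = Arrays.copyOf(items, size * 2);
        }
        items[size++] = value;
    }
}

class TypedBagDemo {
    // Returns true if the caller can use bag.items as a String[].
    static boolean directAccessWorks(TypedBag<String> bag) {
        try {
            String[] raw = bag.items; // the compiler inserts a cast to String[] here
            return raw.length > 0;
        } catch (ClassCastException e) {
            return false;             // backing array was an Object[]
        }
    }
}
```

With this sketch, a bag built with new TypedBag<String>() fails the direct access, while one built with new TypedBag<String>(String.class) allows it.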
Is there a fundamental difference in Java between an ArrayList, and a class that uses regular arrays to store items, has an index to keep track of the number of items in the list, and automatically increases the size of the array when it runs out of space?
class myArrayList {
private int[] array = new int[10];
private int itemsInArray = 0;
private void increaseArraySize() {
int[] newArray = new int[array.length + 10];
System.arraycopy(array, 0, newArray, 0, array.length);
array = newArray;
}
public void put(int i) {
if (itemsInArray == array.length)
{
increaseArraySize();
}
array[itemsInArray] = i;
itemsInArray++;
}
public int get(int idx) {
return array[idx];
}
public int size() {
return itemsInArray;
}
}
An ArrayList has some additional methods my class doesn't have (that I could add), and implements the List interface, but other than that, is ArrayList just for convenience? Do both use the heap to store data?
Is there a fundamental difference in Java between an ArrayList, and a class that uses regular arrays to store items, has an index to keep track of the number of items in the list, and automatically increases the size of the array when it runs out of space?
In general no. Under the hood, ArrayList is just an ordinary pure Java class i.e. no native code. It is (roughly speaking!) doing what your code does.
But (as the comments say) it already exists. You don't need to design it, code it, debug it, tune it ... You just use it! Also read Basil's answer!
However, I would note that your version is different from ArrayList in some (other) important respects:
A myArrayList holds only int values. It is not generic.
An ArrayList holds objects. If you needed a list of integers you would need to use Integer as the type parameter rather than int. (Because that's the way that Java generic classes work.)
In myArrayList, a set call beyond the end of the list will grow the list. It is behaving more like a dynamic array than a list.
In ArrayList, a set call beyond the end of the list will throw an exception.
If you want a Java "list" type that is specialized for int or some other primitive type, there are existing 3rd party libraries; e.g. the GNU Trove library.
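A minimal sketch of the two ArrayList behaviours listed above: the element type has to be Integer rather than int, and set at an index equal to the current size is rejected instead of growing the list. The class and method names are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

class SetBeyondEndDemo {
    // Returns true if set() at index == size() is rejected.
    static boolean setPastEndThrows() {
        List<Integer> list = new ArrayList<Integer>(); // Integer, not int
        list.add(7);                                   // the int 7 is autoboxed
        try {
            list.set(1, 42);   // index 1 equals size(), so it is out of range
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }
}
```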
Do both use the heap to store data?
Yes. In fact, if you look at the source code of ArrayList you will see that it does something like what your code is doing. But it is doing it "smarter" and this will result in better "big O" performance in certain operations.
Consider this:
myArrayList list = new myArrayList();
for (int i = 0; i < N; i++) {
list.put(i, 1);
}
The computational complexity of this is O(N^2).
Each call to list.put(i, 1) will cause a resize, creating a new array of size i and will then copy i - 1 values to the new array. That adds up to 0 + 1 + ... + (N - 1), or N * (N - 1) / 2 copies. That is O(N^2) for the N calls.
By contrast, ArrayList uses a resize strategy of growing the list by 50% of its current size. If you do the analysis, it turns out that the total amortized cost for N calls to ArrayList.add is O(N) ... not O(N^2).
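The difference between the two growth strategies can be checked by counting element copies. The sketch below is an illustration only: it assumes a starting capacity of 10 and the fixed step of 10 from the myArrayList class in the question, and it models the 50% strategy with simple integer arithmetic rather than reproducing ArrayList's actual source.

```java
class GrowthCostDemo {
    // Copies made while appending n items when the array grows by a fixed 10 slots.
    static long copiesFixedStep(int n) {
        long copies = 0;
        int capacity = 10;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copies += size;   // every existing element moves to the new array
                capacity += 10;
            }
        }
        return copies;
    }

    // Copies made while appending n items when the array grows by 50% of its capacity.
    static long copiesGeometric(int n) {
        long copies = 0;
        int capacity = 10;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copies += size;
                capacity += capacity >> 1;
            }
        }
        return copies;
    }
}
```

For 100 appends the fixed step makes 450 copies and the 50% strategy 202. The gap widens quickly: the fixed-step count grows quadratically with N, while the geometric count stays within a small constant multiple of N.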
Lesson #1: Don't go trying to re-implement standard Java utility classes. It is usually a waste of time and there is a good chance that your efforts will actually make things worse!
There are exceptions to this lesson, but you need a lot of Java programming experience (and / or use of profiling tools) to identify them. Even then, there is a good chance that there is an existing 3rd-party alternative that addresses the problem.
Lesson #2: If your goal is to understand how the standard utility classes work under the hood, the best way is to read the OpenJDK source code. It is good code and well commented. In cases where it is complicated there is a good reason for that. But any experienced Java programmer should be capable of understanding it if they work hard at it.
You asked:
What's the difference between using ArrayList, or dynamically growing an array in Java?
Looking at the Collections Framework Overview, the very first bullet item says:
The primary advantages of a collections framework are that it:
• Reduces programming effort by providing data structures and algorithms so you don't have to write them yourself.
You asked:
Is there a fundamental difference in Java between an ArrayList, and a class that uses regular arrays
In terms of behavior, no fundamental difference. As the name implies, the current implementation of ArrayList is a class that uses regular arrays. So there is no point to you writing your own.
Keep in mind that future versions of ArrayList implementations are free to use some other approach besides actual arrays provided the contract promised in the Javadoc is met.
is ArrayList just for convenience?
Yes, as stated above. Rather than have every individual programmer write their own implementation, why not share one single well-written, well-debugged, and well-documented implementation?
Most implementations of Java are based on the OpenJDK open-source codebase. You are free to peruse the source code.
I want to know if there is a difference in performance if I use a primitive array and then rebuild it to add new elements like this:
AnyClass[] elements = new AnyClass[0];
public void addElement(AnyClass e) {
AnyClass[] temp = new AnyClass[elements.length + 1];
for (int i = 0; i < elements.length; i++) {
temp[i] = elements[i];
}
temp[elements.length] = e;
elements = temp;
}
or if I just use an ArrayList and add the elements.
I am not certain, which is why I ask: is the speed the same because an ArrayList is built the same way as my array version, or is there really a difference, with a plain array always faster even if I rebuild it every time I add an element?
ArrayLists work in a similar way, but instead of rebuilding on every add they grow their capacity by a constant factor (about 50% in OpenJDK) each time the limit is reached. So if you are constantly adding to it, an ArrayList will be faster, because recreating the array is fairly slow.
So your implementation could use less memory if you are not adding to it often, but as far as speed goes it will be slower most of the time.
In a nutshell, stick with ArrayList. It is:
widely understood;
well tested;
likely to be more performant than your own implementation (for example, ArrayList.add() is guaranteed to be amortised constant-time, which your method is not).
When an ArrayList resizes, it grows by a constant factor rather than by one slot, so you are not paying for a resize on every add. Amortized, that means the resizing cost per add is constant. That's why you shouldn't waste time reinventing the wheel. The people who created the first one already learned how to make one more efficient and know more about the platform than you do.
Neither arrays nor ArrayList has a performance problem for indexed access.
Arrays and ArrayList are both index based, so both work in the same way.
If you need a dynamic array, use ArrayList.
If the size is fixed, go with a plain array.
Your implementation is likely to lose clearly to Java's ArrayList in terms of speed. One particularly expensive thing you're doing is reallocating the array every time you want to add elements, while Java's ArrayList tries to optimize by having some "buffer" before having to reallocate.
ArrayList also uses an array internally, so it is true that a plain array will be faster than an ArrayList. When writing high-performance code, prefer arrays. For the same reason, arrays are the backbone of most collections.
It is worth reading the JDK implementation of the collections.
We use ArrayList when we are developing an application where such minor performance differences do not matter; we accept the trade-off because we get a ready-made API for put, get, resize, etc.
Context is very important: if you are constantly inserting new elements, ArrayList will certainly be faster than rebuilding an array. On the other hand, if you just want to access an element at a known position, say arrayItems[8], an array is faster than ArrayList.get(8), since there is the overhead of the get() call and its bounds checks.
I am working on refactoring a small portion of an open source large-scale configuration management system for my University.
We're using some open source tools for machine learning like Weka, and the aspect I am assigned to refactor is dealing with data mining and constructing rules.
The open source files we've been using from Liverpool and Japan are working well, but there are some memory usage issues when we use the program on large scale projects.
I've isolated the major memory hogs and come to the conclusion I need to figure out a different data structure to store and manipulate the data. As it stands now, the program is using what end up becoming very large multidimensional arrays of integers, objects, strings, etc.
There are several methods that simply reconfigure the set up of the associations after we are deriving rules for behaviors. In many cases, we are only adding or subtracting a single element, or simply flattening the multidimensional arrays.
I primarily program in C/C++ in general, so I am not an expert on the data structures available in Java. What I am looking to replace the static arrays with is a dynamic structure that can be easily resized without having to create a second multidimensional array.
What is happening now is we are having to create an entirely new structure every time we add and remove rules, objects, or other miscellaneous data from the multidimensional array. Then we are immediately copying into the new array.
I'd like to be able to simply use the same multidimensional array and simply add a new row and column. Subsequently, I'd like to be able to manipulate the data in the structure by simply saving a temporary value and overwriting previous values, shifting left, right, etc.
Can anyone think of any data structures in Java that would fit the bill?
On a related note, I have looked into explicit garbage collection, but have found I can only really suggest that the JVM collect by calling System.gc(), or influence the garbage collection behavior of the JVM by way of tuning. Is there a better or more effective way?
Regards,
Edm
If you have a lot of nulls/zeroes/falses/empty-strings in your matrix, then you can save space by using a sparse matrix implementation. Matrix-toolkits has several sparse matrices that you can use / modify to suit your needs, or you can just use a hashmap with an {x, y} tuple as the key. (The hashmap also has the advantage that there are several external hashmap implementations available, e.g. BerkeleyDB, so that it's unlikely that you'll run out of memory.)
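The hashmap-with-tuple-key idea can be sketched as follows. This is an illustration under stated assumptions: the cell type is int, zero is the "empty" value, and the {x, y} tuple is packed into a single long instead of a separate key class. SparseIntMatrix and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class SparseIntMatrix {
    // Only non-zero cells are stored.
    private final Map<Long, Integer> cells = new HashMap<Long, Integer>();

    // Packs (row, col) into one long so the pair can serve as a map key.
    private static long key(int row, int col) {
        return ((long) row << 32) | (col & 0xFFFFFFFFL);
    }

    int get(int row, int col) {
        Integer v = cells.get(key(row, col));
        return v == null ? 0 : v;   // absent cells read as zero
    }

    void set(int row, int col, int value) {
        if (value == 0) {
            cells.remove(key(row, col));   // writing zero frees the cell
        } else {
            cells.put(key(row, col), value);
        }
    }

    int storedCells() {
        return cells.size();
    }
}
```

Memory use is proportional to the number of non-zero cells rather than rows times columns, at the cost of boxing and hashing on every access, so it only pays off when the matrix really is mostly empty.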
To replace static arrays with a dynamic structure, use an ArrayList, which grows with the data automatically. For a two-dimensional data structure, use a List of Lists, as follows:
List<List<Integer>> dataStore = new ArrayList<List<Integer>>();
dataStore.add(new ArrayList<Integer>());
dataStore.add(Arrays.asList(1, 2, 3, 4));
// Access [1][3] as
System.out.println(dataStore.get(1).get(3)); // prints 4
Since you touched upon having control over garbage collection (which Java actually does a pretty good job of all by itself), it seems memory management is of paramount importance, as this is what prompted the refactoring in the first place.
You could look into the Flyweight GoF pattern that focuses on sharing of objects instead of repeating them to cut down on the memory footprint of the application. To enable sharing flyweight objects need to be made immutable.
Pseudocode:
// adding a new flyweight obj at [2][1]
fwObjStore.get(2).set(1, FWObjFactory.getInstance(fwKey));
public class FWObjFactory {
private static Map<String, FWObject> fwMap = new HashMap<String, FWObject>();
public static FWObject getInstance(String fwKey) {
if (!fwMap.containsKey(fwKey)) {
fwMap.put(fwKey, newFwFromKey(fwKey));
}
return fwMap.get(fwKey);
}
private static FWObject newFwFromKey(String fwKey) {
// ...
}
}
I would look into using a "List of Lists". For example, you could declare something like
List<List<Object>> mArray = new ArrayList<List<Object>>();
Any time you need to add a new "row", you could do something like:
mArray.add (new ArrayList<Object>());
Check out the List interface to see what you can do with Lists in Java and which classes implement the interface (or roll your own!).
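Adding a row or a column to such a list of lists, as the question asks, might look like the sketch below. GrowableGrid and its helper methods are hypothetical names, and the sketch assumes Integer cells with 0 as the fill value.

```java
import java.util.ArrayList;
import java.util.List;

class GrowableGrid {
    // Appends a new row containing `columns` zeros.
    static void addRow(List<List<Integer>> grid, int columns) {
        List<Integer> row = new ArrayList<Integer>(columns);
        for (int c = 0; c < columns; c++) {
            row.add(0);
        }
        grid.add(row);
    }

    // Appends one zero cell to every existing row.
    static void addColumn(List<List<Integer>> grid) {
        for (List<Integer> row : grid) {
            row.add(0);
        }
    }
}
```

Each row must itself be a resizable list for addColumn to work. A row created with Arrays.asList is fixed-size and throws UnsupportedOperationException on add, so wrap it in new ArrayList<Integer>(...) first.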
There are no true multidimensional arrays in Java; Java has arrays of arrays.
You can use an ArrayList whose type parameter is itself an ArrayList:
ArrayList<ArrayList<yourType>> myList = new ArrayList<ArrayList<yourType>>();
Also, don't worry about GC. It will collect as and when required.
Why not use two Lists tangled together? Like so:
List<List<String>> rowColumns = new ArrayList<>();
// Add a row with two entries, or columns:
List<String> oneRow = Arrays.asList("Hello", "World!");
rowColumns.add(oneRow);
Also, consider using a Map with entries mapped to Lists.
Garbage collection should generally never have to be dealt with explicitly in Java. Usually you want to look for memory leaks first, whenever one occurs. When that happens, look for background threads that don't die as they are supposed to, or for strong references in caches. If you want to read some about the latter issue, you can start here and here.
In C++ I can insert an item into an arbitrary position in a vector, as in the code below:
std::vector<int> vec(10);
vec.insert(vec.begin()+2,2);
vec.insert(vec.begin()+4,3);
In Java I cannot do the same; I get a java.lang.ArrayIndexOutOfBoundsException with the code below:
Vector l5 = new Vector(10);
l5.add(0, 1);
l5.add(1, "Test");
l5.add(3, "test");
Does this mean that C++ is better designed, or is it just a Java design decision?
Why does Java use this approach?
In the C++ code:
std::vector<int> vec(10);
You are creating a vector of size 10. So all indexes from 0 to 9 are valid afterwards.
In the Java code:
Vector l5 = new Vector(10);
You are creating an empty vector with an initial capacity of 10. It means the underlying array is of size 10 but the vector itself has the size 0.
It does not mean one is better designed than the other. The API is just different and this is not a difference that makes one better than the other.
Note that in Java it is now preferred to use ArrayList, which has almost the same API but is not synchronized. If you want to find a bad design decision in Java's Vector, then this synchronization on every operation was probably one.
Therefore the best way to write an equivalent of the C++ initialization code in Java is :
List<Integer> list = new ArrayList<Integer>();
for (int i = 0; i < 10; i++) {
    list.add(0); // Integer has no no-argument constructor; 0 matches C++'s value-initialized int
}
The Javadoc for Vector.add(int, Object) pretty clearly states that an ArrayIndexOutOfBoundsException will be thrown if the index is less than zero or greater than the size. The Vector type grows as needed, and the constructor you've used sets the initial capacity, which is different from the size. Please read the linked Javadoc to better understand how to use the Vector type. Also, we Java developers typically use a List type, such as ArrayList, in situations where you would generally use a std::vector in C++.
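The size-versus-capacity distinction can be seen directly with the asker's own calls. This is a small sketch; VectorSizeDemo and describe are made-up names, and setSize(10) is shown as one way to get a Vector that, like std::vector<int> vec(10), already holds 10 elements (here nulls).

```java
import java.util.Vector;

class VectorSizeDemo {
    static String describe() {
        Vector<Object> v = new Vector<Object>(10);   // capacity 10, size 0
        StringBuilder out = new StringBuilder();
        out.append("size=").append(v.size())
           .append(" capacity=").append(v.capacity());

        v.add(0, 1);        // index 0 == size, allowed
        v.add(1, "Test");   // index 1 == size, allowed
        try {
            v.add(3, "test");   // size is 2, so index 3 is out of range
        } catch (IndexOutOfBoundsException e) {
            // Vector reports this as ArrayIndexOutOfBoundsException, a subclass
            out.append(" rejected=3");
        }

        v.setSize(10);      // pads with nulls up to size 10
        v.add(3, "test");   // now valid; inserting shifts later elements right
        out.append(" finalSize=").append(v.size());
        return out.toString();
    }
}
```

Calling VectorSizeDemo.describe() returns "size=0 capacity=10 rejected=3 finalSize=11".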
Differences? You cannot directly compare how two languages do this. Both std::vector and java.util.Vector are backed by a contiguous, growable array (not a stack or a linked list), and the usual way to fill either is to append one item after another. So in C++ it is better to use the push_back() method.
A C++ vector constructed with a size is filled with default values automatically. In Java it is not; you have to fill it yourself. I disagree with filling it using l5.add(1, "Test");. Use l5.add("test").
Since you asked differences, you can define your object in this way as well
Vector a = new Vector();
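That is a raw type, i.e. without generics. Generics were added in Java 5; raw types are still allowed for backward compatibility but are discouraged.
Vector is not widely used in Java any more, because every operation is synchronized, which adds overhead. We now use ArrayList, which implements the List interface.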
Edit
Variable names such as l5 are widely used in C++, but the Java community expects more meaningful variable names :)
I have a java list
List<myclass> myList = myClass.selectFromDB("where clause");
//myClass.selectFromDB returns a list of objects from DB
But I want a different list, specifically.
List<Integer> goodList = new ArrayList<Integer>();
for (int i = 0; i < myList.size(); i++) {
    goodList.add(myList.get(i).getGoodInteger());
}
Yes, I could do a different query from the DB in the initial myList creation, but assume for now I must use that as the starting point and no other DB queries. Can I replace the for loop with something much more efficient?
Thank you very much for any input, apologies for my ignorance.
In order to extract a field from the "myclass", you're going to have to loop through the entire contents of the list. Whether you do that with a for loop, or use some sort of construct that hides the for loop from you, it's still going to take approximately the same time and use the same amount of resources.
An important question is: why do you want to do this? Are you trying to make your code cleaner? If so, you could write a method along these lines:
public static List<Integer> extractGoodInts (List<myClass> myList) {
List<Integer> goodInts = new ArrayList<Integer>();
for(int i = 0; i < myList.size(); i++){
goodInts.add(myList.get(i).getGoodInteger());
}
return goodInts;
}
Then, in your code, you can just go:
List<myClass> myList = myClass.selectFromDB("where clause");
List<Integer> goodInts = myClass.extractGoodInts(myList);
However, if you're trying to make your code more efficient and you're not allowed to change the query, you're out of luck; somehow or another, you're going to need to individually grab each int from the list, which means you're going to be running in O(n) time no matter what clever tricks you can come up with.
There are really only two ways I can think of that you can make this more "efficient":
Somehow split this up between multiple cores so you can do the work in parallel. Of course, this assumes that you've got other cores, they aren't doing anything useful already, and that there's enough processing going on that the overhead of doing this is even worth it. My guess is that (at least) the last point isn't true in your case given that you're just calling a getter. If you wanted to do this you'd try to have a number of threads (I'd probably actually use an Executor and Futures for this) equal to the number of cores, and then give roughly equal amounts of work to each of them (probably just by slicing your list into roughly equal sized pieces).
If you believe that you'll only be accessing a small subset of the resulting List, but are unsure of exactly which elements, you could try doing things lazily. The easiest way to do that would be to use a pre-built lazy mapping List implementation. There's one in Google Collections Library. You use it by calling Lists.transform(). It'll immediately return a List, but it'll only perform your transformation on elements as they are requested. Again, this is only more efficient if it turns out that you only ever look at a small fraction of the output List. If you end up looking at the entire thing this will not be more efficient, and will probably work out to be less efficient.
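The lazy approach can also be sketched without a third-party library. The class below is a hypothetical illustration of the idea using only java.util and Java 8's Function; it is not the Lists.transform() implementation. The calls counter exists purely to show when the mapper actually runs.

```java
import java.util.AbstractList;
import java.util.List;
import java.util.function.Function;

// Read-only view that applies `mapper` to an element only when it is requested.
class LazyMappedList<S, T> extends AbstractList<T> {
    private final List<S> source;
    private final Function<? super S, ? extends T> mapper;
    int calls;   // number of mapper invocations, for illustration only

    LazyMappedList(List<S> source, Function<? super S, ? extends T> mapper) {
        this.source = source;
        this.mapper = mapper;
    }

    @Override
    public T get(int index) {
        calls++;
        return mapper.apply(source.get(index));   // computed on demand, not cached
    }

    @Override
    public int size() {
        return source.size();
    }
}
```

Creating the view costs nothing per element. Because results are not cached, reading the same index twice runs the mapper twice, which is one reason this can end up slower than an eager copy when the whole list is read.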
Not sure what you mean by efficient. As the others said, you have to call the getGoodInteger method on every element of that list one way or another. About the best you can do is avoid checking the size every time:
List<Integer> goodInts = new ArrayList<Integer>();
for (MyClass myObj : myList) {
goodInts.add(myObj.getGoodInteger());
}
I also second jboxer's suggestion of making a function for this purpose.
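On Java 8 or later, the same extraction can be written with a stream, which is one of the constructs that hides the loop. It still visits every element, so it is a readability choice rather than a speed-up. In this sketch MyClass is a hypothetical stand-in for the asker's class, assumed to expose getGoodInteger().

```java
import java.util.List;
import java.util.stream.Collectors;

class StreamExtractDemo {
    // Hypothetical stand-in for the asker's class.
    static class MyClass {
        private final int good;

        MyClass(int good) {
            this.good = good;
        }

        Integer getGoodInteger() {
            return good;
        }
    }

    // Same result as the explicit loop: one getter call per element, O(n).
    static List<Integer> extractGoodInts(List<MyClass> myList) {
        return myList.stream()
                     .map(MyClass::getGoodInteger)
                     .collect(Collectors.toList());
    }
}
```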