In C++ I can insert an item at an arbitrary position in a vector, as in the code below:
std::vector<int> vec(10);
vec.insert(vec.begin()+2,2);
vec.insert(vec.begin()+4,3);
In Java I cannot do the same; I get a java.lang.ArrayIndexOutOfBoundsException with the code below:
Vector l5 = new Vector(10);
l5.add(0, 1);
l5.add(1, "Test");
l5.add(3, "test");
Does this mean that C++ is better designed, or is this just a Java design decision?
Why does Java use this approach?
In the C++ code:
std::vector<int> vec(10);
You are creating a vector of size 10. So all indexes from 0 to 9 are valid afterwards.
In the Java code:
Vector l5 = new Vector(10);
You are creating an empty vector with an initial capacity of 10. It means the underlying array is of size 10 but the vector itself has the size 0.
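The difference between capacity and size can be checked directly. This is only a small sketch; the class name CapacityVsSize and the helper insertFails are made up for the illustration, while Vector.capacity(), Vector.size() and Vector.add(int, E) are the standard methods.

```java
import java.util.Vector;

class CapacityVsSize {
    // Returns true if inserting at the given index of a fresh Vector
    // (capacity 10, size 0) is rejected with an exception.
    static boolean insertFails(int index) {
        Vector<String> v = new Vector<>(10); // capacity 10, size still 0
        try {
            v.add(index, "test");
            return false;
        } catch (IndexOutOfBoundsException e) {
            // Vector throws ArrayIndexOutOfBoundsException, a subclass
            return true;
        }
    }

    public static void main(String[] args) {
        Vector<String> v = new Vector<>(10);
        System.out.println(v.capacity());   // prints 10
        System.out.println(v.size());       // prints 0
        System.out.println(insertFails(0)); // prints false: index == size is allowed
        System.out.println(insertFails(3)); // prints true: index > size
    }
}
```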
It does not mean one is better designed than the other. The API is just different and this is not a difference that makes one better than the other.
Note that in Java it is now preferred to use ArrayList, which has almost the same API but is not synchronized. If you want to find a bad design decision in Java's Vector, synchronizing on every operation was probably one.
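Therefore a straightforward way to write an equivalent of the C++ initialization code in Java is:
List<Integer> list = new ArrayList<Integer>();
for (int i = 0; i < 10; i++){
list.add(Integer.valueOf(0)); // Integer has no no-argument constructor; 0 matches the value-initialized ints in C++
}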
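As a possible shortcut, the same pre-filled list can be built with Collections.nCopies, after which the two inserts from the C++ snippet work unchanged. This is a sketch; the class name PrefilledList and the helper zeros are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PrefilledList {
    // Builds a mutable list of `size` zeros, similar to std::vector<int> vec(size).
    static List<Integer> zeros(int size) {
        // nCopies is immutable, so copy it into an ArrayList
        return new ArrayList<>(Collections.nCopies(size, 0));
    }

    public static void main(String[] args) {
        List<Integer> list = zeros(10);
        list.add(2, 2); // counterpart of vec.insert(vec.begin()+2, 2)
        list.add(4, 3); // counterpart of vec.insert(vec.begin()+4, 3)
        System.out.println(list.size()); // prints 12
        System.out.println(list);        // prints [0, 0, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0]
    }
}
```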
The Javadoc for Vector.add(int, Object) states that an ArrayIndexOutOfBoundsException is thrown if the index is less than zero or greater than the size. In your code the size is 2 when l5.add(3, "test") runs, so index 3 is out of range. The Vector type grows as needed, and the constructor you've used sets the initial capacity, which is different from the size. Please read the Javadoc to better understand how to use the Vector type. Also, Java developers typically use a List type, such as ArrayList, in situations where you would use a std::vector in C++.
Differences? You cannot directly compare how two languages do this. A Vector is backed by a resizable array (not by a stack or a linked list), but the usual way to fill it is the same in both languages: you append one item after another at the end. In C++ that means using the push_back() method.
In C++, std::vector<int> vec(10) creates its ten elements automatically. In Java it does not; you have to fill the vector yourself. I would not fill it with l5.add(1, "Test"); simply append with l5.add("test").
Since you asked about differences, you can also define your object this way:
Vector a = new Vector();
That is a raw type, i.e. a Vector without generics. Generics were added in Java 5; raw types are still accepted for backward compatibility, but the compiler warns about them.
Vector is no longer widely used in Java, because synchronizing every operation adds overhead. ArrayList, which implements the List interface, is now the usual choice.
Edit
Variable names such as l5 are widely used in C++, but the Java community expects more meaningful variable names :)
Is there a fundamental difference in Java between an ArrayList, and a class that uses regular arrays to store items, has an index to keep track of the number of items in the list, and automatically increases the size of the array when it runs out of space?
class myArrayList {
private int[] array = new int[10];
private int itemsInArray = 0;
private void increaseArraySize() {
int[] newArray = new int[array.length + 10];
System.arraycopy(array, 0, newArray, 0, array.length);
array = newArray;
}
public void put(int i) {
if (itemsInArray == array.length)
{
increaseArraySize();
}
array[itemsInArray] = i;
itemsInArray++;
}
public int get(int idx) {
return array[idx];
}
public int size() {
return itemsInArray;
}
}
An ArrayList has some additional methods my class doesn't have (that I could add), and implements the List interface, but other than that, is ArrayList just for convenience? Do both use the heap to store data?
Is there a fundamental difference in Java between an ArrayList, and a class that uses regular arrays to store items, has an index to keep track of the number of items in the list, and automatically increases the size of the array when it runs out of space?
In general no. Under the hood, ArrayList is just an ordinary pure Java class i.e. no native code. It is (roughly speaking!) doing what your code does.
But (as the comments say) it already exists. You don't need to design it, code it, debug it, tune it ... You just use it! Also read Basil's answer!
However, I would note that your version is different from ArrayList in some (other) important respects:
A myArrayList holds only int values. It is not generic.
An ArrayList holds objects. If you needed a list of integers you would need to use Integer as the type parameter rather than int. (Because that's the way that Java generic classes work.)
In myArrayList, a set call beyond the end of the list will grow the list. It is behaving more like a dynamic array than a list.
In ArrayList, a set call beyond the end of the list will throw an exception.
If you want a Java "list" type that is specialized for int or some other primitive type, there are existing 3rd party libraries; e.g. the GNU Trove library.
Do both use the heap to store data?
Yes. In fact, if you look at the source code of ArrayList you will see that it does something like what your code is doing. But it is doing it "smarter", and this results in better "big O" performance for certain operations.
Consider this:
myArrayList list = new myArrayList();
for (int i = 0; i < N; i++) {
list.put(i);
}
The computational complexity of this is O(N^2).
Once the initial 10 slots are full, every 10th call to list.put(i) triggers a resize that allocates an array 10 elements larger and copies all existing values into it. That adds up to 10 + 20 + ... + (N - 10), or roughly N^2 / 20 copies. That is O(N^2) for the N calls.
By contrast, ArrayList uses a resize strategy of growing the list by 50% of its current capacity. If you do the analysis, it turns out that the total cost of N calls to ArrayList.add is O(N), i.e. amortized O(1) per call, not O(N^2).
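To make the difference concrete, here is a rough sketch that only counts element copies for the two growth strategies; it does not allocate anything. The class GrowthCost and its method names are invented for this illustration, and the 50% rule is a simplified model of what ArrayList does, not its actual source.

```java
class GrowthCost {
    // Counts element copies made while appending n items one at a time,
    // growing a full array by a fixed number of slots (as myArrayList does).
    static long copiesWithFixedIncrement(int n, int increment) {
        long copies = 0;
        int capacity = increment;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copies += size;        // every existing element is copied
                capacity += increment;
            }
        }
        return copies;
    }

    // Same count when a full array grows by 50% of its current capacity.
    static long copiesWithGeometricGrowth(int n, int initialCapacity) {
        long copies = 0;
        int capacity = initialCapacity;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copies += size;
                capacity += Math.max(1, capacity / 2);
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 10_000, 100_000}) {
            System.out.println(n + ": fixed=" + copiesWithFixedIncrement(n, 10)
                    + " geometric=" + copiesWithGeometricGrowth(n, 10));
        }
    }
}
```

The fixed-increment count grows roughly a hundredfold for every tenfold increase in N, while the geometric count stays within a small constant multiple of N.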
Lesson #1: Don't go trying to re-implement standard Java utility classes. It is usually a waste of time and there is a good chance that your efforts will actually make things worse!
There are exceptions to this lesson, but you need a lot of Java programming experience (and/or use of profiling tools) to identify them. Even then, there is a good chance that there is an existing 3rd-party alternative that addresses the problem.
Lesson #2: If your goal is to understand how the standard utility classes work under the hood, the best way is to read the OpenJDK source code. It is good code and well commented. In cases where it is complicated there is a good reason for that. But any experienced Java programmer should be capable of understanding it if they work hard at it.
You asked:
What's the difference between using ArrayList, or dynamically growing an array in Java?
Looking at the Collections Framework Overview, the very first bullet item says:
The primary advantages of a collections framework are that it:
• Reduces programming effort by providing data structures and algorithms so you don't have to write them yourself.
You asked:
Is there a fundamental difference in Java between an ArrayList, and a class that uses regular arrays
In terms of behavior, no fundamental difference. As the name implies, the current implementation of ArrayList is a class that uses regular arrays. So there is no point to you writing your own.
Keep in mind that future versions of ArrayList implementations are free to use some other approach besides actual arrays provided the contract promised in the Javadoc is met.
is ArrayList just for convenience?
Yes, as stated above. Rather than have every individual programmer write their own implementation, why not share one single well-written, well-debugged, and well-documented implementation?
Most implementations of Java are based on the OpenJDK open-source codebase. You are free to peruse the source code.
An array data structure requires all its members to have the same type. Java throws ArrayStoreException when an attempt is made to store the wrong type of object into an array of objects. I don't remember what C++ does.
Is my understanding correct that it's important to have all objects of the same type in array because it guarantees constant time element access through the following two operations:
1) element size * element index = offset
2) array pointer address + offset
If objects are of different types and consequently different size, the above mentioned formula won't work.
Because: we want it like this.
What I mean is: people using the Java language (probably the same for C++) are using a statically typed language for a purpose.
And when such people start thinking in plurals, they typically think in plurals of "similar" things.
Caveat: in Java, every reference type is an Object, so you can always declare an Object[] and put anything into it: Strings, Numbers, whatever (primitives get boxed).
And that also leads to the other important aspect: in C++, an array represents an area in memory, and you had better have same-sized elements in that area to avoid data corruption.
In Java on the other hand, an array is not pointing to raw memory.
Long story short: there are real differences between Java and C++ in this context (that one has to understand to make an informed decision); and then there is the "language" thing itself. In other words: this is not Ruby land, where you just put ducks, numbers, plants and quack sounds in the same "list" without further thinking.
Final thought, based on that joke in the last paragraph: in my eyes, an array is an implementation of the list concepts, thus it is about a collection of things of the same nature. If you want a collection of unrelated things, I would rather call that a tuple.
Yes, you are right. All that is required for constant time random access.
Also, you can have an array of void pointers if you want to store different data types in a single array. For instance, in C++:
void* a[N];
a[i] = static_cast<void*>(&yourObject); // address of an object, not of a class
Similarly, use Object[] in Java.
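A short Java sketch of both points: an Object[] can hold references to objects of different types, while an array whose runtime type is narrower rejects the wrong type with the ArrayStoreException mentioned in the question. The class and method names are made up for the example.

```java
class MixedArrays {
    // Tries to store an Integer into an array whose runtime type is String[].
    static boolean storeIsRejected() {
        Object[] strings = new String[1];     // legal: Java arrays are covariant
        try {
            strings[0] = Integer.valueOf(42); // element type is checked at run time
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object[] mixed = { "text", 42, 3.14 }; // same-sized references, different types
        for (Object o : mixed) {
            System.out.println(o.getClass().getSimpleName());
        }
        System.out.println(storeIsRejected()); // prints true
    }
}
```

All slots of the Object[] are references of the same size, which is why constant-time indexing still works even though the objects they refer to differ.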
The C++ language (and compiler) requires the type of the elements stored in an array for various reasons, such as pointer arithmetic and array subscripting (e.g. x[i]), default initialisation of the elements, and dealing with alignment restrictions.
int x[3] = { 1,2,3 }; // array of 3 int values, each properly aligned for the processor architecture
myObjectType objs[10]; // array of 10 objects of type myObjectType, each default initialised (probably by the default constructor), each properly aligned
myObjectType *objPtrs[10]; // array of 10 pointers to objects of type myObjectType (including subclasses of myObjectType, allowing dynamic binding and polymorphism). Note: all pointers have the same size; the objects they point to may differ in size.
int *intptr = x;
bool isEqual= (intptr[2] == x[2]); // gives true
intptr += 2; // increases the pointer by "2*sizeof(int)" bytes.
So, yes, you are right: one reason is calculating offsets. But there are other reasons as well, and issues like alignment, array-to-pointer decay, and default initialisation logic are probably more subtle but essential, too.
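The two-step offset formula from the question can be imitated in Java with a ByteBuffer standing in for raw memory. This is only an illustration under that assumption; Java arrays do not expose addresses, and the class FixedSizeRecords and its helpers are invented names.

```java
import java.nio.ByteBuffer;

class FixedSizeRecords {
    static final int ELEMENT_SIZE = Integer.BYTES; // every element occupies 4 bytes

    // Step 1 of the formula: element size * element index = offset.
    static int offsetOf(int index) {
        return ELEMENT_SIZE * index;
    }

    // Step 2: base + offset, expressed here as an absolute read from the buffer.
    static int read(ByteBuffer memory, int index) {
        return memory.getInt(offsetOf(index));
    }

    public static void main(String[] args) {
        ByteBuffer memory = ByteBuffer.allocate(3 * ELEMENT_SIZE);
        memory.putInt(10).putInt(20).putInt(30);
        System.out.println(offsetOf(2));     // prints 8
        System.out.println(read(memory, 2)); // prints 30
    }
}
```

If the elements had different sizes, offsetOf could not be a single multiplication and lookup would need a scan or a separate index.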
With the aim of getting a better performance I'm fine tuning the code, looking through the DDMS tracer. One aspect is Array.get(x) which is more expensive than Array.items[x]
We can directly access the items provided the array type is Object, or if we specify the array type in the constructor, like so:
Array<MyClass> foo = new Array<MyClass>(MyClass.class);
This works fine, however, how do I specify the last MyClass.class in a for loop? I have this at the moment:
for (Array<MyClass> listOfObjects : allObjects) {
for (int i=0; i<listOfObjects.size; i++) {
MyClass myObj = listOfObjects.get(i);
//MyClass myObj = listOfObjects.items[i];
}
}
The commented line works fine, but to get rid of the overhead I want to supply the (MyClass.class) argument mentioned above. Where can I do this in that for loop?
Many thanks
J
I think that what you're trying to do is pointless. Please read this great article: http://blog.codinghorror.com/the-sad-tragedy-of-micro-optimization-theater/
You are trying to gain a minimal optimization while greatly reducing readability and maintainability.
If you want less overhead, it would probably be wiser to look at a language like C++ rather than trying to hack basic Java for loops.
Another thing you may want to look into is Java 8, which added Streams that can process the elements of a collection in parallel.
Array<MyClass> foo = new Array<MyClass>(MyClass.class)
Note that you are creating a NEW array with this line, passing it a class argument. From http://libgdx.badlogicgames.com/nightlies/docs/api/com/badlogic/gdx/utils/Array.html
Array(java.lang.Class arrayType)
Creates an ordered array with items of the specified type and a capacity of 16.
I don't see you trying to create new Arrays in the other code you posted. Are you trying to populate each listOfObjects in allObjects?
If so, you would want to do something like:
for (int i = 0; i < allObjects.size; i++)
{
allObjects.items[i] = new Array<MyClass>(MyClass.class);
}
If you are simply trying to loop through these arrays, there is no class argument needed. I would suggest comparing the Array class to other Gdx or Java collections if the speed of iteration is too slow.
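For background on why the class argument matters for direct items access, here is a sketch of the general Java technique. It is not libgdx's source: TypedBag is a hypothetical stand-in with a fixed capacity of 16 and no growth logic. The point is that generics are erased, so without a class token the backing array can only be an Object[].

```java
import java.lang.reflect.Array;

class TypedBag<T> {
    // Public backing array, in the spirit of the items field discussed above.
    public T[] items;
    public int size;

    // Without a class token the backing array is really an Object[].
    @SuppressWarnings("unchecked")
    TypedBag() {
        items = (T[]) new Object[16];
    }

    // With a class token the backing array really is a T[].
    @SuppressWarnings("unchecked")
    TypedBag(Class<T> type) {
        items = (T[]) Array.newInstance(type, 16);
    }

    void add(T value) {
        items[size++] = value; // sketch only: holds at most 16 elements
    }

    // Reading the field as String[] makes the compiler insert a cast,
    // which fails if the array was created as Object[].
    static boolean directAccessWorks(TypedBag<String> bag) {
        try {
            String[] raw = bag.items;
            return raw.length == 16;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        TypedBag<String> untyped = new TypedBag<>();
        TypedBag<String> typed = new TypedBag<>(String.class);
        System.out.println(directAccessWorks(untyped)); // prints false
        System.out.println(directAccessWorks(typed));   // prints true
    }
}
```

So the class token belongs where each array is created, not in the loop that reads from it.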
This quote from the link above may also be relevant if you do a lot of removing from the arrays:
A resizable, ordered or unordered array of objects. If unordered, this class avoids a memory copy when removing elements (the last element is moved to the removed element's position).
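The unordered removal described in that quote can be sketched in plain Java as follows. This illustrates the technique only, using an int[] and a hypothetical helper removeUnordered rather than the library's own class.

```java
import java.util.Arrays;

class UnorderedRemoval {
    // Removes items[index] from the first `size` slots by moving the last
    // element into the hole. Returns the new size. Order is not preserved.
    static int removeUnordered(int[] items, int size, int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        items[index] = items[size - 1]; // one assignment instead of shifting the tail
        return size - 1;
    }

    public static void main(String[] args) {
        int[] items = {10, 20, 30, 40};
        int size = removeUnordered(items, items.length, 1);
        System.out.println(Arrays.toString(Arrays.copyOf(items, size))); // prints [10, 40, 30]
    }
}
```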
I am working on refactoring a small portion of an open source large-scale configuration management system for my University.
We're using some open source tools for machine learning like Weka, and the aspect I am assigned to refactor is dealing with data mining and constructing rules.
The open source files we've been using from Liverpool and Japan are working well, but there are some memory usage issues when we use the program on large scale projects.
I've isolated the major memory hogs and come to the conclusion I need to figure out a different data structure to store and manipulate the data. As it stands now, the program is using what end up becoming very large multidimensional arrays of integers, objects, strings, etc.
There are several methods that simply reconfigure the set up of the associations after we are deriving rules for behaviors. In many cases, we are only adding or subtracting a single element, or simply flattening the multidimensional arrays.
I primarily program in C/C++ in general, so I am not an expert on the data structures available in Java. What I am looking to replace the static arrays with is a dynamic structure that can be easily resized without having to create a second multidimensional array.
What is happening now is we are having to create an entirely new structure every time we add and remove rules, objects, or other miscellaneous data from the multidimensional array. Then we are immediately copying into the new array.
I'd like to be able to simply use the same multidimensional array and simply add a new row and column. Subsequently, I'd like to be able to manipulate the data in the structure by simply saving a temporary value and overwriting previous values, shifting left, right, etc.
Can anyone think of any data structures in Java that would fit the bill?
On a related note, I have looked into explicit garbage collection, but have found I can only really suggest that the JVM collect by calling System.gc(), or influence the garbage collection behavior of the JVM by way of tuning. Is there a better or more effective way?
Regards,
Edm
If you have a lot of nulls/zeroes/falses/empty-strings in your matrix, then you can save space by using a sparse matrix implementation. Matrix-toolkits has several sparse matrices that you can use / modify to suit your needs, or you can just use a hashmap with an {x, y} tuple as the key. (The hashmap also has the advantage that there are several external hashmap implementations available, e.g. BerkeleyDB, so that it's unlikely that you'll run out of memory.)
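A minimal sketch of the hashmap idea, assuming integer cells where zero is the "empty" value. The class SparseIntMatrix is invented for the example, and packing the two coordinates into one long key is just one possible choice of key.

```java
import java.util.HashMap;
import java.util.Map;

class SparseIntMatrix {
    private final Map<Long, Integer> cells = new HashMap<>();

    // Packs a (row, column) pair into one long so it can serve as a map key.
    private static long key(int row, int col) {
        return ((long) row << 32) | (col & 0xFFFFFFFFL);
    }

    void set(int row, int col, int value) {
        if (value == 0) {
            cells.remove(key(row, col)); // zeros are implicit, so drop the entry
        } else {
            cells.put(key(row, col), value);
        }
    }

    int get(int row, int col) {
        return cells.getOrDefault(key(row, col), 0);
    }

    // Number of cells that actually occupy memory.
    int storedCells() {
        return cells.size();
    }

    public static void main(String[] args) {
        SparseIntMatrix m = new SparseIntMatrix();
        m.set(1_000_000, 2_000_000, 7);  // no huge array is allocated
        System.out.println(m.get(1_000_000, 2_000_000)); // prints 7
        System.out.println(m.get(0, 0));                 // prints 0
        System.out.println(m.storedCells());             // prints 1
    }
}
```

There is no fixed number of rows or columns, so "adding a row" is simply writing to a new coordinate.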
To replace static arrays with a dynamic structure, use an ArrayList, which grows with the data automatically. For a two-dimensional data structure, use a List of Lists:
List<List<Integer>> dataStore = new ArrayList<List<Integer>>();
dataStore.add(new ArrayList<Integer>());
dataStore.add(new ArrayList<Integer>(Arrays.asList(1, 2, 3, 4))); // copy, because Arrays.asList alone returns a fixed-size list
// Access [1][3] as
System.out.println(dataStore.get(1).get(3)); // prints 4
Since you touched upon having control over garbage collection (which Java actually does a pretty good job of all by itself), it seems memory management is of paramount importance, as this is what prompted the refactoring in the first place.
You could look into the Flyweight GoF pattern, which focuses on sharing objects instead of duplicating them to cut down on the memory footprint of the application. To enable sharing, flyweight objects need to be immutable.
Pseudo code:
// adding a new flyweight obj at [2][1]
fwObjStore.get(2).set(1, FWObjFactory.getInstance(fwKey));
public class FWObjFactory {
private static Map<String, FWObject> fwMap = new HashMap<String, FWObject>();
public static FWObject getInstance(String fwKey) {
if (!fwMap.containsKey(fwKey)) {
fwMap.put(fwKey, newFwFromKey(fwKey));
}
return fwMap.get(fwKey);
}
private static FWObject newFwFromKey(String fwKey) {
// ...
}
}
I would look into using a "List of Lists". For example, you could declare something like
List<List<Object>> mArray = new ArrayList<List<Object>>();
Any time you need to add a new "row", you could do something like:
mArray.add (new ArrayList<Object>());
Check out the List interface to see what you can do with Lists in Java and which classes implement the interface (or roll your own!).
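As a sketch of how such a list of lists can grow in both directions without allocating a second structure, the hypothetical helpers below add a row and a column in place. They assume Integer cells with 0 as the default value.

```java
import java.util.ArrayList;
import java.util.List;

class GrowableGrid {
    // Appends a new row of `columns` zeros.
    static void addRow(List<List<Integer>> grid, int columns) {
        List<Integer> row = new ArrayList<>();
        for (int c = 0; c < columns; c++) {
            row.add(0);
        }
        grid.add(row);
    }

    // Appends one zero to every existing row.
    static void addColumn(List<List<Integer>> grid) {
        for (List<Integer> row : grid) {
            row.add(0);
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> grid = new ArrayList<>();
        addRow(grid, 2);
        addRow(grid, 2);
        addColumn(grid);
        grid.get(1).set(2, 9);    // overwrite a value in place
        System.out.println(grid); // prints [[0, 0, 0], [0, 0, 9]]
    }
}
```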
There is no true multidimensional array in Java; Java has arrays of arrays.
You can use an ArrayList whose type parameter is itself an ArrayList:
ArrayList<ArrayList<yourType>> myList = new ArrayList<ArrayList<yourType>>();
Also, don't worry about GC. It will collect as and when required.
Why not use two Lists tangled together? Like so:
List<List<String>> rowColumns = new ArrayList<>();
// Add a row with two entries, or columns:
List<String> oneRow = Arrays.asList("Hello", "World!");
rowColumns.add(oneRow);
Also, consider using a Map with entries mapped to Lists.
Garbage collection should generally never have to be dealt with explicitly in Java. Usually you want to look for memory leaks first, whenever one occurs. When that happens, look for background threads that don't die as they are supposed to, or strong references in caches. If you want to read more about the latter issue, you can start here and here.
In this answer to a question I asked, Kathy Van Stone says that passing an array like so
jList1.setListData(LinkedHashMap.keySet().toArray());
is a better way than to do it like this
jList1.setListData(new Vector<String>(LinkedHashMap.keySet()));
I am wondering if there was any truth to this and if so what the reason behind it was.
Do you need the synchronization afforded by the Vector class? If not, then you don't want to use the Vector class. (If only JList allowed using an ArrayList in its place.) The cost of uncontended synchronization is much lower in recent JVM versions than it used to be. Still, there's no reason to use unnecessary synchronization.
It looks like Kathy Van Stone may be referring to that extra synchronization in Vector.
Note carefully the JavaDoc for public JList(Vector<?> listData):
The created model references the given Vector directly. Attempts to modify the
Vector after constructing the list results in undefined behavior.
Unfortunately, the JavaDoc for public JList(Object[] listData) has the same warning:
The created model references the given array directly. Attempts to modify the
array after constructing the list results in undefined behavior.
You have to decide if there is any likelihood that someone decides that the Vector is useful later in the same method and thus modifies the code like this:
Vector<String> vec = new Vector<String>(LinkedHashMap.keySet());
jList1.setListData(vec);
// Other stuff with Vector
vec.add( .... ); // Adding some data type that fits in the Vector
... or of course the same change with the Array constructor version.
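For what it is worth, both expressions copy the key set at the moment they are evaluated, so later changes to the map do not reach either copy; the risk described in the Javadoc is only about modifying the copy itself after handing it to the JList. A small sketch without any Swing code, using hypothetical helper names:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Vector;

class ListDataSnapshots {
    // Snapshot of the keys as an array, as in keySet().toArray().
    static Object[] arraySnapshot(Map<String, Integer> map) {
        return map.keySet().toArray();
    }

    // Snapshot of the keys as a Vector, as in new Vector<String>(keySet()).
    static Vector<String> vectorSnapshot(Map<String, Integer> map) {
        return new Vector<>(map.keySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        Object[] asArray = arraySnapshot(map);
        Vector<String> asVector = vectorSnapshot(map);
        map.put("c", 3); // later change to the map
        System.out.println(asArray.length);  // prints 2
        System.out.println(asVector.size()); // prints 2
    }
}
```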
Take a look at the implementation of both setListData methods in the JList class and you will see that it really doesn't matter.
I would prefer the first one, simply because there is no need to involve yet another collection (Vector)